<h1>Research: Essay on Asset Returns with Intertemporal CAPM</h1>
<p>This paper replicates the return decomposition methodologies proposed by Campbell and Vuolteenaho (2004) and Campbell, Giglio, Polk, and Turley (2018), which explained the size and value “anomalies” in stock returns. We show that, due to the co-movement of cash-flow and discount-rate risk in the modern period, there is little evidence that value stocks have significantly higher betas than growth stocks; however, news about expected future market volatility is able to disentangle the size-and-value puzzle effectively. Empirically, small stocks and value stocks require a lower hedging premium than large and growth stocks. Lastly, their methodology shows some ability to price the abnormal returns of popular equity strategies and currency carry trade premiums.</p>
<hr />
<h1 id="introduction">Introduction</h1>
<p>It is well known that the CAPM fails to explain average stock returns in the modern period (after 1963) using the value-weighted S&P 500 return as a proxy for the market portfolio. According to Sharpe (1964) and Lintner (1965), a single total-wealth portfolio can summarize a stock’s return characteristics through its beta. However, Fama and French (1995) documented the size and value anomalies, showing that portfolios tilted towards small and value stocks tend to generate excess returns. By solving Merton (1973)’s intertemporal capital asset pricing model (ICAPM) under the Epstein and Zin (1989) utility framework for representative agents, Campbell and Vuolteenaho (2004) provided an elegant explanation by splitting beta risk into “bad” and “good” varieties: the beta associated with news about cash flows and the beta associated with news about discount rates. They showed some evidence that value stocks and small stocks have considerably higher cash-flow betas than growth stocks and large stocks. In other words, for a conservative long-term investor in the ICAPM framework, small and value stocks are intertemporal hedges and perform well when investment opportunities deteriorate; however, in equilibrium, these assets should have delivered a lower average return, so a rational long-term investor should not tilt the portfolio towards small and value stocks. Furthermore, in a subsequent paper, Campbell et al. (2018) argued that either increasing volatility of stock returns or decreasing expected stock returns can send negative shocks to investment opportunities in the stock market. In addition to these low-frequency movements in equity volatility, they proposed a three-beta ICAPM with cash-flow, discount-rate, and risk news to explain the cross-sectional asset return puzzle, and obtained results consistent with their “Bad Beta, Good Beta” paper.</p>
<p>In this paper, we examine whether the two-beta ICAPM of Campbell and Vuolteenaho (2004) and the three-beta ICAPM of Campbell et al. (2018) can still explain the size and value puzzle in the latest modern data (1963:07-2018:12). Under both vector-autoregressive (VAR) economies, we estimate the parameters using the full sample (1929:06-2018:12). In the first two sections, we describe both ICAPM models and the data used to estimate the parameters of the VAR economies. In the third section, we walk through our estimated results for the 25 ME- and BE/ME-sorted portfolios from Ken French’s website in the modern data, and compare them against the results in both of Campbell’s papers. In the fourth section, we conduct an out-of-sample test of the estimated two-beta and three-beta models on some popular equity and currency trading strategies. In the final section, we conclude our findings.</p>
<h1 id="model">Model</h1>
<p>Following Campbell and Shiller (1988), Campbell and Vuolteenaho (2004) used a log-linear approximate decomposition of returns from Campbell (1993)</p>
<script type="math/tex; mode=display">r_{t+1} - E_tr_{t+1} = (E_{t+1} - E_t)\sum_{j=0}^{\infty}\rho^j \Delta d_{t+1+j} - (E_{t+1} - E_t)\sum_{j=0}^{\infty}\rho^j \Delta r_{t+1+j} = N_{CF, t+1} - N_{DR, t+1}</script>
<p>where <script type="math/tex">r_{t+1}</script> is a log stock return, <script type="math/tex">d_{t+1}</script> is the log dividend paid by the stock, <script type="math/tex">\Delta</script> denotes a one-period change, <script type="math/tex">E_t</script> denotes a rational expectation at time <script type="math/tex">t</script>, and <script type="math/tex">\rho</script> is a discount coefficient. We use <script type="math/tex">\rho = 0.95^{1/12}</script> throughout the paper. <script type="math/tex">N_{CF}</script> denotes news about future cash flows and <script type="math/tex">N_{DR}</script> denotes news about future discount rates. They assume the data are generated by a first-order VAR model</p>
<script type="math/tex; mode=display">z_{t+1} = a + \Gamma z_t + u_{t+1}</script>
<p>where <script type="math/tex">r_{t+1}</script> is the first element of <script type="math/tex">z_{t+1}</script>. Provided the above process, we can obtain the news about cash-flow and discount rates as follows:</p>
<script type="math/tex; mode=display">N_{CF, t+1} = (e_1'+e_1'\lambda)u_{t+1}</script>
<script type="math/tex; mode=display">N_{DR, t+1} = e_1'\lambda u_{t+1}</script>
<p>where <script type="math/tex">\lambda = \rho \Gamma(I-\rho\Gamma)^{-1}</script> and <script type="math/tex">e_1</script> is a vector where the first element is 1 and everywhere else is 0. As a result, the two-beta ICAPM is</p>
<script type="math/tex; mode=display">E_t[R_{i, t+1} - R_{j, t+1}] = \gamma Cov_t[r_{i, t+1} - r_{j, t+1}, N_{CF, t +1}] + Cov_t[r_{i, t+1} - r_{j, t+1}, -N_{DR, t +1}]</script>
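<p>The news construction above is straightforward to implement once the VAR is estimated. The following sketch uses a made-up coefficient matrix and shock rather than estimated values, and checks the identity that cash-flow news minus discount-rate news equals the unexpected return:</p>

```python
import numpy as np

# Illustrative only: Gamma and u below are made-up numbers, not the paper's estimates.
rho = 0.95 ** (1 / 12)                     # monthly discount coefficient used in the paper

k = 3                                      # a small VAR for illustration
rng = np.random.default_rng(0)
Gamma = 0.2 * rng.standard_normal((k, k))  # keep rho*Gamma well inside the unit circle
u = rng.standard_normal(k)                 # one VAR innovation u_{t+1}

e1 = np.zeros(k)
e1[0] = 1.0                                # selects the (log) market return
lam = rho * Gamma @ np.linalg.inv(np.eye(k) - rho * Gamma)  # lambda = rho*Gamma*(I - rho*Gamma)^{-1}

N_DR = e1 @ lam @ u                        # discount-rate news
N_CF = (e1 + e1 @ lam) @ u                 # cash-flow news

# Sanity check: the two news terms must sum back to the unexpected return e1'u
assert np.isclose(N_CF - N_DR, e1 @ u)
```

<p>The matrix <code>lam</code> is the discounted sum of future VAR responses, so <code>N_DR</code> aggregates revisions in expected returns at all future horizons.</p>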
<p>Campbell et al. (2018) argued that when the log stochastic discount factor of the intertemporal CAPM allows for stochastic volatility, the pricing kernel has three factors</p>
<script type="math/tex; mode=display">m_{t+1} - E_tm_{t+1} = -\gamma N_{CF, t+1} - [-N_{DR, t+1}] + \frac{1}{2} N_{RISK, t+1}</script>
<p>where <script type="math/tex">N_{RISK, t+1}</script> is news about volatility. Following this fact, they assumed the economy is described by a first-order VAR</p>
<script type="math/tex; mode=display">z_{t+1} = a + \Gamma z_t + \sigma_tu_{t+1}</script>
<p>where the first two elements of <script type="math/tex">z_{t+1}</script> are <script type="math/tex">r_{t +1}</script> and <script type="math/tex">\sigma_{t+1}^2</script>. They also assume that <script type="math/tex">u_{t+1}</script> has a constant variance-covariance matrix <script type="math/tex">\Sigma</script>, with element <script type="math/tex">\Sigma_{11}=1</script>, and <script type="math/tex">\sigma_t^2</script> is equal to the conditional variance of market returns. Given this structure, news about discount rates can be written as</p>
<script type="math/tex; mode=display">N_{DR, t+1} = e_1'\lambda\sigma_t u_{t+1}</script>
<p>while implied cash-flow news is</p>
<script type="math/tex; mode=display">N_{CF, t+1} = (e_1'+e_1'\lambda)\sigma_t u_{t+1}</script>
<p>Their specification also implies that news about risk is proportional to news about market return variance, <script type="math/tex">N_V</script>:</p>
<script type="math/tex; mode=display">N_{RISK, t+1} = \omega\rho e_2'(I-\rho\Gamma)^{-1}\sigma_tu_{t+1} = \omega N_{V, t+1}</script>
<p>Hence, they derived the three-beta ICAPM model:</p>
<script type="math/tex; mode=display">E_t[R_{i, t+1} - R_{j, t+1}] = \gamma Cov_t[r_{i, t+1} - r_{j, t+1}, N_{CF, t +1}] + Cov_t[r_{i, t+1} - r_{j, t+1}, -N_{DR, t +1}]-\frac{1}{2}\omega Cov_t[r_{i, t+1} - r_{j, t+1}, N_{V, t +1}]</script>
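<p>To make the three-beta decomposition concrete, here is a minimal numerical sketch. The coefficient matrix, conditional volatility, shock, and the loading <code>omega</code> are all illustrative assumptions, not estimated quantities:</p>

```python
import numpy as np

rho = 0.95 ** (1 / 12)
k = 6                                      # six state variables, as in the paper's VAR
rng = np.random.default_rng(1)
Gamma = 0.15 * rng.standard_normal((k, k)) # made-up VAR coefficients
u = rng.standard_normal(k)                 # one VAR shock
sigma_t = 0.04                             # assumed conditional market volatility at t
omega = 2.0                                # assumed loading of risk news on variance news

e1 = np.zeros(k); e1[0] = 1.0              # selects the market return
e2 = np.zeros(k); e2[1] = 1.0              # selects the conditional variance
inv = np.linalg.inv(np.eye(k) - rho * Gamma)
lam = rho * Gamma @ inv

N_DR = e1 @ lam @ (sigma_t * u)            # discount-rate news
N_CF = (e1 + e1 @ lam) @ (sigma_t * u)     # cash-flow news
N_V = rho * e2 @ inv @ (sigma_t * u)       # news about market return variance
N_RISK = omega * N_V                       # risk news, proportional to variance news

# The heteroskedastic decomposition still satisfies N_CF - N_DR = e1' sigma_t u
assert np.isclose(N_CF - N_DR, sigma_t * (e1 @ u))
```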
<h2 id="beta-measurement">Beta Measurement</h2>
<p>To be consistent with Campbell (1993), we define the betas as follows:</p>
<script type="math/tex; mode=display">\beta_{i, CF} \equiv \frac{Cov(r_{i, t}, N_{CF, t})}{Var(r_{M, t} - E_{t-1} r_{M, t})}</script>
<script type="math/tex; mode=display">\beta_{i, DR} \equiv \frac{Cov(r_{i, t}, -N_{DR, t})}{Var(r_{M, t} - E_{t-1} r_{M, t})}</script>
<script type="math/tex; mode=display">\beta_{i, V} \equiv \frac{Cov(r_{i, t}, N_{V, t})}{Var(r_{M, t} - E_{t-1} r_{M, t})}</script>
<p>and estimate the betas with a lag</p>
<script type="math/tex; mode=display">\widehat{\beta}_{i, CF} = \frac{Cov(r_{i, t}, N_{CF, t})}{Var(N_{CF, t} - N_{DR, t}+ I_{V}N_{V, t})}+\frac{Cov(r_{i, t}, N_{CF, t-1})}{Var(N_{CF, t} - N_{DR, t}+ I_{V}N_{V, t})}</script>
<script type="math/tex; mode=display">\widehat{\beta}_{i, DR} = \frac{Cov(r_{i, t}, -N_{DR, t})}{Var(N_{CF, t} - N_{DR, t}+ I_{V}N_{V, t})} +\frac{Cov(r_{i, t}, -N_{DR, t-1})}{Var(N_{CF, t} - N_{DR, t}+ I_{V}N_{V, t})}</script>
<script type="math/tex; mode=display">\widehat{\beta}_{i, V} = \frac{Cov(r_{i, t}, N_{V, t})}{Var(N_{CF, t} - N_{DR, t}+ I_{V}N_{V, t})} + \frac{Cov(r_{i, t}, N_{V, t-1})}{Var(N_{CF, t} - N_{DR, t}+ I_{V}N_{V, t})}</script>
<p>where <script type="math/tex">I_V = 1</script> if we use the three-beta model and <script type="math/tex">I_V = 0</script> otherwise. According to Campbell, one lag of the market’s news terms is included in the numerator because, during the early sample period, not all stocks in the test-asset portfolios were traded frequently and synchronously.</p>
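<p>A minimal sketch of this lag-adjusted estimator on simulated series follows; the return and news series are random placeholders, not the paper's data, and <code>beta_hat</code> is a hypothetical helper name:</p>

```python
import numpy as np

rng = np.random.default_rng(2)
T = 2000                                   # months of simulated data
N_CF = 0.01 * rng.standard_normal(T)       # placeholder cash-flow news
N_DR = 0.03 * rng.standard_normal(T)       # placeholder discount-rate news
N_V = 0.02 * rng.standard_normal(T)        # placeholder variance news
r_i = N_CF - N_DR + 0.005 * rng.standard_normal(T)  # a synthetic test-asset return

def beta_hat(r, news, market_news):
    """Contemporaneous plus one-period-lagged covariance with a news series,
    scaled by the variance of the total market news (here with I_V = 1)."""
    denom = np.var(market_news, ddof=1)
    contemp = np.cov(r[1:], news[1:], ddof=1)[0, 1]
    lagged = np.cov(r[1:], news[:-1], ddof=1)[0, 1]
    return (contemp + lagged) / denom

market_news = N_CF - N_DR + N_V            # three-beta model: I_V = 1
b_cf = beta_hat(r_i, N_CF, market_news)
b_dr = beta_hat(r_i, -N_DR, market_news)
b_v = beta_hat(r_i, N_V, market_news)
```

<p>Since the synthetic return loads positively on cash-flow news and negatively on discount-rate news, both <code>b_cf</code> and <code>b_dr</code> come out positive here.</p>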
<h1 id="data-description">Data Description</h1>
<p>In our VAR economy, we have six state variables: the excess market return (measured as the log excess return on the Center for Research in Security Prices (CRSP) S&P value-weighted index over 30-day Treasury bills); the expected market volatility, estimated using a lagged regression model; the log yield of the 30-day Treasury bill; the market’s smoothed price-earnings ratio (measured as the log ratio of the S&P 500 price index to a ten-year moving average of S&P 500 earnings, from Shiller’s website); the default credit spread (the log yield spread between Moody’s BAA and AAA bonds, obtained from the Federal Reserve Bank of St. Louis, Missouri); and the small-stock value spread (measured as the difference between the log book-to-market ratios of small value and small growth stocks).</p>
<p>Campbell et al. (2018) argued that stochastic market volatility makes a significant contribution to explaining the cross-sectional risk premia of asset prices. They introduced the expected market volatility term (EVOL), which is meant to capture the variance of market returns, <script type="math/tex">\sigma_t^2</script>, conditional on information available at time <script type="math/tex">t</script>. To construct <script type="math/tex">EVOL_t</script>, we first create a series of the within-month realized variance of daily returns for each time <script type="math/tex">t</script>, <script type="math/tex">RVOL_t</script>. We then run a regression of <script type="math/tex">RVOL_{t+1}</script> on the lagged realized variance as well as the rest of the state variables at time <script type="math/tex">t</script>. The predicted value from this regression is defined as the expected market variance (<script type="math/tex">EVOL_t\equiv \widehat{RVOL}_{t+1}</script>). Campbell et al. (2018) also included the default spread (DEF) because it is known to track time-series variation in expected real returns on the market portfolio, and shocks to DEF should reflect news about aggregate default probabilities. Lastly, the value spread is shown by Brennan, Wang, and Xia (2004) to predict future market returns. Table I reports descriptive sample statistics for the monthly VAR state variable data between 1929:06 and 2018:12.</p>
<p align="center">
<img src="/assets/images/icapm_evidence/image1.png" />
</p>
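<p>The EVOL construction can be sketched as a simple predictive regression: regress next month's realized variance on its own lag and the remaining state variables, and keep the fitted values. The realized-variance and state series below are simulated stand-ins for the actual data:</p>

```python
import numpy as np

rng = np.random.default_rng(3)
T = 500
rvol = np.empty(T)
rvol[0] = 0.002
for t in range(1, T):                      # a persistent placeholder realized-variance series
    rvol[t] = 0.0005 + 0.7 * rvol[t - 1] + 0.0002 * rng.standard_normal()
state = rng.standard_normal((T, 4))        # stand-ins for the other state variables

X = np.column_stack([np.ones(T - 1), rvol[:-1], state[:-1]])  # regressors dated t
y = rvol[1:]                                                   # RVOL_{t+1}
coef, *_ = np.linalg.lstsq(X, y, rcond=None)                   # OLS fit
evol = X @ coef                            # EVOL_t, the fitted value of RVOL_{t+1}
```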
<h1 id="results">Results</h1>
<h2 id="var-estimates">VAR Estimates</h2>
<p>Tables II and IV report parameter estimates for the two-beta and three-beta VAR models, respectively, over our full sample period, June 1929 to December 2018. In the two-beta model, we exclude the expected market volatility to stay consistent with the Campbell and Vuolteenaho (2004) formulation; in the three-beta model, all state variables are used. Ordinary least squares (OLS) standard errors are reported in parentheses below the coefficients. Finally, we report the <script type="math/tex">R^2</script> for each regression. Both tables show that the lagged market excess return, together with the other state variables, provides some ability to predict the future market return; the term yield spread and the small value spread have coefficients consistent with the results in Campbell and Vuolteenaho (2004). Campbell et al. (2018) show that the expected volatility and the default credit spread positively predict the excess return. Our default credit spread has a negative, though insignificant, sign in the three-beta VAR economy. One can argue that market volatility tends to widen the credit spread, so EVOL already captures most of the information in the default spread, and the negative contribution from DEF provides some offsetting effect. The smoothed price-earnings ratio shows negative predictive power. Overall, the roughly 2 percent <script type="math/tex">R^2</script> of both forecasting models is reasonable for a monthly return model.</p>
<p align="center">
<img src="/assets/images/icapm_evidence/image2.png" />
</p>
<p align="center">
<img src="/assets/images/icapm_evidence/image4.png" />
</p>
<p>Table III and Table V summarize the behavior of the implied cash-flow and discount-rate news from the two-beta model, and the implied cash-flow, discount-rate, and volatility news from the three-beta model, respectively. Our cash-flow news has a lower variance than the discount-rate news, which agrees with the finding of Campbell (1991) that discount-rate news is the dominant component of the market return.</p>
<p align="center">
<img src="/assets/images/icapm_evidence/image5.png" />
</p>
<p align="center">
<img src="/assets/images/icapm_evidence/image6.png" />
</p>
<p>Figure 1 and Figure 2 illustrate the VAR model’s view of stock market history in relation to NBER recessions. The shaded areas indicate recession periods. Our results are similar to Campbell’s in the common period (1963-2001); during the 2008 financial crisis, we see both discount-rate news and cash-flow news drop significantly. Campbell and Vuolteenaho (2004) call such an episode a “mixed recession” (like the Great Depression). In particular, volatility news peaked substantially in 2008.</p>
<p align="center">
<img src="/assets/images/icapm_evidence/image3.png" />
</p>
<p align="center">
<img src="/assets/images/icapm_evidence/image7.png" />
</p>
<h2 id="betas">Betas</h2>
<p>Table VI reports beta estimates for the 25 size- and book-to-market-sorted portfolios over the 1963-2018 period under both the two-beta (Panel A) and three-beta (Panel B) ICAPMs. The portfolios are organized in a square matrix with growth stocks at the left, value stocks at the right, small stocks at the top, and large stocks at the bottom. For each panel, the first matrix displays the cash-flow betas and the second matrix displays the discount-rate betas. In Panel B, the last matrix contains the estimated betas on volatility news. First of all, it is not a surprise that the cash-flow betas are significantly lower than the discount-rate betas, because cash-flow news comes from VAR economies with limited forecastability. In Panel B, since the excess return should include a volatility (Jensen’s inequality) term reflecting a precautionary saving motive, the negative beta estimates align with the idea that a hedging premium is paid for volatility news. For both ICAPMs, the small stocks still show higher betas on both discount-rate and cash-flow news. Moreover, the value stocks show lower discount-rate betas. However, the value stocks also have lower cash-flow betas than the growth stocks. This suggests that the two innovations co-move in the modern period and that the two-beta model may have trouble disentangling the size-value puzzle. Under the three-beta ICAPM, the volatility betas from our estimates are consistent with the story from the earlier sample in Campbell et al. (2018): the lower hedging premium required for both small and value stocks may explain the size-and-value anomalies in the modern period.</p>
<p align="center">
<img src="/assets/images/icapm_evidence/image9.png" />
</p>
<h2 id="empirical-estimates-of-risk-premia">Empirical Estimates of Risk Premia</h2>
<p>Lastly, with the estimated betas, we can run our cross-sectional regressions</p>
<script type="math/tex; mode=display">\bar{R}_i^e = g_1\widehat{\beta}_{i, CF} + g_2\widehat{\beta}_{i, DR} + e_i</script>
<script type="math/tex; mode=display">\bar{R}_i^e = g_1\widehat{\beta}_{i, CF} + g_2\widehat{\beta}_{i, DR} + g_3\widehat{\beta}_{i, V} + e_i</script>
<p>respectively, where <script type="math/tex">\bar{R}_{i}^e\equiv \bar{R}_i -\bar{R}_{rf}</script> denotes the sample average simple excess return on asset <script type="math/tex">i</script>. The implied risk-aversion coefficient can be recovered as <script type="math/tex">g_1/g_2</script>. Table VII shows that the traditional CAPM and the two-beta ICAPM explain the cross-sectional variation only modestly. As the earlier subsection suggests, the cash-flow and discount-rate innovations co-move over much of the updated period (2001-2018), especially during the financial crisis, so we can expect the two-beta ICAPM’s performance to fall off. Encouragingly, the three-beta ICAPM explains more than 30% of the variation in cross-sectional average excess returns. The two-beta ICAPM has an implied risk aversion of 12.31, while the three-beta ICAPM shows 5.25. A visual summary of these results is provided in Figure 3, where we multiply the estimates by 1,200 to express them in annualized percentage points.</p>
<p align="center">
<img src="/assets/images/icapm_evidence/image10.png" />
</p>
<p align="center">
<img src="/assets/images/icapm_evidence/image8.png" />
</p>
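<p>The cross-sectional regressions above can be illustrated with synthetic inputs. In the sketch below, the betas and average returns for 25 hypothetical portfolios are constructed so that the true premia are known and exactly recoverable; all numbers are made up:</p>

```python
import numpy as np

rng = np.random.default_rng(4)
n = 25                                     # 25 size/book-to-market portfolios
B = rng.standard_normal((n, 3))            # columns: beta_CF, beta_DR, beta_V
g_true = np.array([0.010, 0.002, -0.004])  # assumed per-month premia
Rbar = B @ g_true                          # average excess returns with no pricing error

# OLS without an intercept, as in the three-beta specification above
g_hat, *_ = np.linalg.lstsq(B, Rbar, rcond=None)
risk_aversion = g_hat[0] / g_hat[1]        # implied gamma = g1 / g2
```

<p>With real data the fit is imperfect, and the ratio <code>g_hat[0] / g_hat[1]</code> is what produces implied risk-aversion estimates like those reported in Table VII.</p>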
<h2 id="price-some-popular-equity-and-currency-strategies">Pricing Some Popular Equity and Currency Strategies</h2>
<p>In this section, we use our ICAPM models to assess some common anomalies that have been discussed in the asset pricing literature. Table VIII analyzes a number of well-known equity anomalies using data taken from Kenneth French’s website. The sample period is 1963:07-2018:12. The anomaly portfolios are the market (RMRF), size (SMB), and value (HML) factors discussed in this paper, the profitability (RMW) and investment (CMA) factors added in Fama and French (2015), the momentum (UMD) factor of Carhart (1997), and the short-term reversal (STR) and long-term reversal (LTR) factors. We also include the currency carry trade long-short portfolio strategies proposed in Lustig, Roussanov, and Verdelhan (2011), with all countries and with developed countries only. For each of these portfolios, Table VIII reports the mean excess return in the first column and the standard deviation of returns in the second column. Columns 3-5 report the portfolios’ betas with respect to our estimates of discount-rate news, cash-flow news, and volatility news. These are used in Columns 6-9 to construct the components of fitted excess returns based on discount-rate news (<script type="math/tex">\lambda_{DR}</script>), cash-flow news in the two-beta ICAPM (<script type="math/tex">\lambda^{2-BETA}_{CF}</script>), cash-flow news in the three-beta ICAPM (<script type="math/tex">\lambda^{3-BETA}_{CF}</script>), and variance news in the three-beta ICAPM (<script type="math/tex">\lambda_V</script>). These fitted excess returns use the parameter estimates of the two-beta and three-beta models reported in Table VII. We do not reestimate any parameters, so we can think of this analysis as an out-of-sample test. Columns 10-12 report the alphas of the anomalies (their sample average excess returns less their predicted excess returns) calculated using the CAPM, the two-beta ICAPM, and the three-beta ICAPM. All the portfolios, with the obvious exception of RMRF, have been chosen to have positive CAPM alphas.</p>
<p>Table VIII shows that volatility risk exposure explains the majority of the abnormal returns. Most of the anomaly portfolios have negative betas on volatility news, which makes them riskier and helps to explain their positive excess returns. The exceptions are the SMB, RMW, and CMA portfolios. Here, the two-beta model is not necessarily better than the CAPM, but it does explain the developed-country carry trade return well. Compared with the two-beta ICAPM, the three-beta ICAPM offers a much more promising interpretation of the value and size portfolio returns. Interestingly, the three-beta model also performs better in explaining the carry trade return with all countries, which indicates that the carry premium arises from volatility news in the US equity markets. Overall, the three-beta model gives more accurate predictions than the CAPM and the two-beta ICAPM.</p>
<p align="center">
<img src="/assets/images/icapm_evidence/image11.png" />
</p>
<h1 id="conclusions">Conclusions</h1>
<p>Empirically, we find that value stocks and small stocks carry a considerably lower insurance premium to hedge future volatility shocks, and this can explain their higher average returns. However, we cannot find clearly higher cash-flow betas in either the two-beta or the three-beta ICAPM. Chen and Zhao (2009) argued that news about cash flow is computed residually relative to news about the discount rate, so the cash-flow news captures all the noise in the VAR model. If a variable is omitted from <script type="math/tex">z_t</script> but belongs to rational investors’ information set, it ends up in the innovation of the cash-flow news. Therefore, the cash-flow news is very sensitive to the VAR specification. They showed that value companies do not have larger cash-flow betas under alternative VAR specifications. This is consistent with our beta estimates. Hence, the disagreement on cash-flow betas may stem from misspecification of the VAR economies.</p>
<h1 id="references">References</h1>
<ul>
<li>Brennan, Michael J., Ashley W. Wang, and Yihong Xia, 2004, Estimation and test of a simple model of intertemporal capital asset pricing, The Journal of Finance 59, 1743–1776.</li>
<li>Campbell, John, 1993, Intertemporal asset pricing without consumption data, American Economic Review 83, 487–512.</li>
<li>Campbell, John Y., 1991, A variance decomposition for stock returns, The Economic Journal 101, 157–179.</li>
<li>Campbell, John Y., Stefano Giglio, Christopher Polk, and Robert Turley, 2018, An intertemporal CAPM with stochastic volatility, Journal of Financial Economics 128, 207–233.</li>
<li>Campbell, John Y., and Robert J. Shiller, 1988, The dividend-price ratio and expectations of future dividends and discount factors, The Review of Financial Studies 1, 195–228.</li>
<li>Campbell, John Y., and Tuomo Vuolteenaho, 2004, Bad Beta, Good Beta, American Economic Review 94, 1249–1275.</li>
<li>Carhart, Mark M., 1997, On persistence in mutual fund performance, The Journal of Finance 52, 57–82.</li>
<li>Chen, Long, and Xinlei Zhao, 2009, Return decomposition, The Review of Financial Studies 22, 5213–5249.</li>
<li>Epstein, Larry G., and Stanley E. Zin, 1989, Substitution, risk aversion, and the temporal behavior of consumption and asset returns: A theoretical framework, Econometrica 57, 937–969.</li>
<li>Fama, Eugene F., and Kenneth R. French, 1995, Size and book-to-market factors in earnings and returns, The Journal of Finance 50, 131–155.</li>
<li>Fama, Eugene F., and Kenneth R. French, 2015, Dissecting Anomalies with a Five-Factor Model, The Review of Financial Studies 29, 69–103.</li>
<li>Lintner, John, 1965, The valuation of risk assets and the selection of risky investments in stock portfolios and capital budgets, The Review of Economics and Statistics 47, 13–37.</li>
<li>Lustig, Hanno, Nikolai Roussanov, and Adrien Verdelhan, 2011, Common Risk Factors in Currency Markets, The Review of Financial Studies 24, 3731–3777.</li>
<li>Merton, Robert, 1973, An intertemporal capital asset pricing model, Econometrica 41, 867–887.</li>
<li>Sharpe, William F., 1964, Capital asset prices: A theory of market equilibrium under conditions of risk, The Journal of Finance 19, 425–442.</li>
</ul>
<hr />
<h1>Tutorial: WRDS Data Access Via Python API</h1>
<h2 id="prerequisites">Prerequisites</h2>
<ul>
<li>An active account with the Wharton WRDS web service</li>
<li>Python 2.7+ or 3.0+</li>
<li><code class="highlighter-rouge">wrds</code> Python library; if you don’t have it, install it (<code class="highlighter-rouge">pip install wrds</code>)</li>
</ul>
<h2 id="setup">Setup</h2>
<h3 id="to-create-pgpass-file">To create .pgpass file</h3>
<p>For Mac users, you can create the <code class="highlighter-rouge">.pgpass</code> file under <code class="highlighter-rouge">/Users/username</code> (your home directory).</p>
<pre><code class="language-unix">wrds-pgdata.wharton.upenn.edu:9737:wrds:your_username:your_password
</code></pre>
<p>where <code class="highlighter-rouge">your_username</code> is your WRDS username and <code class="highlighter-rouge">your_password</code> is your WRDS password.</p>
<h4 id="to-restrict-file-permissions">To restrict file permissions:</h4>
<p><code class="highlighter-rouge">chmod 600 ~/.pgpass</code></p>
<h2 id="getting-started">Getting Started</h2>
<p>Now that you have set up your WRDS profile, you can access your WRDS account through the Python API.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">wrds</span>
<span class="c">## provide your username </span>
<span class="n">db</span> <span class="o">=</span> <span class="n">wrds</span><span class="o">.</span><span class="n">Connection</span><span class="p">(</span><span class="n">wrds_username</span><span class="o">=</span><span class="s">'username'</span><span class="p">)</span>
<span class="c">## Here is an example to get DOW daily index value.</span>
<span class="n">db</span><span class="o">.</span><span class="n">raw_sql</span><span class="p">(</span><span class="s">'SELECT date,dji FROM djones.djdaily limit 10'</span><span class="p">)</span>
<span class="c">## Get data from CRSP</span>
<span class="n">crsp</span> <span class="o">=</span> <span class="n">db</span><span class="o">.</span><span class="n">raw_sql</span><span class="p">(</span><span class="s">'select cusip,permno,date,bidlo,askhi from crsp.dsf LIMIT 100'</span><span class="p">)</span>
</code></pre></div></div>
<hr />
<h1>Library: Portfolio Optimization</h1>
<p>As an ongoing effort to provide more finance-related Python libraries, I will start with the portfolio optimization library. This page documents the Hello-World version.</p>
<h1 id="installation">Installation</h1>
<p>If you have python 3.6+ installed, you can run the following in your terminal</p>
<pre><code class="language-unix">pip install git+https://github.com/WizardKingZ/portfolio_optimization.git
</code></pre>
<p>It is easy to uninstall it.</p>
<pre><code class="language-unix">pip uninstall portfolio_optimization
</code></pre>
<h1 id="get-started">Get Started</h1>
<h2 id="unconstrained-portfolio">Unconstrained Portfolio</h2>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">## import MarkowitzPortfolio </span>
<span class="kn">from</span> <span class="nn">portfolio_optimization</span> <span class="kn">import</span> <span class="n">MarkowitzPortfolio</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="c">## Load the Fama-French Five Factor Dataset to calculate annualized return and covariance</span>
<span class="n">rts</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="mf">0.11</span><span class="p">,</span> <span class="mf">0.07</span><span class="p">,</span> <span class="mf">0.09</span><span class="p">,</span> <span class="mf">0.08</span><span class="p">,</span> <span class="mf">0.08</span><span class="p">])</span>
<span class="n">cov</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([[</span> <span class="mf">0.0242</span><span class="p">,</span> <span class="o">-</span><span class="mf">0.0015</span><span class="p">,</span> <span class="o">-</span><span class="mf">0.0025</span><span class="p">,</span> <span class="o">-</span><span class="mf">0.0019</span><span class="p">,</span> <span class="o">-</span><span class="mf">0.0033</span><span class="p">],</span>
<span class="p">[</span><span class="o">-</span><span class="mf">0.0015</span><span class="p">,</span> <span class="mf">0.0067</span><span class="p">,</span> <span class="mf">0.0004</span><span class="p">,</span> <span class="o">-</span><span class="mf">0.0013</span> <span class="p">,</span> <span class="mf">0.0001</span> <span class="p">],</span>
<span class="p">[</span><span class="o">-</span><span class="mf">0.0025</span><span class="p">,</span> <span class="mf">0.0004</span><span class="p">,</span> <span class="mf">0.0063</span><span class="p">,</span> <span class="o">-</span><span class="mf">0.0002</span><span class="p">,</span> <span class="mf">0.0025</span><span class="p">],</span>
<span class="p">[</span><span class="o">-</span><span class="mf">0.0019</span><span class="p">,</span> <span class="o">-</span><span class="mf">0.0013</span> <span class="p">,</span> <span class="o">-</span><span class="mf">0.0002</span><span class="p">,</span> <span class="mf">0.0033</span><span class="p">,</span> <span class="mf">0.0001</span><span class="p">],</span>
<span class="p">[</span><span class="o">-</span><span class="mf">0.0033</span><span class="p">,</span> <span class="mf">0.0001</span> <span class="p">,</span> <span class="mf">0.0025</span><span class="p">,</span> <span class="mf">0.0001</span><span class="p">,</span> <span class="mf">0.0033</span><span class="p">]])</span>
<span class="n">ffFactorNames</span> <span class="o">=</span> <span class="p">[</span><span class="s">'Market'</span><span class="p">,</span> <span class="s">'SMB'</span><span class="p">,</span> <span class="s">'HML'</span><span class="p">,</span> <span class="s">'RMW'</span><span class="p">,</span> <span class="s">'CMA'</span><span class="p">]</span>
<span class="c">## Initialize the MarkowitzPortfolio with the expected return, covariance and asset names</span>
<span class="c">## both cov and rts are numpy arrays. rts is a row vector </span>
<span class="c">## ffFactorNames should be a list</span>
<span class="n">port</span> <span class="o">=</span> <span class="n">MarkowitzPortfolio</span><span class="p">(</span><span class="n">rts</span><span class="p">,</span> <span class="n">cov</span><span class="p">,</span> <span class="n">ffFactorNames</span><span class="p">,</span> <span class="n">riskFreeRate</span><span class="o">=</span><span class="mf">0.046</span><span class="p">)</span>
<span class="c">## set the target return as 0.1 </span>
<span class="c">## you can set risk aversion as well</span>
<span class="c">## we will cover constraints optimization later</span>
<span class="n">configuration</span> <span class="o">=</span> <span class="p">{</span><span class="s">'constraints'</span><span class="p">:</span> <span class="bp">None</span><span class="p">,</span>
<span class="s">'riskAversion'</span><span class="p">:</span> <span class="bp">None</span><span class="p">,</span>
<span class="s">'targetReturn'</span><span class="p">:</span> <span class="mf">0.1</span><span class="p">}</span>
<span class="c">## obtain weights allocated on each asset</span>
<span class="n">weights</span> <span class="o">=</span> <span class="n">port</span><span class="o">.</span><span class="n">get_allocations</span><span class="p">(</span><span class="n">configuration</span><span class="o">=</span><span class="n">configuration</span><span class="p">)</span>
<span class="c">## plot the efficient frontier and capital market line </span>
<span class="n">port</span><span class="o">.</span><span class="n">display_efficient_frontier</span><span class="p">(</span><span class="n">assetsAnnotation</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
<span class="n">specialPortfolioAnnotation</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
<span class="n">addTangencyLine</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
<span class="n">upper_bound</span><span class="o">=</span><span class="mf">0.14</span><span class="p">)</span>
</code></pre></div></div>
<p align="center">
<img src="/assets/images/portfolio_optimization/efficient_frontier_with_cml.png" />
</p>
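<p>The internals of <code class="highlighter-rouge">MarkowitzPortfolio</code> are not shown here; a minimal numpy sketch of the mathematics it presumably solves (minimum-variance weights for a target return, ignoring bounds and constraints) could look like this, where all inputs are illustrative:</p>

```python
import numpy as np

def min_variance_weights(mu, cov, target):
    """Closed-form minimum-variance weights for a target expected return.

    Solves: min w' cov w  subject to  w'mu = target and w'1 = 1 (no bounds).
    """
    inv = np.linalg.inv(cov)
    ones = np.ones(len(mu))
    a = ones @ inv @ ones          # 1' cov^-1 1
    b = mu @ inv @ ones            # mu' cov^-1 1
    c = mu @ inv @ mu              # mu' cov^-1 mu
    # Lagrange multipliers from the 2x2 system [[c, b], [b, a]] [lam, gam]' = [target, 1]'
    lam, gam = np.linalg.solve(np.array([[c, b], [b, a]]),
                               np.array([target, 1.0]))
    return inv @ (lam * mu + gam * ones)

# illustrative inputs, not the FF-5 estimates used above
mu = np.array([0.08, 0.10, 0.12])
cov = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.09, 0.02],
                [0.00, 0.02, 0.16]])
w = min_variance_weights(mu, cov, target=0.10)
```

<p>The two equality constraints reduce the quadratic program to a 2x2 linear system in the Lagrange multipliers, which is why an unconstrained frontier can be traced in closed form.</p>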
<h2 id="constrained-portfolio">Constrained Portfolio</h2>
<p>Let’s continue to use the FF-5 Factor data. However, we are interested in a portfolio with the following constraints:</p>
<p>\begin{array}{c|cc}
\text{Asset} & \text{Lower Bound %} & \text{Upper Bound %}\\
\hline \text{MKT} & 0 & 100 \\
\text{HML} & 50 & 100\\
\text{SMB} & 0 & 100 \\
\text{RMW} & 0 & 100 \\
\text{CMA} & -50 & 100 \\
\end{array}</p>
<p>We can set up the following constraints</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">constraints</span> <span class="o">=</span> <span class="p">{</span><span class="s">'Market'</span><span class="p">:</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span>
<span class="s">'HML'</span><span class="p">:</span> <span class="p">[</span><span class="mf">0.5</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span>
<span class="s">'SMB'</span><span class="p">:</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span>
<span class="s">'RMW'</span><span class="p">:</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span>
<span class="s">'CMA'</span><span class="p">:</span> <span class="p">[</span><span class="o">-.</span><span class="mi">5</span><span class="p">,</span> <span class="mi">1</span><span class="p">]}</span>
<span class="n">configuration</span><span class="p">[</span><span class="s">'constraints'</span><span class="p">]</span> <span class="o">=</span> <span class="n">constraints</span>
<span class="c">## obtain weights allocated on each asset</span>
<span class="n">weights</span> <span class="o">=</span> <span class="n">port</span><span class="o">.</span><span class="n">get_allocations</span><span class="p">(</span><span class="n">configuration</span><span class="o">=</span><span class="n">configuration</span><span class="p">)</span>
</code></pre></div></div>
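<p>Under the hood, such per-asset bounds turn the allocation into a constrained quadratic program. A sketch of the same problem with <code class="highlighter-rouge">scipy.optimize.minimize</code> follows; the expected returns and covariance are illustrative placeholders, not the library’s actual implementation:</p>

```python
import numpy as np
from scipy.optimize import minimize

mu = np.array([0.08, 0.10, 0.12, 0.07, 0.06])    # illustrative expected returns
cov = np.diag([0.04, 0.09, 0.16, 0.05, 0.03])    # illustrative covariance
bounds = [(0, 1), (0.5, 1), (0, 1), (0, 1), (-0.5, 1)]  # per-asset (lower, upper)

res = minimize(
    lambda w: w @ cov @ w,                        # minimize portfolio variance
    x0=np.full(5, 0.2),                           # feasible starting point
    bounds=bounds,
    constraints=[{'type': 'eq', 'fun': lambda w: w.sum() - 1.0},   # fully invested
                 {'type': 'eq', 'fun': lambda w: w @ mu - 0.10}],  # target return
    method='SLSQP')
w = res.x
```

<p>SLSQP handles the box bounds and the two equality constraints directly, which mirrors the constraint dictionary passed to <code class="highlighter-rouge">get_allocations</code> above.</p>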
<h2 id="black-litterman-portfolio-allocation">Black-Litterman Portfolio Allocation</h2>
<p>Investors are often interested in incorporating their own views when solving asset allocation problems. For example:</p>
<p>\begin{array}{c|cc}
\text{View} & \text{Confidence Level %} & \text{Plus and Minus %}\\
\hline \text{Market outperforms by 1%} & 95 & 5 \\
\text{SMB beats RMW by 1%} & 90 & 5\\
\text{A portfolio of 20% of SMB and 80% of HML beats RMW by 1%} & 99 & 1\\
\end{array}</p>
<p>We can use <code class="highlighter-rouge">BlackLittermanPortfolio</code> to incorporate these views</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">portfolio_optimization</span> <span class="kn">import</span> <span class="n">BlackLittermanPortfolio</span>
<span class="c">## assetNames are ['Market', 'HML', 'SMB', 'RMW', 'CMA']</span>
<span class="n">port</span> <span class="o">=</span> <span class="n">BlackLittermanPortfolio</span><span class="p">(</span><span class="n">modeledReturn</span><span class="p">,</span> <span class="n">modeledCovariance</span><span class="p">,</span> <span class="n">assetNames</span><span class="p">,</span> <span class="n">riskFreeRate</span><span class="o">=</span><span class="mf">0.046</span><span class="p">)</span>
<span class="n">views</span> <span class="o">=</span> <span class="p">{</span><span class="s">'Market'</span><span class="p">:</span> <span class="p">{</span><span class="s">'type'</span><span class="p">:</span> <span class="s">'absolute'</span><span class="p">,</span> <span class="s">'scale'</span><span class="p">:</span> <span class="mf">0.01</span><span class="p">,</span> <span class="s">'confidence'</span><span class="p">:</span> <span class="mi">95</span><span class="p">,</span> <span class="s">'plusminus</span><span class="si">%</span><span class="s">'</span><span class="p">:</span> <span class="mi">5</span><span class="p">},</span>
<span class="s">'SMB|RMW'</span> <span class="p">:</span> <span class="p">{</span><span class="s">'type'</span><span class="p">:</span> <span class="s">'relative'</span><span class="p">,</span> <span class="s">'scale'</span><span class="p">:</span> <span class="mf">0.01</span><span class="p">,</span> <span class="s">'confidence'</span><span class="p">:</span> <span class="mi">90</span><span class="p">,</span> <span class="s">'plusminus</span><span class="si">%</span><span class="s">'</span><span class="p">:</span> <span class="mi">5</span><span class="p">,</span> <span class="s">'weights'</span><span class="p">:</span> <span class="p">[</span><span class="mf">1.</span><span class="p">,</span> <span class="o">-</span><span class="mf">1.</span><span class="p">]},</span>
<span class="s">'SMB|HML|RMW'</span> <span class="p">:</span> <span class="p">{</span><span class="s">'type'</span><span class="p">:</span> <span class="s">'relative'</span><span class="p">,</span> <span class="s">'scale'</span><span class="p">:</span> <span class="mf">0.01</span><span class="p">,</span> <span class="s">'confidence'</span><span class="p">:</span> <span class="mi">99</span><span class="p">,</span> <span class="s">'plusminus</span><span class="si">%</span><span class="s">'</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span> <span class="s">'weights'</span><span class="p">:</span> <span class="p">[</span><span class="o">.</span><span class="mi">2</span><span class="p">,</span> <span class="o">.</span><span class="mi">8</span><span class="p">,</span> <span class="o">-</span><span class="mf">1.</span><span class="p">]},</span>
<span class="p">}</span>
<span class="c">## second paramter is the R^2 used in estimating your modeled expected returns</span>
<span class="n">port</span><span class="o">.</span><span class="n">update_views</span><span class="p">(</span><span class="n">views</span><span class="p">,</span> <span class="mf">0.1</span><span class="p">)</span>
<span class="c">## Let's use the same parameters</span>
<span class="n">configuration</span> <span class="o">=</span> <span class="p">{</span><span class="s">'constraints'</span><span class="p">:</span> <span class="bp">None</span><span class="p">,</span>
<span class="s">'riskAversion'</span><span class="p">:</span> <span class="bp">None</span><span class="p">,</span>
<span class="s">'targetReturn'</span><span class="p">:</span> <span class="mf">0.1</span><span class="p">}</span>
<span class="c">## you can obtain the BL weights </span>
<span class="n">weights</span> <span class="o">=</span> <span class="n">port</span><span class="o">.</span><span class="n">get_allocations</span><span class="p">(</span><span class="n">configuration</span><span class="o">=</span><span class="n">configuration</span><span class="p">)</span>
<span class="c">## you can reset the portfolio to become a regular Markowitz portfolio</span>
<span class="n">port</span><span class="o">.</span><span class="n">reset</span><span class="p">()</span>
</code></pre></div></div>
<p>Since we mentioned the <script type="math/tex">R^2</script> of the modeled returns, we will introduce some factor modeling tools in a later iteration. For beginners, consider the most widely used factor model, the Capital Asset Pricing Model (CAPM). In general, if you use the CAPM to estimate returns, the <script type="math/tex">R^2</script> is typically below 15%. Hence, the second parameter of the <code class="highlighter-rouge">update_views</code> member function would typically be set below 15% (e.g., we set it to 10% above).</p>
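<p>The package’s exact mapping from a confidence level and a plus/minus band to view uncertainty is not shown here; one plausible sketch (an assumption, not <code class="highlighter-rouge">BlackLittermanPortfolio</code>’s actual code) treats the band as a symmetric normal confidence interval and backs out a standard deviation for each view:</p>

```python
from statistics import NormalDist

def view_stdev(plusminus_pct, confidence_pct):
    """Implied standard deviation of a view stated as
    'within +/- plusminus_pct with confidence_pct% confidence'."""
    # z-score of the two-sided confidence level, e.g. ~1.96 for 95%
    z = NormalDist().inv_cdf(0.5 + confidence_pct / 200.0)
    return (plusminus_pct / 100.0) / z

sigma_market = view_stdev(5, 95)   # 'Market outperforms by 1%, +/-5% at 95%'
```

<p>Under this assumption, the squares of these standard deviations would populate the diagonal of the Black-Litterman view-uncertainty matrix.</p>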
<hr />
<p>I will add more functionality over time. In the next iteration, I will include the following:</p>
<ul>
<li>Hansen-Jagannathan bound test</li>
<li>Factor Models for estimating Stochastic Discount Factors</li>
<li>Style Analysis</li>
</ul>
<hr />
<h1>Research: The Role of Institutional Investors Prior to 1998</h1>
<p><em>johnewzhang, 2018-09-24</em></p>
<p>We study the role of institutional investors who invest in publicly traded US companies, using WRDS data from 1980 to 1997. We find that all institutional investors dislike highly volatile stocks. During this period, there is little evidence that independent institutional investors engaged in active monitoring, and the grey institutions did not intervene with the management of the companies they invested in.</p>
<hr />
<h1 id="introduction">Introduction</h1>
<p>Assets under management by professional investors have been growing rapidly. For example, the hedge fund industry alone managed 3 trillion US dollars by the end of 2017. Ferreira and Matos (2008) examined the role of institutional investors globally in the firms they invested in between 2000 and 2005. They were interested in whether large investors are effective in influencing corporate management and boards towards creating shareholder value. Independent investors such as hedge funds and mutual funds were found to play a more active monitoring role, while grey institutions such as banks and insurance companies, which have business relationships with firms, tend to side with current management. Additionally, they found that all institutions favor larger firms over smaller ones. In this paper, we provide a similar analysis of the role of institutional investors in publicly traded non-financial US companies prior to 1998.</p>
<p>In the first section, we discuss the data underlying our analysis. The next two sections describe the determinants of institutional ownership and the effect of institutional ownership on firm value, performance, and capital expenditure. Lastly, we provide conclusions and suggestions for future investigation.</p>
<h1 id="data-description">Data Description</h1>
<p>Our analysis uses data from CRSP, COMPUSTAT and COMPUSTAT_company. CRSP includes the adjusted stock return (RET), volatility (SIGMA) and institutional ownership information. Due to severe mapping errors, the institutional ownership information is only correct prior to 1998, and this is the primary constraint on the analysis in this paper. There are five types of ownership: banks (IOB), insurance companies (IOIC), mutual funds (IOMF), independent investment advisors (IOIA) and others (including pension funds and university endowment funds; IOO). We define banks, insurance companies and others to be grey institutions (IOGREY), and mutual funds and independent investment advisors to be independent institutions (IOINDEP). Our financial-statement-level data is from COMPUSTAT and COMPUSTAT_company. COMPUSTAT_company provides the industry classifications (DNUM). In our analysis, we remove the financial industry (DNUM between 6000 and 6999) since financial institutions are themselves sophisticated investors and are unlikely to be affected by institutional ownership. Table I summarizes the firm characteristics, institutional ownership and operating performance between 1980 and 1997.</p>
<p align="center">
<img src="/assets/images/role_of_institutional_investor_prior_1998/image1.png" />
</p>
<h1 id="the-determinants-of-institutional-ownership">The Determinants of Institutional Ownership</h1>
<p>We investigate the determinants of institutional ownership following Ferreira and Matos (2008). In their paper, firm size is a strong indicator of ownership preference for all types of investors between 2000 and 2005, whereas prior to 1998 our analysis (Table II) shows that investment opportunities and stock volatility contribute significantly and negatively to the fraction of institutional ownership. In other words, growth companies were unlikely to be included in the portfolios of institutional investors in the 1980s and 1990s. Given that interest rates at the time were much higher than today, bonds may have been regarded as much safer investments than equities, which may explain investors’ cautious view of growth-company stocks. On the other hand, we can infer that institutions invested more in value companies, for which, even with active monitoring, the change in firm value or operating performance may not be significant; this is shown in the next section. Additionally, in the 1980s and 1990s mutual funds were more passive, and hedge funds started booming only after the 1996 SEC rule change governing mutual funds, so much of the active monitoring may have arisen only in the 2000s.</p>
<p align="center">
<img src="/assets/images/role_of_institutional_investor_prior_1998/image2.png" />
</p>
<h1 id="institutional-ownership-and-firm-performance">Institutional Ownership and Firm Performance</h1>
<p>In this section, we discuss the impact of institutional ownership on firm value, operating performance and capital expenditure. For firm value, we adopt Tobin’s Q, as suggested in Ferreira and Matos (2008), in order to compare the pre-1998 US market with their 2000-2005 global results. We estimate regressions of a firm’s Tobin’s Q on variables associated with firm value, such as size, growth opportunities, leverage and cash holdings. Here, we also include analyst coverage as an indicator of how popular a firm is. Our regressions show that grey institutions do not provide active monitoring, while the evidence of active monitoring by independent institutions is weak. This can again be attributed to the smaller size of the independent money management industry at the time: there was not enough variation to demonstrate the significance of active monitoring.</p>
<p>In addition, we include the operating performance and capital expenditure analyses in Table III. In both analyses, we include book-to-market in the regressions since it often relates to performance measures and expenditure structure. Similarly, the return on assets provides similar results that reinforce the finding. Lastly, given the weakness of active monitoring, it is not surprising that the effects on capital expenditure are insignificant.</p>
<p align="center">
<img src="/assets/images/role_of_institutional_investor_prior_1998/image3.png" />
</p>
<h1 id="conclusions">Conclusions</h1>
<p>Our findings suggest that prior to 1998 there is little evidence that independent institutions were effective monitors, while grey institutions did not provide active monitoring, as their ownership has a negative impact on firm value and operating performance. We also document that all institutions share a preference for less volatile firms over our sample period, which may partly explain the insignificance of active monitoring: more stable firms may not need to change their strategies much at the urging of mutual funds or hedge funds.</p>
<p>Lastly, if correct mapping data for institutional ownership were available after 1998, we would expect to see significant active monitoring from independent investors. For example, Brav, Jiang, Partnoy, and Thomas (2008) show some evidence of value creation from hedge fund activism.</p>
<hr />
<h2 id="varialble-definitions">Varialble Definitions</h2>
<p align="center">
<img src="/assets/images/role_of_institutional_investor_prior_1998/image4.png" />
</p>
<h2 id="references">References</h2>
<ul>
<li>Ferreira, M., Matos, P., 2008. The colors of investors’ money: The role of institutional investors around the world. Journal of Financial Economics 88, 499-533.</li>
<li>Brav, A., Jiang, W., Partnoy, F., Thomas, R., 2008. Hedge fund activism, corporate governance, and firm performance. Journal of Finance Vol LXIII, No. 4.</li>
</ul>
<hr />
<h1>Project: Multivariate Time Series Outlier Detection</h1>
<p><em>johnewzhang, 2018-09-24</em></p>
<p>Let <script type="math/tex">x_t =(x_{1t}, \cdots, x_{kt})'</script> be a k-dimensional time series that follows a vector autoregressive moving-average (VARMA) model</p>
<script type="math/tex; mode=display">\Phi(B) x_t = c + \Theta(B)\varepsilon_t,</script>
<p>where</p>
<script type="math/tex; mode=display">\Phi(B) = I - \sum_{i=1}^p \Phi_iB^p, \Theta(B) = I - \sum_{i=1}^p \Theta_iB^p</script>
<p>are <script type="math/tex">k \times k</script> matrix polynomials of finite degrees <script type="math/tex">p</script> and <script type="math/tex">q</script>, <script type="math/tex">B</script> is the backshift operator such that <script type="math/tex">Bx_t = x_{t-1}</script>, <script type="math/tex">c</script> is a <script type="math/tex">k</script>-dimensional constant vector, and <script type="math/tex">\{\varepsilon_t = (\varepsilon_{1t}, \cdots, \varepsilon_{kt})'\}</script> is a sequence of independent and identically distributed Gaussian random vectors with zero mean and positive-definite covariance matrix <script type="math/tex">\Sigma</script>. We assume that <script type="math/tex">\Phi(B)</script> and <script type="math/tex">\Theta(B)</script> are left coprime and that all of the zeros of the determinants <script type="math/tex">\vert\Phi(B)\vert</script> and <script type="math/tex">\vert\Theta(B)\vert</script> are on or outside the unit circle.</p>
<p>Define the autoregressive representation as</p>
<script type="math/tex; mode=display">\Pi(B)x_t = c_0 +\varepsilon_t</script>
<p>where</p>
<script type="math/tex; mode=display">\Pi(B) = I - \sum_{i=1}^{\infty} \Pi_i B^i = \{\Theta(B)\}^{-1}\Phi(B)</script>
<p>Define the moving-average representation as</p>
<script type="math/tex; mode=display">x_t = c^* +\Psi(B) \varepsilon_t</script>
<p>where</p>
<script type="math/tex; mode=display">\Psi(B) = I - \sum_{i=1}^{\infty} \Psi_i B^i = \{\Phi(B)\}^{-1}\Theta(B)</script>
<p>Before diving into the algorithm, we need a few more definitions. Denote the observed time series by <script type="math/tex">y_t =(y_{1t}, \cdots, y_{kt})'</script> and let <script type="math/tex">\omega = (\omega_1, \cdots, \omega_k)'</script> be the size of the initial impact of an outlier on the series <script type="math/tex">x_t</script>. The four types of univariate outliers can be generalized to the multivariate case</p>
<script type="math/tex; mode=display">y_t = x_t + \alpha(B) \omega \xi_t^{(h)},</script>
<p>where</p>
<p>\begin{array}{c|c}
\alpha(B) & \text{Type} \\
\hline \Psi (B) & \text{multivariate innovational outlier} \\
I & \text{multivariate additive outlier} \\
(I - B)^{-1} & \text{multivariate level shift} \\
{D(\delta)}^{-1} & \text{multivariate temporary change} \\
\end{array}</p>
<p>Here, <script type="math/tex">\Psi(B)</script> is the MA representation of the VARMA model. Multiplying the equation above by <script type="math/tex">\Pi(B)</script> and subtracting the constant term from both sides, we have</p>
<script type="math/tex; mode=display">a_t = \varepsilon_t + \Pi(B)\alpha(B)\omega \xi_t^{(h)}</script>
<p>Here let’s write <script type="math/tex">\Pi(B)\alpha(B)</script> as <script type="math/tex">\Pi^*(B)</script>. Therefore, if we suppose <script type="math/tex">\hat{a}_t</script> is the estimated residuals and <script type="math/tex">\hat{\Pi}_i</script> is the estimated coefficients of the autoregressive representation, we have the following</p>
<script type="math/tex; mode=display">\hat{a}_t =\left(I - \sum_{i=1}^{\infty} \hat{\Pi^*}_i B^i\right)\xi_t^{(h)}\omega + \varepsilon_t = \left(\xi_t^{(h)} - \sum_{i=1}^{\infty} \hat{\Pi^*}_i \xi_{t-i}^{(h)}\right)\omega + \varepsilon_t</script>
<p>where <script type="math/tex">\varepsilon_t \sim N(0, \Sigma)</script>, and the estimator of <script type="math/tex">\omega</script> is</p>
<script type="math/tex; mode=display">\hat{\omega}_{i, h} = -\left(\sum_{i=0}^{n - h} \hat{\Pi^*}'_i \Sigma^{-1}\hat{\Pi^*}\right)^{-1}\sum_{i=0}^{n - h} \hat{\Pi^*}'_i \Sigma^{-1}\hat{a}_{h+i}</script>
<p>where <script type="math/tex">\Pi_0 = -I</script> and <script type="math/tex">i = I , A, L, T</script>.</p>
<p>The coefficients of <script type="math/tex">\Pi^*(B)</script> can be obtained by expanding <script type="math/tex">\Pi(B)\alpha(B)</script>. Since the additive outlier (<script type="math/tex">\delta = 0</script>) and the level shift (<script type="math/tex">\delta = 1</script>) are just special cases of the temporary change, let us consider only the temporary change for the moment. That is,</p>
<script type="math/tex; mode=display">I - \sum_{i=1}^{\infty} \Pi^*_i B^i = \left(I - \sum_{i=1}^{\infty} \Pi_i B^i)(I - \delta B\right)^{-1}</script>
<p>Hence, the above reduces to a polynomial division. In general, suppose we have three polynomials</p>
<p><script type="math/tex">\Pi(B) = I - \sum_{i=1}^{\infty} \Pi_i B^i,</script> <script type="math/tex">\Phi(B) = I - \sum_{i=1}^{p} \Phi_i B^i</script> and <script type="math/tex">\Theta(B)= I - \sum_{i=1}^{q} \Theta_i B^i</script></p>
<p>where</p>
<script type="math/tex; mode=display">\Pi(B) = \Theta(B)^{-1}\Phi(B)</script>
<p>Then</p>
<script type="math/tex; mode=display">\Pi_i = \Phi_i + \sum_{j = 1}^{i} \Theta_j\Pi_{i-j}</script>
<p>where <script type="math/tex">\Pi_{i - j} = -I</script> when <script type="math/tex">j = i</script> (i.e., <script type="math/tex">\Pi_0 = -I</script>), <script type="math/tex">\Phi_i = 0</script> if <script type="math/tex">i > p</script>, and <script type="math/tex">\Theta_j = 0</script> if <script type="math/tex">j > q</script>. Therefore, using the above recursive relation, we can obtain the polynomial representation for the temporary change.</p>
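<p>The recursion is easy to implement. Below is a small numpy sketch (hypothetical function names) that follows the document’s convention <script type="math/tex">\Pi_0 = -I</script>, together with the temporary-change division <script type="math/tex">\Pi^*_i = \Pi_i + \delta \Pi^*_{i-1}</script>, which follows from multiplying both sides of the division above by <script type="math/tex">(I - \delta B)</script> and matching coefficients:</p>

```python
import numpy as np

def pi_weights(phi, theta, m):
    """Pi_1..Pi_m via Pi_i = Phi_i + sum_{j=1}^{i} Theta_j Pi_{i-j}, Pi_0 = -I.

    phi, theta: lists of k x k coefficient matrices (Phi_1..Phi_p, Theta_1..Theta_q).
    Returns the list [Pi_0, Pi_1, ..., Pi_m].
    """
    k = phi[0].shape[0] if phi else theta[0].shape[0]
    pi = [-np.eye(k)]                       # Pi_0 = -I
    for i in range(1, m + 1):
        acc = phi[i - 1].copy() if i <= len(phi) else np.zeros((k, k))
        for j in range(1, i + 1):
            if j <= len(theta):             # Theta_j = 0 for j > q
                acc += theta[j - 1] @ pi[i - j]
        pi.append(acc)
    return pi

def tc_pi_star(pi, delta):
    """Pi*_i for a temporary change: Pi*_i = Pi_i + delta * Pi*_{i-1}, Pi*_0 = -I."""
    star = [pi[0]]
    for i in range(1, len(pi)):
        star.append(pi[i] + delta * star[i - 1])
    return star

# scalar ARMA(1,1) sanity check: pi_1 = phi - theta, pi_2 = theta * pi_1
pi = pi_weights([np.array([[0.5]])], [np.array([[0.3]])], 3)
```

<p>For a scalar ARMA(1,1) with <script type="math/tex">\phi = 0.5</script>, <script type="math/tex">\theta = 0.3</script>, the recursion reproduces the known weights <script type="math/tex">\pi_1 = \phi - \theta</script> and <script type="math/tex">\pi_2 = \theta\pi_1</script>.</p>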
<p>In addition, the covariance matrix of the estimator is <script type="math/tex">\Sigma_{i, h} = \left(\sum_{i=0}^{n-h} \hat{\Pi^*}'_i\Sigma^{-1}\hat{\Pi^*}_i\right)^{-1}</script>. Lastly, to remove the effect of a particular outlier, we apply the following adjustment</p>
<script type="math/tex; mode=display">x_t = y_t - \alpha(B) \omega \xi_t^{(h)}</script>
<p>where <script type="math/tex">x_t</script> is the adjusted series and <script type="math/tex">y_t</script> is the original series.</p>
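<p>Putting the estimator, its covariance, and the joint test statistic together, a numpy sketch (hypothetical helper name; 0-based time index <script type="math/tex">h</script>) might read:</p>

```python
import numpy as np

def outlier_stat(resid, pi_star, sigma, h):
    """GLS estimate of omega at candidate index h, its covariance, and J.

    resid: (n, k) array of residuals a_t; pi_star: list [Pi*_0 = -I, Pi*_1, ...]
    (weights beyond the list are treated as zero); sigma: (k, k) residual covariance.
    """
    n, k = resid.shape
    sig_inv = np.linalg.inv(sigma)
    A = np.zeros((k, k))            # sum_i Pi*_i' Sigma^-1 Pi*_i
    b = np.zeros(k)                 # sum_i Pi*_i' Sigma^-1 a_{h+i}
    for i in range(n - h):
        P = pi_star[i] if i < len(pi_star) else np.zeros((k, k))
        A += P.T @ sig_inv @ P
        b += P.T @ sig_inv @ resid[h + i]
    cov_omega = np.linalg.inv(A)    # covariance of omega-hat
    omega = -cov_omega @ b          # GLS estimate of the outlier size
    J = omega @ A @ omega           # omega' cov^-1 omega, chi-square with k dof
    return omega, cov_omega, J

# sanity check: for white noise (Pi*_0 = -I only, additive outlier),
# the estimate at h collapses to the residual itself
rng = np.random.default_rng(0)
a = rng.normal(size=(50, 2))
omega, cov_w, J = outlier_stat(a, [-np.eye(2)], np.eye(2), h=10)
```

<p>The white-noise check makes the formula intuitive: with no serial dependence, the best estimate of an additive outlier at <script type="math/tex">h</script> is simply the residual observed there.</p>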
<h2 id="algorithm">Algorithm</h2>
<p>Suppose the observed time series initially has no identified outliers and a <script type="math/tex">VARMA(p, q)</script> model has been specified.</p>
<p><strong>Step 1</strong>: Use Maximum-Likelihood estimation to estimate <script type="math/tex">\Phi, \Theta</script> and <script type="math/tex">\Sigma</script>.</p>
<p><strong>Step 2</strong>: Use the resulting estimates to calculate the estimated residuals, <script type="math/tex">\hat{a}_t</script>, and the estimated coefficients of the AR representation, <script type="math/tex">\hat{\Pi}_i</script>.</p>
<p><strong>Step 3</strong>: Obtain <script type="math/tex">\omega_{i, h}</script> for all <script type="math/tex">h</script>, where the subscript <script type="math/tex">i</script> indicates the type of outlier (<script type="math/tex">i = I, A, L, T</script>), together with the covariance estimator.</p>
<p><strong>Step 4</strong>: To test the significance of a multivariate outlier at time index h, we consider the null hypothesis <script type="math/tex">H_0:\omega= 0</script> versus the alternative hypothesis <script type="math/tex">H_a: \omega\neq 0</script>. Two test statistics are used.</p>
<ul>
<li><script type="math/tex">J_{i,h} = \hat{\omega}_{i, h}'\Sigma_{i, h}^{-1}\hat{\omega}_{i, h}</script> where <script type="math/tex">i = I, A, L \text{ or } T</script>. (chi-square random variable with <script type="math/tex">k</script> degrees of freedom)</li>
<li>The maximum <script type="math/tex">z</script>-statistics</li>
</ul>
<script type="math/tex; mode=display">C_{i, h} =\frac{\max_{1 \leq j \leq k} \vert\hat{\omega}_{j, i, h}\vert}{\sqrt{\sigma_{j, i, h}}}</script>
<p>where <script type="math/tex">\hat{\omega}_{j, i, h}</script> and <script type="math/tex">\sigma_{j, i, h}</script> are the <script type="math/tex">j</script>th element of <script type="math/tex">\hat{\omega}_{i, h}</script> and the <script type="math/tex">(j, j)</script>th element of <script type="math/tex">\Sigma_{i, h}</script>, respectively.</p>
<p>We define the overall test statistics as</p>
<script type="math/tex; mode=display">J_{\max}(i, h_i) = \max_h J_{i, h}, C_{\max}(i, h_{i}^*) = \max_h C_{i, h}, (i = I, A, L, T)</script>
<p>where <script type="math/tex">h_i</script> denotes the time index when the maximum of the test statistics <script type="math/tex">J_{i, h}</script> occurs and <script type="math/tex">h_{i}^*</script> denotes the time index when the maximum of <script type="math/tex">C_{i, h}</script> occurs.</p>
<p>In case of multiple significant joint test statistics, we identify the outlier type from the test with the smallest empirical p-value. If none of the <script type="math/tex">J_{\max}</script> statistics is significant, the <script type="math/tex">C_{\max}</script> statistics are tested; if those are also insignificant, the procedure ends.</p>
<p><strong>Step 5</strong>: Once an outlier is detected, we adjust the original time series by subtracting <script type="math/tex">\alpha(B)\omega \xi_t^{(h)}</script>. The process then returns to Step 1.</p>
<hr />
<h1 id="code-walk-through">Code Walk Through</h1>
<p>One can download the “outlier_detection.R” code <a href="https://github.com/WizardKingZ/time_series_outlier_detection">here</a>. Before using this program, please make sure the MTS library is also installed. We will use the gas furnace data as an example. First, let’s load the data</p>
<div class="language-R highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">rate</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">gas.furnace</span><span class="o">$</span><span class="n">InputGasRate</span><span class="w">
</span><span class="n">co2</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">gas.furnace</span><span class="o">$</span><span class="n">CO2</span><span class="w">
</span></code></pre></div></div>
<p>Then we will use the following block to detect the outliers iteratively</p>
<div class="language-R highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">n</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">length</span><span class="p">(</span><span class="n">co2</span><span class="p">)</span><span class="w">
</span><span class="c1">## delta value for temporary change is normally set to be 0.7</span><span class="w">
</span><span class="n">delta</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">0.7</span><span class="w">
</span><span class="n">xt</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">gas.furnace</span><span class="w">
</span><span class="n">res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">()</span><span class="w">
</span><span class="n">it</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="w">
</span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="kc">TRUE</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">out</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">outlierDetect</span><span class="p">(</span><span class="n">xt</span><span class="p">,</span><span class="w"> </span><span class="n">p</span><span class="p">,</span><span class="w"> </span><span class="n">q</span><span class="p">,</span><span class="w"> </span><span class="n">k</span><span class="p">,</span><span class="w"> </span><span class="n">n</span><span class="p">,</span><span class="w"> </span><span class="n">delta</span><span class="p">,</span><span class="w"> </span><span class="n">critical.j</span><span class="p">,</span><span class="w"> </span><span class="n">critical.c</span><span class="p">)</span><span class="w">
</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">out</span><span class="o">$</span><span class="n">Outlier</span><span class="p">[</span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="k">break</span><span class="p">;</span><span class="w">
</span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">l</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">()</span><span class="w">
</span><span class="n">l</span><span class="o">$</span><span class="n">Outlier</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">out</span><span class="o">$</span><span class="n">Outlier</span><span class="w">
</span><span class="n">l</span><span class="o">$</span><span class="n">Jmax</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">out</span><span class="o">$</span><span class="n">Jmax</span><span class="w">
</span><span class="n">l</span><span class="o">$</span><span class="n">Cmax</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">out</span><span class="o">$</span><span class="n">Cmax</span><span class="w">
</span><span class="n">xt</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">out</span><span class="o">$</span><span class="n">xt</span><span class="w">
</span><span class="n">print</span><span class="p">(</span><span class="n">l</span><span class="p">)</span><span class="w">
</span><span class="n">res</span><span class="p">[[</span><span class="n">it</span><span class="p">]]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">l</span><span class="w">
</span><span class="n">it</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">it</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="m">1</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="c1">##</span><span class="w">
</span><span class="c1">## report returns a summary of outlier and its statistics for each individual iteration</span><span class="w">
</span><span class="c1">## each iteration takes two rows</span><span class="w">
</span><span class="c1">## first row contains Jmax, Cmax value for each type of outlier and the position of the identified outlier</span><span class="w">
</span><span class="c1">## first 4 columns are for Jmax from type 1 to 4 and column 5 to 8 are for Cmax from type 1 to 4</span><span class="w">
</span><span class="c1">## the 9th column is the type of the identified outlier and the 10th is the time index</span><span class="w">
</span><span class="c1">## the second row contains time indexes for each Jmax and Cmax</span><span class="w">
</span><span class="c1">##</span><span class="w">
</span><span class="n">report</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">matrix</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="nf">length</span><span class="p">(</span><span class="n">res</span><span class="p">)</span><span class="o">*</span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="m">10</span><span class="p">)</span><span class="w">
</span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="nf">length</span><span class="p">(</span><span class="n">res</span><span class="p">))</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">item</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">res</span><span class="p">[[</span><span class="n">i</span><span class="p">]]</span><span class="w">
</span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="m">2</span><span class="o">*</span><span class="n">i</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="m">1</span><span class="w">
</span><span class="n">report</span><span class="p">[</span><span class="n">index</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="m">4</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">item</span><span class="o">$</span><span class="n">Jmax</span><span class="p">[,</span><span class="w"> </span><span class="m">1</span><span class="p">]</span><span class="w">
</span><span class="n">report</span><span class="p">[</span><span class="n">index</span><span class="m">+1</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="m">4</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">item</span><span class="o">$</span><span class="n">Jmax</span><span class="p">[,</span><span class="w"> </span><span class="m">2</span><span class="p">]</span><span class="w">
</span><span class="n">report</span><span class="p">[</span><span class="n">index</span><span class="p">,</span><span class="w"> </span><span class="m">5</span><span class="o">:</span><span class="m">8</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">item</span><span class="o">$</span><span class="n">Cmax</span><span class="p">[,</span><span class="w"> </span><span class="m">1</span><span class="p">]</span><span class="w">
</span><span class="n">report</span><span class="p">[</span><span class="n">index</span><span class="m">+1</span><span class="p">,</span><span class="w"> </span><span class="m">5</span><span class="o">:</span><span class="m">8</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">item</span><span class="o">$</span><span class="n">Cmax</span><span class="p">[,</span><span class="w"> </span><span class="m">2</span><span class="p">]</span><span class="w">
</span><span class="n">report</span><span class="p">[</span><span class="n">index</span><span class="p">,</span><span class="w"> </span><span class="m">9</span><span class="o">:</span><span class="m">10</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">item</span><span class="o">$</span><span class="n">Outlier</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
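<p>The detect-adjust-repeat control flow above can also be sketched outside R. The following Python sketch uses a toy z-score detector standing in for <code>outlierDetect</code> (which, following Tsay, Pena and Pankratz (2000), actually classifies four outlier types from ARMA residuals); it illustrates only the loop structure, not the real test statistics:</p>

```python
import numpy as np

def detect_outlier(xt, critical=4.0):
    """Toy stand-in for outlierDetect: flag the observation with the
    largest absolute z-score if it exceeds the critical value."""
    z = (xt - xt.mean()) / xt.std()
    idx = int(np.argmax(np.abs(z)))
    if abs(z[idx]) < critical:
        return None                      # nothing left to flag
    return idx, xt[idx] - xt.mean()      # position and estimated effect

def iterative_detection(xt, critical=4.0, max_iter=20):
    """Detect-adjust-repeat, as in the R loop above: remove each
    identified outlier's effect, then search the adjusted series again."""
    xt = np.asarray(xt, dtype=float).copy()
    found = []
    for _ in range(max_iter):
        hit = detect_outlier(xt, critical)
        if hit is None:
            break
        idx, effect = hit
        xt[idx] -= effect                # additive-outlier style adjustment
        found.append(idx)
    return found, xt

rng = np.random.default_rng(0)
series = rng.normal(size=300)
series[43] += 8.0                        # plant an additive outlier at t = 43
hits, adjusted = iterative_detection(series)
```

<p>The real procedure replaces the z-score test with the <code>Jmax</code>/<code>Cmax</code> statistics and an outlier-type-specific adjustment, but the loop shape is the same.</p>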
<p>Finally, you can obtain the result as follows:</p>
<p>\begin{array}{c|c}
\text{Time Index} & \text{Type} \\
\hline 43 & \text{MTC} \\
55 & \text{MTC} \\
265 & \text{MIO} \\
113 & \text{MTC} \\
199 & \text{MTC} \\
235 & \text{MLS} \\
91 & \text{MTC} \\
262 & \text{MIO} \\
288 & \text{MLS} \\
287 & \text{MLS} \\
82 & \text{MLS} \\
\end{array}</p>
<hr />
<h2 id="references">References</h2>
<ul>
<li>Tsay, R., Pena, D., Pankratz, A. E., 2000. Outliers in multivariate time series. Biometrika 87, 789-804.</li>
</ul>
<hr />
<h1>Notes: University of Waterloo Lecture Notes and Professional Exam Notes (2018-09-22)</h1>
<h1 id="lecture-notes">Lecture Notes</h1>
<h2 id="algebra">Algebra</h2>
<ul>
<li><a href="/assets/pdfs/Algebra Notes/Algebra Notes .pdf">Chapter 1: The Introduction to Abstract Algebra</a></li>
<li><a href="/assets/pdfs/Algebra Notes/Algebra Notes 2.pdf">Chapter 2: Modular Arithmetic</a></li>
<li><a href="/assets/pdfs/Algebra Notes/Algebra 3.pdf">Chapter 3: Cryptography</a></li>
<li><a href="/assets/pdfs/Algebra Notes/Algebra Note 4.pdf">Chapter 4: Multiplicative Functions</a></li>
<li><a href="/assets/pdfs/Algebra Notes/Algebra Notes 5.pdf">Chapter 5: Polynomials</a></li>
<li><a href="/assets/pdfs/Algebra Notes/Algebra Notes 6.pdf">Chapter 6: Modular Arithmetic for Polynomial</a></li>
<li><a href="/assets/pdfs/Algebra Notes/Complex Number .pdf">Chapter 7: Complex Number</a></li>
</ul>
<h2 id="linear-algebra">Linear Algebra</h2>
<ul>
<li><a href="/assets/pdfs/MATH146.pdf">Linear Algebra (Credit to Eric Langlois)</a></li>
<li><a href="/assets/pdfs/MATH245.pdf">Abstract Linear Algebra (Credit to Eric Langlois)</a></li>
</ul>
<h2 id="mathematical-analysis">Mathematical Analysis</h2>
<ul>
<li><a href="/assets/pdfs/MATH147.pdf">Calculus 1 (Credit to Eric Langlois)</a></li>
<li><a href="/assets/pdfs/MATH148.pdf">Calculus 2 (Credit to Eric Langlois)</a></li>
<li><a href="/assets/pdfs/MATH247.pdf">Introduction to Real Analysis (Credit to Eric Langlois)</a></li>
<li><a href="/assets/pdfs/PMATH_351_note.pdf">Real Analysis</a></li>
<li><a href="/assets/pdfs/PMATH 450.pdf">Lebesgue Integration and Fourier Analysis (Credit to ML Baker)</a></li>
<li><a href="/assets/pdfs/AMATH_350_notes.pdf">Differential Equations</a></li>
</ul>
<h2 id="combinatorics-and-optimization">Combinatorics and Optimization</h2>
<ul>
<li><a href="/assets/pdfs/MATH249.pdf">Introduction to Combinatorics (Credit to Eric Langlois)</a></li>
<li><a href="/assets/pdfs/CO_255_notes.pdf">Introduction to Optimization</a></li>
</ul>
<h2 id="computing">Computing</h2>
<ul>
<li><a href="/assets/pdfs/CS 241 Note.pdf">Compilers</a></li>
<li><a href="/assets/pdfs/AMATH242_notes.pdf">Numerical Computation</a></li>
</ul>
<h2 id="statistics">Statistics</h2>
<ul>
<li><a href="/assets/pdfs/STAT_330_notes.pdf">Mathematical Statistics</a></li>
<li><a href="/assets/pdfs/STAT_331_notes.pdf">Applied Linear Models</a></li>
<li><a href="/assets/pdfs/STAT_333_notes.pdf">Applied Probabilities</a></li>
<li><a href="/assets/pdfs/STAT_443_notes.pdf">Forecasting</a></li>
</ul>
<h2 id="mathematical-finance">Mathematical Finance</h2>
<ul>
<li><a href="/assets/pdfs/ACTSC372_notes.pdf">Finance Theory</a></li>
<li><a href="/assets/pdfs/ACTSC_446_notes.pdf">Option Pricing</a></li>
<li><a href="/assets/pdfs/ACTSC_331_notes.pdf">Life Contingency</a></li>
</ul>
<hr />
<h1 id="actuarial-exam-notes">Actuarial Exam Notes</h1>
<ul>
<li><a href="/assets/pdfs/Study_notes_for_MFE.pdf">Exam MFE</a></li>
<li><a href="/assets/pdfs/Study_notes_for_C.pdf">Exam C</a></li>
<li><a href="/assets/pdfs/exam_s_study_note.pdf">Exam S: Statistical Models</a></li>
</ul>
<hr />
<h1>Notes: Columbia Finance and Economics PhD Course Lecture Notes (2018-09-22)</h1>
<h1 id="lecture-notes">Lecture Notes</h1>
<h2 id="finance-and-economics">Finance and Economics</h2>
<ul>
<li><a href="/assets/pdfs/Finance Theory.pdf">Finance Theory</a></li>
<li><a href="/assets/pdfs/Microeconomics Theory.pdf">Microeconomics Theory</a></li>
<li><a href="/assets/pdfs/Empirical Asset Pricing.pdf">Empirical Asset Pricing</a></li>
<li><a href="/assets/pdfs/Market Microstructure.pdf">Market Microstructure</a></li>
</ul>
<h2 id="econometrics">Econometrics</h2>
<ul>
<li><a href="/assets/pdfs/Introduction to Financial Econometrics.pdf">Introduction to Econometrics</a></li>
<li><a href="/assets/pdfs/Finanical Econometrics-Time Series.pdf">Time Series</a></li>
</ul>
<hr />
<h1>Project: Variational Autoencoder (2018-09-22)</h1>
<p>As images can be viewed as realizations of a latent variable model, we implement a variational autoencoder that uses neural networks as the variational family to approximate the Bayesian posterior. Unlike other parametric families, neural networks can approximate arbitrary distributions reasonably well. In this project, we also examine how effective such encoders are on the SVHN dataset. By comparing different architectures, we hope to understand how the dimension of the latent space affects the learned representation, and to visualize the learned manifold for low-dimensional latent representations. Lastly, we compare several variational autoencoder variants.</p>
<hr />
<h1 id="introduction">Introduction</h1>
<p>In this article, we discuss the methodology used to build the variational autoencoder of Kingma and Welling (2014) and explore how the model performs on the MNIST and SVHN datasets. It is worthwhile to first summarize how the proposed methodology works and the intuition behind it. Before discussing the methodology, let’s define some basic notation used in the rest of this document.</p>
<ul>
<li><script type="math/tex">X</script> is the dataset we are interested in. Since we are working mostly on image dataset, we will call <script type="math/tex">X</script> the image data.</li>
<li><script type="math/tex">z</script> is the latent state variable.</li>
<li><script type="math/tex">p_{\phi}(z\vert X)</script> is the target distribution of the latent state space.</li>
<li><script type="math/tex">q_{\theta}(z\vert X)</script> is the variational family for the latent state space.</li>
</ul>
<p>The proposed variational autoencoder is constructed on the premise that the image data is generated by some hidden features (i.e. the latent state variables). In addition, we believe the latent features for a specific set of images (e.g. a set of dog pictures) are sampled from a prior distribution of <script type="math/tex">z</script>. The following figure shows the idea.</p>
<p align="center">
<img src="/assets/images/vae/image1.png" />
</p>
<p>We are interested in modeling the target distribution of the latent state space given the data <script type="math/tex">X</script>, <script type="math/tex">p_{\phi}(z\vert X)</script>. However, this distribution is typically not tractable, so we use the variational inference proposed in the paper. In particular, we perform inference with the Kullback-Leibler (KL) divergence. In the rest of the report, we discuss the estimation methodology, present results on the MNIST and SVHN datasets using different architectures, and point to some potential improvements to the autoencoder.</p>
<h1 id="methodology">Methodology</h1>
<p>As discussed in the introduction, we need to do variational inference on the target distribution <script type="math/tex">p_{\phi}(z\vert X)</script>. In other words, we need to pick <script type="math/tex">q_{\theta}(z \vert X)</script> from a variational family that minimizes the KL divergence,</p>
<script type="math/tex; mode=display">\min D_{KL}(q_{\theta}(z\vert X)\|p_{\phi}(z\vert X))</script>
<p>Based on Doersch (2016), the KL divergence satisfies the following identity:</p>
<script type="math/tex; mode=display">\log p(X) - D_{KL}(q_{\theta}(z\vert X)\|p_{\phi}(z\vert X)) = E_q[\log p_{\phi}(X\vert z)] - D_{KL}(q_{\theta}(z\vert X)\|p(z))</script>
<p>This is the core of the variational autoencoder used in the report. Since <script type="math/tex">D_{KL}(q_{\theta}(z \vert X)\|p_{\phi}(z\vert X))</script> is non-negative, the right-hand side is a lower bound on the log-likelihood of the marginal image distribution. Let’s denote it as</p>
<script type="math/tex; mode=display">\mathcal{L}(X, \phi, \theta) = E_q[\log p_{\phi}(X \vert z)] - D_{KL}(q_{\theta}(z\vert X)\|p(z))</script>
<p>Intuitively, we can maximize the log-likelihood by maximizing <script type="math/tex">\mathcal{L}(X, \phi, \theta)</script>, which consists of two parts: the expected reconstruction log-likelihood and the KL divergence of the approximate posterior from the prior.</p>
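<p>For concreteness, both terms can be written in closed form for a Bernoulli decoder with a diagonal-Gaussian encoder. The helper below is a minimal numpy sketch (the function name and <code>log_var</code> parameterization are mine, not from the paper):</p>

```python
import numpy as np

def elbo(x, x_hat, mu, log_var):
    """Lower bound L(X, phi, theta) for one batch: Bernoulli reconstruction
    log-likelihood minus the closed-form KL from N(mu, diag(exp(log_var)))
    to the standard normal prior N(0, I)."""
    eps = 1e-7                                     # numerical safety for log
    recon = (x * np.log(x_hat + eps)
             + (1 - x) * np.log(1 - x_hat + eps)).sum(axis=1)
    kl = 0.5 * (np.exp(log_var) + mu ** 2 - 1 - log_var).sum(axis=1)
    return (recon - kl).mean()

# A near-perfect reconstruction with a posterior equal to the prior,
# so the KL term is exactly zero:
x = np.array([[1.0, 0.0]])
value = elbo(x, np.array([[0.9, 0.1]]), np.zeros((1, 2)), np.zeros((1, 2)))
```

<p>During training one maximizes this quantity (or minimizes its negative) over the encoder and decoder parameters jointly.</p>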
<p>One can use the Monte Carlo EM method to estimate <script type="math/tex">\theta</script> and <script type="math/tex">\phi</script>, but it can be slow. Here, we use stochastic gradient descent via backpropagation in a neural network. To do this, we need a reparameterization trick to provide inputs <script type="math/tex">z</script> for <script type="math/tex">p_{\phi}(X\vert z)</script>. Now consider two networks: the encoding network (encoder) and the reconstruction network (decoder). The encoder takes a set of images and outputs their hidden features; the decoder consumes hidden features sampled from the prior distribution of the latent state variables and generates the parameters of the distribution <script type="math/tex">p_{\phi}(X\vert z)</script>. Kingma and Welling (2014) show that the sampled input to the decoder, <script type="math/tex">\mathbf{z}</script>, can be expressed as a deterministic variable <script type="math/tex">g_{\theta}(x, \varepsilon)</script>, where <script type="math/tex">\varepsilon</script> is an auxiliary noise variable. For example, in the univariate Gaussian case, <script type="math/tex">\mathbf{z} = \mu + \sigma \varepsilon</script>, where <script type="math/tex">\varepsilon \sim N(0, 1)</script>.</p>
<h2 id="gaussian-encoder">Gaussian Encoder</h2>
<p>Typically, a Gaussian family is used as the variational family for the encoder distribution. In addition, we can assume the prior distribution of the latent state is also Gaussian. Doersch (2016) derives the KL divergence between two Gaussians as follows:</p>
<script type="math/tex; mode=display">\frac{1}{2}\left(tr\left(\Sigma^{-1}\Sigma(X, \theta)\right)+(\mu - \mu(X, \theta))'\Sigma^{-1}(\mu - \mu(X, \theta)) - k + \log\left(\frac{\vert\Sigma\vert}{\vert\Sigma(X, \theta)\vert}\right)\right)</script>
<p>where <script type="math/tex">N(\mu(X, \theta), \Sigma(X, \theta))</script> is the approximate posterior, <script type="math/tex">N(\mu, \Sigma)</script> is the prior, and <script type="math/tex">k</script> is the number of latent features (the dimension of <script type="math/tex">z</script>).</p>
<p>If we assume the prior distribution of the latent state model is a standard multivariate Gaussian, then the KL divergence is</p>
<script type="math/tex; mode=display">\frac{1}{2}\left(tr(\Sigma(X, \theta)) + \mu(X, \theta)'\mu(X, \theta) - k - \log\vert\Sigma(X, \theta)\vert\right)</script>
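<p>As a quick sanity check (not part of the original paper), the closed form above can be compared against a Monte Carlo estimate of the KL divergence for a diagonal covariance:</p>

```python
import numpy as np

# Encoder outputs for a single example (illustrative values):
mu = np.array([0.5, -1.0, 0.2])          # mean mu(X, theta)
var = np.array([0.8, 1.5, 0.4])          # diagonal of Sigma(X, theta)
k = len(mu)

# Closed-form KL( N(mu, diag(var)) || N(0, I) ):
closed_form = 0.5 * (var.sum() + mu @ mu - k - np.log(var).sum())

# Monte Carlo estimate: E_q[ log q(z) - log p(z) ]
rng = np.random.default_rng(0)
z = mu + np.sqrt(var) * rng.normal(size=(200_000, k))
log_q = -0.5 * (((z - mu) ** 2 / var).sum(axis=1)
                + np.log(var).sum() + k * np.log(2 * np.pi))
log_p = -0.5 * ((z ** 2).sum(axis=1) + k * np.log(2 * np.pi))
mc_estimate = (log_q - log_p).mean()
```

<p>The two numbers agree to Monte Carlo error, which is exactly why the KL term can be computed analytically inside the training loss instead of being sampled.</p>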
<p>For the reparameterization part, we can sample from <script type="math/tex">N(\mu(X, \theta), \Sigma(X, \theta))</script> by first sampling <script type="math/tex">\varepsilon \sim N(0, I)</script> and then computing <script type="math/tex">\mathbf{z} = \mu(X, \theta) + \Sigma(X, \theta)^{1/2}\varepsilon</script>. In particular, since the prior distribution of <script type="math/tex">z</script> is a standard spherical Gaussian, the approximate posterior is taken to have a diagonal covariance matrix. Hence the computation simplifies to an element-wise product,</p>
<script type="math/tex; mode=display">\mathbf{z} = \mu(X, \theta) + \operatorname{diag}(\Sigma(X, \theta))^{1/2}\odot\varepsilon</script>
<p>Lastly, the distribution of the decoder can be chosen as either Bernoulli or Gaussian. For the Gaussian decoder, we use a diagonal covariance matrix.</p>
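<p>A framework-agnostic sketch of the reparameterized sampling step (the <code>log_var</code> parameterization is a common convention, not mandated by the paper):</p>

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """z = mu + sigma * eps with eps ~ N(0, I): all the randomness lives
    in eps, so gradients can flow through mu and log_var during training."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

rng = np.random.default_rng(1)
mu = np.array([2.0, -1.0])
sigma = np.array([0.5, 2.0])
z = reparameterize(np.tile(mu, (100_000, 1)),
                   np.tile(np.log(sigma ** 2), (100_000, 1)), rng)
# Empirically, z.mean(axis=0) approaches mu and z.std(axis=0) approaches sigma.
```

<p>In an actual implementation, <code>mu</code> and <code>log_var</code> are the encoder outputs and the same expression is written with the framework’s tensors so that backpropagation reaches the encoder weights.</p>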
<h2 id="empirical-analysis">Empirical Analysis</h2>
<p>In this section, we discuss the details of our experiments on the MNIST and SVHN datasets. As the variational autoencoder generally behaves well on the MNIST dataset, we investigate it first; we then discuss the effectiveness of the variational autoencoder on the SVHN dataset.</p>
<h3 id="mnist-data">MNIST Data</h3>
<p>In the first experiment on the MNIST dataset, we used a simple fully connected neural network with a Gaussian encoder and a Bernoulli decoder for various dimensionalities of the latent space. In Figure 1, we show an average result from the generative decoder based on the sample data shown in 1a. As the dimension of the latent space increases, we see a significant improvement in the decoder; we will observe similar behavior in the later SVHN model. In the 2-dimensional latent space, it appears only 0, 1, 3, 8 and 9 are recognized. In 1c, 6 and 7 are also recognized. In 1d and 1e, all digits can be represented by the decoder.</p>
<p align="center">
<img src="/assets/images/vae/image2.png" />
</p>
<p>In Figure 2, 2a shows the latent space learned by our autoencoder. It is consistent with what we see in the 2D case in Figure 1b. The MNIST images are only <script type="math/tex">28\times 28</script>, so a low-dimensional latent space can represent them reasonably well, and a Bernoulli decoder makes sense here because the images are black and white. When we move from MNIST to SVHN, the input size increases to <script type="math/tex">32 \times 32\times 3</script> (RGB images). Given the added complexity of the images, we may not get as good a representation in low-dimensional settings. In the SVHN data section, we discuss the architectures we have looked at and how they perform.</p>
<p align="center">
<img src="/assets/images/vae/image3.png" />
</p>
<h3 id="svhn-data">SVHN Data</h3>
<p>It is natural to start from the model used in the MNIST case and then progress towards more complicated architectures. Hence, we move from fully connected networks to a convolutional network.</p>
<p align="center">
<img src="/assets/images/vae/image4.png" />
</p>
<h4 id="experiments-with-bernoulli-decoder-and-fully-connected-neural-network">Experiments with Bernoulli Decoder and Fully Connected Neural Network</h4>
<p>Although the SVHN images are RGB, the pixel values are stored as numbers between 0 and 1, and the matplotlib library conveniently renders them into colored images. Therefore, we can still model each pixel value as a Bernoulli probability, interpreted as relative color intensity. Since the input size of the SVHN data is much larger than that of MNIST, we add additional layers to the encoder and decoder in an effort to gain capacity. First, consider the two-dimensional manifold learned from the SVHN images (Figure 2b). The autoencoder seems able to pick out the digit 8, but the generated images are otherwise homogeneous apart from their shading. In short, the encoder captures shading and the digit 8, which is a decent representation.</p>
<p>Next, we increase the dimension of the latent space to 5, 20, 50 and 100 (the same settings are used in the rest of the SVHN discussion). In Figure 3, 3b appears to confirm that the patterns we saw in the low-dimensional case are 8s. As the dimension rises, more colors and digits are captured, but the samples are still far from the real data. In particular, the color green has not been picked up by the learner. Perhaps we should try something different.</p>
<p align="center">
<img src="/assets/images/vae/image5.png" />
</p>
<p align="center">
<img src="/assets/images/vae/image6.png" />
</p>
<p>Convolutional neural networks are good at image processing, and computer vision projects tend to use them for image classification, so they are worth trying here. In all our examples below, we use only one convolutional layer on the encoder side and none on the decoder. In general, the decoder can mirror the encoder, so many implementations use deconvolutional layers to output the generated images. In addition, training convolutional neural networks is typically slow without a GPU and more computing power. For simplicity, we use one convolutional layer with 16 feature maps to process the images.</p>
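<p>The post does not state the kernel size of this layer; assuming a 3×3 kernel with stride 1 and no padding, the size of the convolutional output fed to the rest of the encoder can be computed as:</p>

```python
def conv_output_shape(h, w, kernel, stride=1, padding=0):
    """Spatial size after one convolution: floor((n + 2p - k) / s) + 1."""
    size = lambda n: (n + 2 * padding - kernel) // stride + 1
    return size(h), size(w)

# 32x32 SVHN input through one convolutional layer with 16 feature maps
# (the 3x3 kernel is an assumption; the post does not specify it):
h, w = conv_output_shape(32, 32, kernel=3)
flattened = h * w * 16    # size fed into the fully connected layers
```
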
<h4 id="experiments-with-bernoulli-decoder-and-convolutional-neural-network">Experiments with Bernoulli Decoder and Convolutional Neural Network</h4>
<p>Let’s continue with the Bernoulli decoder. In Figure 2c, the learned manifold looks similar to 2b; in addition, the color blue is included, which can be considered a feature of the color spectrum. As we increase the dimension, we observe a vast improvement in the generated images in Figure 4. With a 100-dimensional latent space, the generated images are almost indistinguishable from the sample data.</p>
<h4 id="experiments-with-gaussian-decoder-and-convolutional-neural-network">Experiments with Gaussian Decoder and Convolutional Neural Network</h4>
<p>Lastly, since RGB values are not probabilities, a Gaussian decoder may be more appropriate than a Bernoulli one. From Figure 5, under the same neural network architecture, the performance is similar to the Bernoulli version, so there may be no advantage to using a Gaussian. However, one issue with the Gaussian is that it can produce negative means, so we apply a sigmoid function to the means to force them between 0 and 1.</p>
<p>Interestingly, in Figure 2d, the manifold learned with 2 latent features under the Gaussian assumption produces more separation between light and dark colors, and the digit 8 is more apparent. Arguably, an autoencoder with a Gaussian decoder may learn faster than one with a Bernoulli decoder.</p>
<h1 id="conclusion">Conclusion</h1>
<p>The variational autoencoder of Kingma and Welling (2014) can learn the SVHN dataset well using convolutional neural networks. The more latent features the model considers, the better the autoencoder performs. Lastly, a Gaussian decoder may work better than a Bernoulli decoder on colored images.</p>
<hr />
<p>My code is available <a href="https://github.com/WizardKingZ/variational_autoencoder">here</a>.</p>
<h2 id="references">References</h2>
<ul>
<li>[stat.ML] Diederik P. Kingma and Max Welling, Auto-Encoding Variational Bayes, 2014.</li>
<li>[stat.ML] Carl Doersch, Tutorial on Variational Autoencoders, 2016.</li>
</ul>
<hr />
<h1>Welcome! (2018-09-22)</h1>
<p>Welcome to my personal website! I plan to update this site regularly throughout my time as an M.S. student at Columbia University, where I am focusing on financial economics. I hope not only to post about the progress of my financial economics research, but also to explore other ideas at the forefront of machine learning and make them more accessible, whether in the form of paper summaries or Python tutorials. Look for the website to pick up more content in the upcoming months and years.</p>