Publicly available. Published by De Gruyter, September 18, 2020.

Forecasting Japanese inflation with a news-based leading indicator of economic activities

  • Keiichi Goshima, Hiroshi Ishijima, Mototsugu Shintani, and Hiroki Yamamoto

Abstract

We construct business cycle indexes from daily Japanese newspaper articles and estimate a Phillips curve model to forecast inflation at a daily frequency. We find that the news-based leading indicator, constructed from the topic of future economic conditions, is useful in forecasting the inflation rate in Japan.

1 Introduction

With increasing attention to monitoring current inflation in real time, inflation series are now available at a daily frequency. For example, the Federal Reserve Bank of Cleveland has been providing a daily nowcasting series of the personal consumption expenditures (PCE) and the consumer price index (CPI), while the Billion Prices Project at MIT (Cavallo and Rigobon 2016) uses prices collected from hundreds of online retailers around the world on a daily basis to measure inflation. In Japan, the Nikkei CPINow data, which is constructed from scanner or Point-of-Sale (POS) data, can be viewed as a daily inflation series. Stock and Watson (1999) have emphasized the advantage of using a real economic activity index in forecasting future inflation at a monthly frequency. Their forecasting model is motivated by the Phillips curve, which describes the positive correlation between inflation and real economic activity. In order to examine the usefulness of a Phillips curve model in forecasting inflation at a daily frequency, however, a daily index series of real economic activity needs to be constructed.

In this paper, we first extract text information from daily newspaper articles in the Nikkei from 1989 to 2017 to construct an index of real economic activity in Japan at a daily frequency. We then investigate whether the Phillips curve model with the daily news-based business cycle index can improve over the univariate benchmark model in forecasting the daily inflation series (the Nikkei CPINow) in Japan. In general, one may also take a more direct approach and explore the linear or non-linear relationship between the target inflation series and the relevant text data, rather than summarizing all the text information in a single indicator of economic activity. However, identifying the source of forecast improvement is difficult in such an approach since forecasts are generated inside a black box. In contrast, our approach, based on the Phillips curve, has the advantage that its outcome can be interpreted through standard economic theory.[1]

The idea of constructing a business cycle index from text data is not new. For example, Shapiro, Sudhof, and Wilson (2018) develop a US news sentiment index, based on 16 major US newspapers from 1980 to 2015, which is found to be strongly correlated with the state of contemporaneous business cycles. In a similar vein, by combining quarterly GDP data and daily news topic variables in mixed-frequency dynamic factor models, business cycle indexes are constructed by Thorsrud (2018) for Norway and by Larsen and Thorsrud (2018) for the US, Japan and Europe (the euro area). While we share the motivation behind these studies, our methodology of constructing business cycle indexes has several features that distinguish it from their analyses.

In economic applications of sentiment analysis, sentiment indexes are typically computed by using pre-defined dictionaries. The indexes are then used to explain the outcomes of economic activities, such as the financial performance of firms and asset prices.[2] Instead of relying on sentiment dictionaries, we take advantage of our unique training data set, the Economy Watchers Survey, in which members of the Japanese workforce assess the conditions of the Japanese economy on a five-point rating scale and also provide an itemized short description of the reasons for their assessment choices. We utilize a machine learning method and train a text classification model using the former (quantitative data) as output and the latter (text data) as input. Shapiro, Sudhof, and Wilson (2018) consider not only the dictionary-based approach but also the machine learning approach in their analysis. However, their model is trained on a corpus with emotional labels obtained from a social network website, and thus the outcome may not be directly related to economic conditions. In contrast, our training data set is based on a government survey directly questioning people who are engaged in regional economic activities, covering the retail, food and beverage, services, housing, manufacturing, non-manufacturing, and employment categories. Therefore, this corpus has a more specialized economic vocabulary than a general-purpose corpus. In addition, our training data set is based on a public survey conducted by the Cabinet Office of the Japanese government, and it is therefore easily accessible to any researcher.[3]

Since the goal of our analysis is to forecast future inflation using the index of real economic activities, it is important to determine whether the sentences in the Nikkei newspaper articles refer to current or future economic conditions. In the Economy Watchers Survey, our training data set, respondents from the Japanese workforce are asked to evaluate current economic conditions relative to those three months earlier, according to the following categories: Worse, Slightly worse, Unchanged, Slightly better, and Better. These correspond to scores of 0, 0.25, 0.5, 0.75 and 1.[4] We can directly use these scores to learn a text classification model if one is interested in the analysis of contemporaneous economic conditions (learner 1 in the main text). Fortunately, in addition to the current status, the survey also requests a two- to three-month prognosis of the Japanese economy from the following categories: Will get worse, Will get slightly worse, Will remain unchanged, Will get slightly better, and Will get better. Respondents are asked to provide separate text descriptions of the reasons for their evaluations of current and future economic conditions. We make use of this survey structure and consider supervised learning of topics to determine whether a sentence describes current or future economic conditions. Our use of supervised learning to estimate the sentence topic is in contrast with the analyses of Thorsrud (2018) and Larsen and Thorsrud (2018), which are based on the Latent Dirichlet Allocation (LDA) model, one of the most widely used unsupervised topic models.
For our purposes, supervised learning seems more appropriate since the topic of our interest, namely, future economic conditions, is difficult to discover by means of an unsupervised topic model such as LDA.[5] Once the topic is learned (learner 3 in the main text), we can compute the topic probability of each sentence in the Nikkei newspaper articles being a description of future economic conditions. We then use this probability as a weight on the scores obtained from the text classification model for the five-point rating scale of future economic conditions (learner 2 in the main text). We refer to the resulting business cycle index as the News-based Leading Indicator and use this index for the purpose of forecasting inflation.[6]

To evaluate the empirical performance of inflation forecasting at a daily frequency by the Phillips curve combined with the news-based leading indicator, we use a simulated out-of-sample forecasting methodology in which competing models are repeatedly estimated in each period to compute forecast errors. We use an autoregressive (AR) model as a benchmark and investigate whether the mean squared errors (MSEs) of the Phillips curve forecast are less than those of the benchmark forecast. As our forecasting models of interest are nested, we use the out-of-sample F-test statistic proposed by McCracken (2007) to test the null hypothesis of equal predictive accuracy in nested models. We find that our News-based Leading Indicator can help forecast the future inflation rate in Japan at a daily frequency.

The paper is organized as follows. Section 2 elaborates on the methodology of constructing our business cycle indexes in detail. Section 3 develops a forecasting model for daily inflation and presents the main results. Section 4 concludes.

2 Building news-based business cycle indexes

2.1 Text data: the Nikkei and the Economy Watchers Survey

To construct the daily index of real economic activities, we utilize daily news articles in the Nikkei, the leading economic newspaper in Japan, with a circulation of 2.8 million as of June 2018. The Nikkei is a newspaper that specializes in the financial, business, and industry sectors. We use news articles from both morning and evening papers from April 1, 1989 to December 31, 2017. Table 1 shows the summary statistics of our text data from the Nikkei. Our main data consist of about 3.8 million articles with more than 30 million sentences in total. All texts in our data are written in Japanese. Unlike English texts, Japanese texts do not have spaces between words. Hence, we split sentences into words in advance using MeCab, a morphological analysis library for Japanese texts.[7] After we divide articles into individual sentences, we estimate the news sentiment score on a sentence basis because our training data set, the Economy Watchers Survey, provides annotated scores on a sentence basis.
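The preprocessing step described above can be sketched as follows. The MeCab call mirrors the tool the paper names; the sentence splitter on the Japanese full stop and the character-level fallback for environments without the mecab-python3 bindings are our own illustrative assumptions.

```python
# Sketch of the preprocessing step: split each article into sentences on the
# Japanese full stop, then tokenize each sentence into words. The MeCab call
# mirrors the paper's tool; the naive fallback is an illustrative assumption.

def split_sentences(article: str) -> list:
    """Split a Japanese article into sentences on the full stop '。'."""
    return [s + "。" for s in article.split("。") if s.strip()]

def tokenize(sentence: str) -> list:
    """Tokenize with MeCab's wakati (space-separated) mode if available."""
    try:
        import MeCab  # mecab-python3 bindings
        tagger = MeCab.Tagger("-Owakati")
        return tagger.parse(sentence).split()
    except ImportError:
        return list(sentence)  # fallback: character-level split

article = "景気は回復している。輸出も増加した。"
sentences = split_sentences(article)
print(sentences[0])   # 景気は回復している。
print(len(sentences))  # 2
```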

Table 1:

Summary statistics of Nikkei.

The number of news articles 3,809,207
The number of sentences 31,797,881
The number of unique words 946,480
The number of total words 812,403,912

In what follows, we use a machine learning method to learn text classification models from our training data set for the purpose of building two types of news-based business cycle indexes, the News-based Coincident Indicator (hereafter, NCI) and the News-based Leading Indicator (hereafter, NLI). The NCI and the NLI are designed to capture, on a daily basis, current and future economic conditions, respectively. We suppose that the sentences in the Nikkei on Japanese economic activities have the same (and time-invariant) structure as those in the Economy Watchers Survey. In other words, from the viewpoint of natural language processing (NLP), the domain of our training data set, the Economy Watchers Survey, is the same as that of our input data, the Nikkei. Figure 1 shows an outline of the procedure we use to construct our business cycle indexes. In the first step, we train text classification models on the Economy Watchers Survey with supervised learning. In the second step, we estimate the sentiment scores of sentences in the Nikkei articles using the trained models. In the last step, we aggregate the news sentiment scores to build news-based business cycle indexes.

Figure 1:
Outline of the procedure.

To train our model, we utilize the descriptions accompanying the assessments of the economy in the Economy Watchers Survey, which is published by the Cabinet Office of the Government of Japan.[8] The purpose of this monthly survey is to promptly grasp current and future economic conditions by consulting people involved in regional economic activity. The assessment of current conditions refers to the direction of change in economic conditions compared with three months earlier, on a five-point scale. Similarly, the assessment of future conditions refers to the expected direction of change within the next two or three months, on a five-point scale. In the Economy Watchers Survey, the assessment by each respondent is accompanied by his or her description. Our supervised learning utilizes the descriptions as input data and the assessments as output labels. Table 2 shows selected example descriptions from the training data set from the Economy Watchers Survey. We use the survey data from January 2011 to June 2018. The total numbers of sentences in the descriptions for the assessment of current and future economic conditions are 112,214 and 125,055, respectively. Table 3 shows the breakdown of descriptions in the training data set. Descriptions for the assessment of current economic conditions tend to use the present tense and the present progressive tense. On the other hand, descriptions for the assessment of future economic conditions include auxiliary verbs related to the future tense, such as "will," and words that anticipate the future, such as "expect."

Table 2:

Example descriptions from Economy Watchers Survey.

(a) Assessment of current economic conditions

Assessment (score) Description
Better (1) The number of switches from dispatched workers to regular workers is increasing.
Slightly better (0.75) Financial demand is slightly increasing.
Unchanged (0.5) It was good at the beginning of the year, but it is now slowing down in the latter half of the month.
Slightly worse (0.25) It has become slightly worse due to the effect of the new president of the United States.
Worse (0) Our orders decrease after our special busy season.

(b) Assessment of future economic conditions

Assessment (score) Description
Will get better (1) It will get better thanks to a combination of end of the fiscal year sale and a high season of moving.
Will get slightly better (0.75) We expect good outcomes as reservations for the new product in March are going well.
Will remain unchanged (0.5) We see no additional orders from existing customers and expect no improvement after the new fiscal year.
Will get slightly worse (0.25) We believe the economy will improve as foreign political situations become stabilized.
Will get worse (0) It will get worse due to the effect of the new president of the United States.
Table 3:

Breakdown of descriptions in the training data set.

(a) Assessment of current economic conditions

Assessment (score) The number of sentences
Better (1) 2,284
Slightly better (0.75) 24,927
Unchanged (0.5) 53,352
Slightly worse (0.25) 25,098
Worse (0) 6,553
Total 112,214
The number of unique words 23,444
The number of total words 3,933,373

(b) Assessment of future economic conditions

Assessment (score) The number of sentences
Will get better (1) 2,488
Will get slightly better (0.75) 29,383
Will remain unchanged (0.5) 62,683
Will get slightly worse (0.25) 24,184
Will get worse (0) 6,317
Total 125,055
The number of unique words 23,628
The number of total words 4,120,109

2.2 Text classification model

In economic applications of sentiment analysis, the two most frequently used approaches are the dictionary-based approach and the machine learning approach. The advantage of the dictionary-based approach is its tractability and ease of implementation. The method quantifies text sentiment by simply counting the number of positive and negative words using a pre-defined dictionary. This approach, however, can fail to judge whether a sentence carries good or bad information because word meaning often depends on context. For example, the sentence "The firm's performance is not good" can wrongly be taken as a good signal on economic conditions because it contains the word "good." Therefore, it takes considerable time to incorporate all the possible patterns into a pre-defined dictionary. In addition, no established Japanese sentiment dictionary specializing in economic fields is available in the literature. Unlike the dictionary-based approach, the machine learning approach automatically recognizes patterns from the text data (input) and annotated scores (output). Among machine learning methods, neural networks have in recent years achieved high performance in text classification tasks. They excel at utilizing syntax information, such as sequence alignment and word co-occurrence.

Over the years, various neural network-based models have been developed in the field of computer science, including the recurrent neural network (Chung et al. 2014), the recursive neural network (Socher et al. 2013), the convolutional neural network (CNN, Kim 2014; Zhang, Zhao, and LeCun 2015) and the self-attention network (Lin et al. 2017). In our analysis, we use a text classification model based on neural networks called fastText, which was developed by Joulin et al. (2017).[9] According to Joulin et al. (2017), fastText is a simple and computationally efficient network architecture that nevertheless performs as well as classifiers based on other neural network models, such as the CNN and long short-term memory (LSTM, Hochreiter and Schmidhuber 1997), in terms of accuracy. Joulin et al. (2017) compared classification accuracy on test data sets from eight corpora across six models, including three (multinomial) logistic regression-based models and three neural network-based models: the character-level CNN (char-CNN) of Zhang, Zhao, and LeCun (2015), the character-level convolutional recurrent neural network (char-CRNN) of Xiao and Cho (2016) and the very deep convolutional neural network (VDCNN) of Conneau et al. (2016). In panel (a) of Table 4, we summarize the performance of fastText and other competing models from the experiment of Joulin et al. (2017). In addition, we also use our training data set, the Economy Watchers Survey, and compare the binary classification accuracy of fastText with those of the LSTM, CNN, char-CNN and support vector machine (SVM). In this experiment, the binary classification task is to classify texts into two topic classes, current and future economic conditions. The results of our own experiment in terms of accuracy on the test data set are reported in panel (b) of Table 4.[10] Based on the results of the experiments conducted by Joulin et al. (2017) and by ourselves, it seems fair to say that the performance of fastText is comparable to alternative machine learning models.
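To make the supervised setup concrete, the sketch below prepares annotated survey sentences in fastText's `__label__<class> <tokenized text>` input format and shows where the training call would go. The helper function, the toy records, and the hyperparameters in the commented call are illustrative assumptions, not the paper's actual settings.

```python
# Hedged sketch: encode (five-point label, tokenized description) pairs in
# the line format expected by fastText's supervised mode.

def to_fasttext_line(score: float, tokens: list) -> str:
    """Encode one annotated sentence as '__label__<score> <tokens...>'."""
    return f"__label__{score} " + " ".join(tokens)

records = [
    (1.0, ["orders", "are", "increasing"]),
    (0.0, ["orders", "decrease", "after", "busy", "season"]),
]
lines = [to_fasttext_line(s, t) for s, t in records]
print(lines[0])  # __label__1.0 orders are increasing

# Training itself requires the fasttext package (shown for reference only):
# import fasttext
# model = fasttext.train_supervised(input="train.txt", wordNgrams=2)
# labels, probs = model.predict("orders are increasing")
```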

Table 4:

Relative performance of fastText and other classifiers.

(a) Average accuracy of test data set from Joulin et al. (2017)

Model Accuracy
(Multinomial) logistic regression with bag-of-words 80.30%
(Multinomial) logistic regression with N-grams 81.80%
(Multinomial) logistic regression with N-grams and TF-IDF 81.36%
Char-CNN 82.81%
Char-CRNN 83.31%
VDCNN 84.91%
fastText with unigram 82.09%
fastText with unigram and bigram 84.33%

(b) Accuracy of test data set using the Economy Watchers Survey

Model Accuracy
LSTM 86.38%
CNN 86.17%
Char-CNN 85.90%
Linear Kernel SVM with bag-of-words 83.87%
fastText with unigram 84.54%
fastText with unigram and bigram 86.30%
Note: Table 4(a) summarizes the performances reported in Table 1 of Joulin et al. (2017).

Finally, in terms of computation time, Joulin et al. (2017) report that training and evaluation on sentiment analysis data sets using fastText are many orders of magnitude faster than with char-CNN and VDCNN. In our own experiment with the Economy Watchers Survey, we also confirm that fastText can be trained much more quickly than the other models. Therefore, the method seems effective for assigning sentiment scores to our large-scale news article data set without losing much accuracy.

2.3 Estimation and aggregation

In order to compute the NCI and the NLI from sentences in the Nikkei, we use three types of learners based on fastText, two of which are regression learners and the other a classification learner. The first learner (learner 1) is trained on the assessments of the current economy on a five-point scale and their descriptions from the training data set, the Economy Watchers Survey. This learner assigns continuous sentiment scores, Score-CI, to sentences in the Nikkei. The second learner (learner 2) is trained on the assessments of the future economy on a five-point scale and their descriptions. This learner assigns continuous sentiment scores, Score-LI. Higher Score-CI and Score-LI indicate that texts contain information about better current and future economic conditions, respectively. The third learner (learner 3) is trained using the assessments of both current and future economic conditions as well as their descriptions. This learner assigns topic probabilities, W, to sentences in the Nikkei. We regard outputs from a sigmoid function as topic probabilities. The topic probability W takes a value near zero if a sentence is similar to descriptions for the assessment of the current economy, and a value near one if a sentence is similar to descriptions for the assessment of the future economy. Table 5 summarizes the three types of learners.

Table 5:

Summary of learners.

Method Training data set Output
Learner 1 Regression Descriptions of current economy Values ([0, 1])
Learner 2 Regression Descriptions of future economy Values ([0, 1])
Learner 3 Classification Both descriptions Two classes ({current, future})

We assign sentiment scores and topic probabilities to all sentences in the news articles using the three learners.[11] When a trained model assigns a sentiment score to each sentence in the news articles, out-of-vocabulary words (words or tokens not included in the training data set) are replaced by a common special character, such as <UNKNOWN>. Sentences in news articles longer than the longest sentence in the training data set are truncated. In general, even if exactly the same neural network model is employed, slightly different outputs can be obtained on each run depending on the initialization. Therefore, we train the models 10 times and use the average scores from all runs.
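The out-of-vocabulary replacement, truncation, and averaging over repeated training runs can be sketched as follows. The function names, the toy vocabulary, and the stand-in scorers (simple lambdas in place of ten trained models) are illustrative assumptions.

```python
# Sketch of the scoring step: replace out-of-vocabulary tokens with a
# special symbol, truncate overly long sentences, and average scores over
# repeated training runs with different initializations.

UNKNOWN = "<UNKNOWN>"

def preprocess(tokens, vocab, max_len):
    """Map OOV tokens to UNKNOWN and truncate to max_len tokens."""
    tokens = [t if t in vocab else UNKNOWN for t in tokens]
    return tokens[:max_len]

def average_score(tokens, scorers):
    """Average the sentiment score over independently trained models."""
    return sum(s(tokens) for s in scorers) / len(scorers)

vocab = {"economy", "improving"}
toks = preprocess(["economy", "is", "improving"], vocab, max_len=50)
print(toks)  # ['economy', '<UNKNOWN>', 'improving']

# Ten stand-in scorers play the role of ten training runs.
scorers = [lambda t, b=b: 0.5 + 0.01 * b for b in range(10)]
print(round(average_score(toks, scorers), 3))  # 0.545
```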

In the next step, we construct two news-based business cycle indexes using the average of scores weighted by the topic probabilities. In particular, NCI and NLI are respectively defined by

(1) $NCI_t = \dfrac{\sum_{k=1}^{N_t} \text{Score-CI}_{t,k} \times (1 - W_{t,k})}{\sum_{k=1}^{N_t} (1 - W_{t,k})}$,

(2) $NLI_t = \dfrac{\sum_{k=1}^{N_t} \text{Score-LI}_{t,k} \times W_{t,k}}{\sum_{k=1}^{N_t} W_{t,k}}$,

where $N_t$ is the number of sentences on day t, Score-CI$_{t,k}$ is the score of the kth sentence on day t assigned by learner 1, Score-LI$_{t,k}$ is the score of the kth sentence on day t assigned by learner 2 and $W_{t,k}$ is the topic probability of the kth sentence on day t assigned by learner 3. Figure 2 plots our constructed news-based business cycle indexes, the NCI and the NLI. Table 6 shows their summary statistics. Overall, the two indicators tend to comove. However, there are some differences between the two, reflecting the fact that the NCI captures current economic conditions while the NLI captures future economic conditions. In particular, the median of the NLI seems to be higher than that of the NCI, which may indicate that the typical newspaper article tends to be relatively optimistic about future economic conditions. This tendency becomes clearer during the post-financial-crisis period.
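The daily aggregation in equations (1) and (2) is a pair of topic-weighted averages and can be computed directly. The numbers below are toy inputs for one day, not values from the paper.

```python
# Equations (1)-(2) as code: NCI weights each sentence score by (1 - W),
# NLI weights each sentence score by W.

def nci(score_ci, w):
    """Topic-weighted average of current-condition scores, eq. (1)."""
    num = sum(s * (1 - wk) for s, wk in zip(score_ci, w))
    den = sum(1 - wk for wk in w)
    return num / den

def nli(score_li, w):
    """Topic-weighted average of future-condition scores, eq. (2)."""
    num = sum(s * wk for s, wk in zip(score_li, w))
    den = sum(wk for wk in w)
    return num / den

score_ci = [0.6, 0.4, 0.5]  # learner 1 scores for day t (toy values)
score_li = [0.7, 0.5, 0.6]  # learner 2 scores for day t (toy values)
w = [0.1, 0.8, 0.3]         # learner 3 topic probabilities (toy values)

print(round(nci(score_ci, w), 4))  # 0.5389
print(round(nli(score_li, w), 4))  # 0.5417
```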

Figure 2:
News-based business cycle indexes.

Table 6:

Summary statistics of news-based business cycle indexes.

NCI NLI
Mean 49.34 50.15
S.D. 0.40 0.39
Median 49.34 50.16
Min 46.67 48.17
Max 50.92 52.43
Obs 10,474 10,474
Note: The unit is percent. The sample period is April 1, 1989 to December 31, 2017.

For the purpose of comparing our new business cycle indexes with other indicators of economic activity, we convert the daily series into monthly series by using monthly averages. Table 7 reports the correlations of our business cycle indexes with official business cycle indicators, namely two diffusion indexes (DIs) from the summary results of the Economy Watchers Survey and three composite indexes (CIs) of business conditions from the Economic and Social Research Institute (ESRI) of the Cabinet Office. Both our business cycle indexes, the NCI and the NLI, turn out to be highly correlated with the current and future DIs in the Economy Watchers Survey. This outcome may not be very surprising given that our indexes are calculated from the same survey information as the DIs in the Economy Watchers Survey. However, among the three official CIs, namely, (i) the leading index, (ii) the coincident index, and (iii) the lagging index, the NLI is most highly correlated with the leading index, which is computed without using newspaper articles or the Economy Watchers Survey. Figure 3 plots the NCI, the NLI, and ESRI's leading index along with official recession episodes. The figure shows that both the NCI and the NLI tend to decrease during economic downturns. In particular, all indexes clearly dropped during the financial crisis of 2008. Overall, it seems fair to say that our news-based business cycle indexes capture well the business cycle properties of the Japanese economy.[12] In the following sections, we mainly focus on using the NLI in our simulated out-of-sample forecasting exercise on the daily inflation series.
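The daily-to-monthly conversion used for this comparison is a within-month average of the daily index, which can then be correlated with a monthly benchmark. The sketch below uses synthetic data; the series names and the random stand-in for the official index are illustrative assumptions.

```python
# Sketch of the daily-to-monthly conversion behind Table 7: average the
# daily index within each calendar month, then correlate with a monthly
# benchmark series. All data here are synthetic.

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
days = pd.date_range("2015-01-01", "2015-12-31", freq="D")
nli_daily = pd.Series(50 + rng.normal(0, 0.4, len(days)), index=days)

nli_monthly = nli_daily.resample("MS").mean()    # monthly averages
benchmark = pd.Series(rng.normal(0, 1, len(nli_monthly)),
                      index=nli_monthly.index)   # stand-in official index

print(len(nli_monthly))                          # 12 monthly observations
print(round(nli_monthly.corr(benchmark), 3))
```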

Table 7:

Correlations of news-based business cycle indexes and other official business cycle indicators.

Economy Watchers Survey ESRI’s CI of business conditions
Current DI Future DI Leading index Coincident index Lagging index
NCI 0.774 0.768 0.202 0.326 0.317
NLI 0.550 0.713 0.682 0.386 −0.145
Note: The current diffusion index (DI) and future DI are from the Economy Watchers Survey, Cabinet Office. The leading index, coincident index and lagging index are from the composite indexes (CIs) of business conditions by ESRI, the Cabinet Office.

Figure 3:
News-based business cycle indexes and official business cycle index.

Note: All the series are normalized to have zero mean and unit variance. The news-based coincident indicator (NCI) and the news-based leading indicator (NLI) are our news-based business cycle indexes. The leading index is from the composite indexes of business conditions by the Economic and Social Research Institute (ESRI), the Cabinet Office. The shaded area shows the official recession episodes of ESRI, the Cabinet Office.

3 Forecasting performance of news-based leading indicator

3.1 Phillips curve inflation forecast

Motivated by the well-known Phillips curve model, which describes the correlation between the inflation and unemployment rates, Stock and Watson (1999) conducted a simulated out-of-sample forecasting analysis of US inflation at the 12-month horizon. They claimed that the inflation forecast could be improved by replacing the unemployment rate with a single real economic activity index, especially one constructed from 168 economic indicators. Atkeson and Ohanian (2001) reviewed the literature of the 1990s and challenged the belief that the conventional Phillips curve model is a useful tool for inflation forecasting. They revisited the forecast performance of the Phillips curve model adopted in Stock and Watson (1999) and claimed that their naive forecasts outperformed the Phillips curve forecast. Later, Stock and Watson (2009) provided a comprehensive literature review on inflation forecasting and pointed out that the results against the Phillips curve forecast obtained by Atkeson and Ohanian (2001) could disappear, depending on the forecast horizon and sample period.[13] In summary, it seems fair to say that no consensus has been reached regarding the validity of the Phillips curve relationship in forecasting inflation. However, almost all the existing studies utilized monthly or quarterly data to examine the performance of the Phillips curve inflation forecast. Here, we use daily data to evaluate the usefulness of the Phillips curve in forecasting Japanese inflation.

In our analysis, we use a Phillips curve model similar to the one considered by Stock and Watson (1999), who replaced the unemployment rate with a single real economic activity index. In particular, we consider the daily Japanese inflation series called Nikkei CPINow and investigate its relationship with our news-based business cycle index. Nikkei CPINow is known as the first real-time price index in Japan. NOWCAST, Inc. releases two CPINow series, the CPINow-T index and the CPINow-S index. The CPINow-T index is a daily series calculated from POS data and has been available since April 1, 1989. More than 800 stores and 300,000 different products, such as food and daily necessities, are covered in the survey. Unlike the official CPI, the shares of the items are taken into account every day by taking advantage of the POS data. On the other hand, the CPINow-S index is a monthly series designed to closely match the official CPI by selecting the same representative items and by using the same index formula. The CPINow-S index has been available since January 2015.

The summary statistics of the two CPINow series are provided in Table 8.[14] Table 9 shows the correlations between the two CPINow series and the official CPI at a monthly frequency. Here, the daily CPINow-T index series is converted to a monthly series using monthly averages. In the table, a remarkably high correlation stands out between the CPINow-S index and the official CPI for all items less fresh food and energy. This result suggests that the CPINow series can serve as a good proxy for the official CPI at a higher frequency. Since the CPINow-T index is released two days after actual transactions, it is also useful for nowcasting purposes. A direct comparison between the CPINow-T index and the official CPI inflation series is provided in Figure 4.

Table 8:

Summary statistics of CPINow.

T-index S-index
Mean 0.27 1.16
S.D. 1.57 0.81
Median −0.51 0.81
Min −5.54 0.15
Max 8.04 3.15
Obs 10,380 36
Note: The sample period is April 1, 1989 to December 31, 2017 for the daily CPINow T-index and January 2015 to December 2017 for the monthly CPINow S-index.

Table 9:

Correlations of CPINow and official consumer price index (CPI) inflation.

All items All items, less fresh food All items, less fresh food and energy
CPINow-T index 0.649 0.602 -
CPINow-S index 0.591 0.571 0.946
Note: The sample periods are April 1989 to December 2017 for the monthly CPINow T-index and January 2015 to December 2017 for the CPINow S-index, respectively. The correlation between the CPINow-T index and CPI inflation for all items, less fresh food and energy, is missing because the CPINow-T index adjusts for the effects of the introduction of the consumption tax, while the corresponding adjusted series is not available for the CPI for all items, less fresh food and energy.

Figure 4:
Inflation based on the CPINow-T index and official consumer price index (CPI).

In what follows, we transform the CPINow-T index to construct 1- to 12-month inflation at an annual rate, which is the target variable in our forecasting analysis. Following previous studies, including Atkeson and Ohanian (2001) and Stock and Watson (1999, 2009), our target variable is m-period inflation, defined by

(3) $\pi_t^m = \frac{1}{m} \sum_{j=0}^{m-1} \pi_{t-j}$,

where m is the window size and $\pi_t$ is the CPINow-T index series expressed as a daily inflation rate at an annual rate. For the forecast horizon h, we consider horizons from approximately one month to one year by setting h = 30 × k for k = 1, 2, … , 12.
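Equation (3) is simply a trailing m-day average of the daily annualized inflation rate, as the short sketch below illustrates with toy data.

```python
# Equation (3) as code: m-period inflation is the trailing m-day average
# of the daily (annualized) inflation rate.

def m_period_inflation(pi, m):
    """Return pi_t^m = (1/m) * sum_{j=0}^{m-1} pi_{t-j} for t >= m-1."""
    return [sum(pi[t - m + 1 : t + 1]) / m for t in range(m - 1, len(pi))]

pi = [1.0, 2.0, 3.0, 4.0, 5.0]    # toy daily inflation at an annual rate
print(m_period_inflation(pi, 3))  # [2.0, 3.0, 4.0]
```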

The h-period ahead forecast is constructed from the Phillips curve model given by

(4) $\pi_{t+h}^h = \alpha + \beta\, NLI_t + \phi^h(L)\, \pi_t + e_{t+h}$,

where $NLI_t$ is our news-based leading indicator (2) designed to capture future economic conditions, $\phi^h(L) = \sum_{j=1}^{h} \phi_j L^{j-1}$ and $e_{t+h}$ is the forecast error. It should be noted that if $\beta = 0$, (4) reduces to an AR(h) forecast with no restriction on the AR coefficients. In our exercise, we consider an AR model as a benchmark model and investigate whether the Phillips curve model combined with the news-based leading indicator can outperform the benchmark model. However, if no restriction is imposed on the AR coefficients, the number of unknown parameters in the AR(h) model can be as large as 360. To avoid overfitting with too many parameters in the AR model, we consider several parsimonious specifications.
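As an illustration, a regression of the form (4) can be estimated by ordinary least squares. The sketch below uses synthetic data and collapses the lag polynomial to a single lag, both simplifying assumptions for brevity.

```python
# Minimal sketch of estimating a Phillips curve forecast like eq. (4) by
# OLS: regress h-period-ahead inflation on a constant, NLI_t, and lagged
# inflation. Synthetic data; the lag structure is simplified to one lag.

import numpy as np

rng = np.random.default_rng(1)
T, h = 500, 30
nli = rng.normal(0, 1, T)
pi = rng.normal(0, 1, T)
# Simulated target: future inflation loads on NLI with coefficient 0.5.
pi_future = 0.2 + 0.5 * nli[: T - h] + 0.3 * pi[: T - h] \
    + rng.normal(0, 0.1, T - h)

X = np.column_stack([np.ones(T - h), nli[: T - h], pi[: T - h]])
alpha, beta, phi = np.linalg.lstsq(X, pi_future, rcond=None)[0]
print(round(beta, 2))  # close to the true 0.5
```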

First, as in Atkeson and Ohanian (2001), a naive forecast of h-period inflation can be constructed using the current value of h-period inflation, or $\phi^h(L)\pi_t = \pi_t^h = (1/h)\sum_{j=0}^{h-1}\pi_{t-j}$. Note that this specification can be obtained as the h-period moving average of the random walk forecast, so that only α needs to be estimated. Since it imposes a non-stationary (NS) restriction on the AR coefficients, we refer to this specification as AR-NS. Second, we can combine several values of m-period inflation with different window sizes m and use

$\phi^h(L)\pi_t = \phi_1 \pi_t + \phi_7 \pi_t^7 + \sum_{k=1}^{h/30} \phi_{30k}\, \pi_t^{30k}.$

For example, in this specification, a one-month-ahead inflation forecast is computed by combining daily, weekly, and monthly inflation. A similar parsimonious specification has been employed by Ito and Yabu (2007) and Fatum and Hutchison (2010) in their analyses of government intervention in daily foreign exchange rates, and by Corsi (2009) in his forecasting analysis of realized volatilities.[15] Since the regressors are inflation series in mixed frequencies (MF), we refer to this specification as AR-MF. Third, the lag length can be selected using information criteria such as the AIC and BIC; we refer to these specifications as AR-AIC and AR-BIC, respectively. Using the simulated out-of-sample forecasting methodology explained below, we select a benchmark AR forecast from these alternative specifications of the AR coefficients (namely, AR(h), AR-NS, AR-MF, AR-AIC, and AR-BIC).
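Constructing the AR-MF regressor set amounts to stacking the daily series with its 7-day and 30k-day moving averages. An illustrative sketch (`ar_mf_regressors` and its conventions are ours):

```python
import numpy as np

def ar_mf_regressors(pi, h):
    """Regressors of the parsimonious AR-MF specification: current
    daily inflation pi_t, the 7-day average pi_t^7, and the 30k-day
    averages pi_t^{30k} for k = 1, ..., h/30 (shape: T x (2 + h/30))."""
    pi = np.asarray(pi, dtype=float)
    T = len(pi)
    c = np.cumsum(pi)

    def mavg(m):                       # backward m-day moving average
        out = np.full(T, np.nan)
        out[m - 1:] = (c[m - 1:] - np.concatenate(([0.0], c[:-m]))) / m
        return out

    cols = [pi, mavg(7)] + [mavg(30 * k) for k in range(1, h // 30 + 1)]
    return np.column_stack(cols)

X = ar_mf_regressors(np.arange(400.0), 90)
print(X.shape)  # (400, 5): daily, weekly, 30-, 60-, and 90-day averages
```

For h = 30 this leaves only three regressors instead of the 30 lags of an unrestricted AR(h), which is the point of the restriction.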

3.2 Simulated out-of-sample forecasting

We use the simulated out-of-sample forecasting methodology to evaluate the forecasting model. In this approach, out-of-sample forecasts are computed as if a real-time forecaster were estimating the model using only the data available at the time of the forecast. In particular, using the sample only through period t, the h-period-ahead forecast of inflation, $\pi_{t+h|t}^h$, is obtained. We then compare the forecast value $\pi_{t+h|t}^h$ with the realized value $\pi_{t+h}^h$ to compute the forecast error at period t + h, namely, $\hat{e}_{t+h}$. Next, we follow the same procedure using the sample through period t + 1, estimate the model, and compute $\hat{e}_{t+h+1}$. We repeat this process P − h + 1 times to obtain P − h + 1 forecast errors.[16] Then, the MSE of the h-period-ahead inflation forecast, defined as $\sigma^2 = E(e_{t+h}^2)$, can be estimated by $\hat{\sigma}^2 = (P-h+1)^{-1} \sum_{t=R}^{P+R-h} \hat{e}_{t+h}^2$, where R is the sample size used to estimate the forecast model at the beginning of the forecast evaluation.

We consider two estimation schemes to evaluate forecasting performance. The first is the rolling scheme, where the model is estimated using a moving data window of length R. The second is the recursive scheme, where an expanding sample is used each time the model is re-estimated; here R represents the sample size used in the initial step. For robustness, we conduct simulated out-of-sample forecasting experiments with P/R = 0.4 and 1.0. The exact values of P and R are determined by the identity P + R = T − h + 1, where T, the full sample size, is 10,413.
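The evaluation loop can be sketched for a generic single-regressor model (a simplified illustration, not the paper's full specification; names are ours):

```python
import numpy as np

def oos_mse(y, x, h, R, scheme="rolling"):
    """Simulated out-of-sample MSE for the h-step forecast of y from x.
    At each date t the model y[s+h] = a + b*x[s] + e is re-estimated by
    OLS using only data available at t: the last R usable dates under
    the rolling scheme, or all dates up to t under the recursive scheme."""
    T, errs = len(y), []
    for t in range(R + h - 1, T - h):
        lo = t - (R + h - 1) if scheme == "rolling" else 0
        s = np.arange(lo, t - h + 1)              # usable training dates
        X = np.column_stack((np.ones(len(s)), x[s]))
        a, b = np.linalg.lstsq(X, y[s + h], rcond=None)[0]
        errs.append(y[t + h] - (a + b * x[t]))    # forecast error e_hat
    return np.mean(np.square(errs))
```

If the target really is an exact linear function of the regressor lagged by h, the out-of-sample MSE collapses to machine precision under either scheme; comparing the two schemes on real data is exactly the robustness check performed in the paper.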

We are interested in determining whether the MSE of the Phillips curve model is smaller than that of the benchmark AR models, which do not use the news-based leading indicator. In the first step, we select an appropriate benchmark AR model by estimating the MSE of the inflation forecast using the simulated out-of-sample forecasting methodology described above. The MSE estimates for various forecast horizons h are summarized in Table 10. The performance of AR-NS is uniformly worse than that of an unrestricted AR(h) model. The AR-MF specification performs almost the same as an unrestricted AR(h) model. Both AR-AIC and AR-BIC perform well at shorter horizons but not at longer horizons. On balance, we select AR-MF as our preferred specification for the benchmark AR model. Once the benchmark model is selected, we can estimate the Phillips curve model (4) in the next step. For the case when the full sample period from April 1, 1989 to December 31, 2017 is used, the estimated coefficients are provided in Table 11. The table demonstrates that when the forecast horizon h reaches 120 or longer, the coefficients on the news-based leading indicator (β) become positive and significant. The table also reports the estimates of the sum of AR coefficients, $\phi^h(1) = \sum_{j=1}^{h} \phi_j$. This sum tends to decrease with the forecast horizon h, suggesting the stationarity of inflation in Japan. In summary, for four-month to one-year horizons, the signs of the coefficients are consistent with the notion of the Phillips curve.

Table 10:

Mean squared errors (MSEs) of autoregressive (AR) forecasts.

(a) P/R = 0.4

AR(h) AR-NS AR-MF AR-AIC AR-BIC
h = 30 0.06 0.08 0.06 0.06 0.06
h = 60 0.09 0.12 0.09 0.07 0.09
h = 90 0.11 0.16 0.11 0.09 0.10
h = 120 0.12 0.23 0.12 0.11 0.11
h = 150 0.13 0.34 0.13 0.13 0.13
h = 180 0.15 0.48 0.15 0.15 0.15
h = 210 0.17 0.64 0.17 0.19 0.18
h = 240 0.19 0.83 0.19 0.22 0.20
h = 270 0.21 1.02 0.21 0.24 0.22
h = 300 0.23 1.21 0.23 0.27 0.24
h = 330 0.26 1.39 0.25 0.29 0.29
h = 360 0.27 1.52 0.27 0.32 0.33

(b) P/R = 1.0

AR(h) AR-NS AR-MF AR-AIC AR-BIC
h = 30 0.10 0.12 0.10 0.07 0.10
h = 60 0.16 0.24 0.16 0.11 0.16
h = 90 0.23 0.39 0.23 0.15 0.19
h = 120 0.25 0.58 0.26 0.20 0.24
h = 150 0.28 0.80 0.28 0.25 0.28
h = 180 0.33 1.02 0.32 0.30 0.33
h = 210 0.39 1.23 0.38 0.36 0.39
h = 240 0.44 1.41 0.43 0.41 0.45
h = 270 0.49 1.56 0.48 0.46 0.50
h = 300 0.54 1.66 0.52 0.51 0.55
h = 330 0.58 1.73 0.56 0.55 0.62
h = 360 0.61 1.77 0.59 0.59 0.66
  1. Note: MSEs are estimated by the rolling scheme.

Table 11:

Full sample coefficient estimates of the Phillips curve model.

β ϕ h ( 1 )
h = 30 −1.83 0.97***
h = 60 0.40 0.95***
h = 90 3.19 0.92***
h = 120 5.40* 0.87***
h = 150 8.48** 0.83***
h = 180 11.00*** 0.78***
h = 210 13.83*** 0.75***
h = 240 16.66*** 0.72***
h = 270 19.56*** 0.70***
h = 300 22.35*** 0.68***
h = 330 26.23*** 0.66***
h = 360 31.11*** 0.65***
  1. Note: For HAC standard errors, we follow Schwert (1989) and set the lag length to the integer part of 4 × [(T − h + 1)/100]1/4. The sample period is from April 1, 1989 to December 31, 2017. ***, **, and * denote that coefficients are significantly different from zero at the 1, 5, and 10% significance levels, respectively.

The benchmark AR model is clearly nested in the Phillips curve model (4). Since the forecasting model of interest nests the benchmark, we use the out-of-sample F-type test statistic proposed by McCracken (2007), which is designed to compare the forecasting performance of nested models. The statistic is defined as

(5) $F = (P - h + 1) \times \left( \frac{\hat{\sigma}_{AR}^2}{\hat{\sigma}_{PC}^2} - 1 \right),$

where $\hat{\sigma}_{AR}^2$ is the estimator of the MSE of the AR inflation forecast $\sigma_{AR}^2$, and $\hat{\sigma}_{PC}^2$ is the estimator of the MSE of the Phillips curve inflation forecast $\sigma_{PC}^2$.

We use this statistic to test the null hypothesis $H_0: \sigma_{AR}^2/\sigma_{PC}^2 = 1$ against the alternative $H_1: \sigma_{AR}^2/\sigma_{PC}^2 > 1$. McCracken (2007) shows that when h = 1, the F statistic asymptotically follows a non-standard distribution under the null hypothesis, and provides critical values that depend on P/R and the difference in the number of regressors. In general, for a longer forecast horizon h > 1, the distribution becomes data dependent. However, when the number of additional regressors is exactly one, as in our setting, the critical values in McCracken (2007) remain valid (see West (2006) for more detail on this point). We compute the relative size of the MSEs of the benchmark AR model and the Phillips curve model, and then conduct the out-of-sample F-test for various h.
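Given the two sequences of forecast errors, the statistic in (5) is a one-liner (an illustrative sketch; the function name is ours):

```python
import numpy as np

def oos_f_stat(e_ar, e_pc):
    """Out-of-sample F-type statistic of McCracken (2007):
    F = (P - h + 1) * (sigma2_AR / sigma2_PC - 1), where the inputs
    are the P - h + 1 forecast errors of the nested benchmark (AR)
    and the nesting model (Phillips curve)."""
    e_ar = np.asarray(e_ar, dtype=float)
    e_pc = np.asarray(e_pc, dtype=float)
    n = len(e_pc)                          # number of evaluation points
    return n * (np.mean(e_ar ** 2) / np.mean(e_pc ** 2) - 1.0)

# If the Phillips curve halves every error, the MSE ratio is 4.
print(oos_f_stat([2.0, -2.0], [1.0, -1.0]))  # 6.0
```

A large positive F favors the Phillips curve model; negative values, as in the short-horizon rows of Table 12, indicate that the added regressor hurts out-of-sample.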

Table 12 shows the relative MSE given by $\hat{\sigma}_{AR}^2/\hat{\sigma}_{PC}^2$, with out-of-sample F-test statistics in parentheses. Note that the sample period in the analysis depends on the forecast horizon h. For example, the estimation period in the first step of the rolling scheme with P/R = 1 for h = 360 is from June 29, 1989 to February 19, 2010. In this case, both R and P are 2873.

Table 12:

Relative MSEs of the AR forecast and the Phillips curve forecast.

Rolling Recursive
P/R = 0.4 P/R = 1.0 P/R = 0.4 P/R = 1.0
h = 30 0.98 (−73.31) 1.00 (−6.26) 0.99 (−21.01) 1.00 (−10.80)
h = 60 0.98 (−61.06) 1.00 (−11.22) 0.99 (−19.69) 1.00 (−11.26)
h = 90 1.00 (−2.55) 1.00 (−1.83) 1.00*** (5.81) 1.00** (2.54)
h = 120 1.03*** (84.80) 1.00*** (23.92) 1.01*** (32.30) 1.00*** (14.53)
h = 150 1.08*** (221.33) 1.01*** (70.25) 1.03*** (82.74) 1.01*** (38.33)
h = 180 1.11*** (330.69) 1.02*** (118.26) 1.04*** (124.82) 1.01*** (55.18)
h = 210 1.14*** (405.31) 1.03*** (145.50) 1.06*** (164.94) 1.01*** (73.48)
h = 240 1.16*** (470.01) 1.03*** (169.24) 1.07*** (212.61) 1.02*** (94.23)
h = 270 1.18*** (517.14) 1.04*** (185.38) 1.09*** (258.21) 1.02*** (113.85)
h = 300 1.19*** (535.20) 1.04*** (190.70) 1.10*** (290.08) 1.03*** (128.43)
h = 330 1.19*** (535.40) 1.04*** (189.35) 1.12*** (323.91) 1.03*** (156.83)
h = 360 1.20*** (554.78) 1.04*** (189.20) 1.15*** (421.91) 1.04*** (210.43)
  1. Note: The MSE ratio is given by $\hat{\sigma}_{AR}^2/\hat{\sigma}_{PC}^2$, where $\hat{\sigma}_{AR}^2$ is the estimator of the MSE of the AR forecast and $\hat{\sigma}_{PC}^2$ is the estimator of the MSE of the Phillips curve inflation forecast. The numbers in parentheses are out-of-sample F-test statistics of McCracken (2007). ***, **, and * denote the rejection of the null hypothesis of equal predictability (equal MSEs) using a one-tailed test at the 1, 5, and 10% significance levels, respectively.

Based on the point estimates of the MSEs, the Phillips curve forecast performs better than the AR forecast when h is 120 or longer for the rolling scheme and when h is 90 or longer for the recursive scheme. Furthermore, the results of the out-of-sample F-test imply that, as the forecasting horizon lengthens, the estimate of $\sigma_{PC}^2$ becomes significantly smaller than that of $\sigma_{AR}^2$. These results confirm that the Phillips curve forecast outperforms the AR forecast at least for horizons greater than three months. The cumulative squared prediction error differences between the two models for h = 120, 240, and 360 are shown in Figure 5. The figure implies that the improvements are clearly observed both at the beginning of the evaluation period and after 2016.[17] The significant reduction of the MSE is robust across the out-of-sample simulation designs with P/R = 0.4 and 1.0. Our finding that the news-based leading indicator contains valuable information about future inflation is consistent with the fact that the assessment of future economic conditions in the Economy Watchers Survey refers to the projected status of the Japanese economy two to three months ahead.

Figure 5: 
Cumulative squared prediction error differences between the AR and the Phillips curve forecasts.

Note: Mean squared errors (MSEs) are estimated by the rolling scheme and P/R = 0.4.

4 Discussions

Our results on inflation forecast improvements from using the Phillips curve model at a daily frequency cast new light on the literature on inflation dynamics in Japan, since no previous studies have used a daily inflation series in estimating the Phillips curve.[18] It is nonetheless of interest to compare our main result with previous studies on inflation forecasting in Japan. As pointed out by Fukuda and Keida (2001), the Phillips curve in Japan has not been considered very useful for inflation forecasting compared to the US case. As shown in Figure 4, Japan has experienced a long-lasting deflation period since the second half of the 1990s. Unlike during the inflationary periods of the 1970s and 1980s, the Phillips curve relationship is believed to have weakened during the deflation period. This weak relationship between inflation and real economic activity also stood out at the time of the global financial crisis, because the sharp drop in the output gap and the sharp rise in unemployment did not result in severe deflation. To allow for the possibility of changing coefficients in the Phillips curve, Nishizaki, Sekine, and Ueno (2014) estimated a time-varying parameter model of inflation. Note that, in our main analysis in the previous section, the sample period includes both the period of declining inflation until 1995 and the deflation period, as well as the time of the global financial crisis.

In response to prolonged deflation, the Bank of Japan (BOJ) introduced the price stability target of a two percent annual inflation rate in January 2013 and has been conducting a series of unconventional monetary easing policies, including the quantitative and qualitative easing (QQE) policy of April 2013. Using micro price data, Watanabe and Watanabe (2018) investigated how items whose price remained unchanged contributed to inflation dynamics in Japan under the monetary easing policy since April 2013. They discovered that items with flexible prices contributed to raising the inflation in 2014 while items with sticky prices did not.

To incorporate the different phases of the Japanese economy described above and possible parameter shifts in the Phillips curve, we examine the robustness of our main results by repeating the same out-of-sample forecasting exercise using the following subsamples. The first subsample is from January 1, 1996 to December 31, 2017, so that the declining inflation phase until 1995 is removed from the full sample. The second subsample we consider is the post-global financial crisis era from September 1, 2008 to December 31, 2017. The third subsample is from April 1, 2013 to December 31, 2017, which corresponds to the period when the BOJ conducted the unconventional monetary easing policy. Note that all the subsamples end in 2017, mainly because our training data extends from 2011 to 2018.

Table 13 shows the resulting relative MSEs from the subsample analysis. Even when the declining inflation period is removed, the results are very similar to the full sample case. Interestingly, for the post-crisis subsample, significant reductions in MSE are observed for all forecast horizons, including one to three months. In contrast, much weaker evidence is obtained for the subsample covering the unconventional monetary easing policy period. Even in this case, however, a significant reduction is still obtained when the horizon approaches one year.

Table 13:

Relative MSEs of the AR forecast and the Phillips curve forecast: subsamples.

(1) 1996–2017 (2) 2008–2017 (3) 2013–2017
h = 30 0.97 (−68.34) 1.06*** (57.10) 0.98 (−7.94)
h = 60 0.99 (−23.50) 1.12*** (115.79) 0.99 (−6.33)
h = 90 1.02*** (42.13) 1.16*** (148.83) 0.98 (−3.84)
h = 120 1.06*** (127.63) 1.17*** (164.20) 0.99 (−5.97)
h = 150 1.12*** (275.87) 1.19*** (181.40) 0.99 (−10.31)
h = 180 1.18*** (400.54) 1.19*** (179.89) 0.98 (−12.73)
h = 210 1.20*** (443.32) 1.20*** (182.99) 0.97 (−14.84)
h = 240 1.21*** (463.65) 1.22*** (197.02) 0.99 (−4.60)
h = 270 1.23*** (499.31) 1.21*** (186.43) 0.99 (−2.26)
h = 300 1.24*** (520.51) 1.19*** (172.12) 1.00* (0.97)
h = 330 1.25*** (541.20) 1.19*** (165.57) 1.01** (2.22)
h = 360 1.26*** (572.59) 1.18*** (160.83) 1.00** (1.79)
  1. Note: MSEs are estimated by the rolling scheme and P/R = 0.4. See also the note for Table 12.

In the previous subsection, a univariate AR model with some parameter restriction is selected as a benchmark model to evaluate the performance of the Phillips curve forecast. Let us now turn to a different way of checking the robustness of our main results by introducing alternative benchmark models. In particular, we consider whether our news-based leading indicator is still useful even when the benchmark univariate AR model is replaced by a vector autoregressive (VAR) model with an additional variable x t . In this case, the Phillips curve model (4) is replaced by its extended version given by

(6) $\pi_{t+h}^h = \alpha + \beta\, NLI_t + \phi^h(L)\pi_t + \varphi^h(L) x_t + e_{t+h},$

where $NLI_t$ is the news-based leading indicator, $\varphi^h(L) = \sum_{j=1}^{h} \varphi_j L^{j-1}$, and $x_t$ is a daily variable, possibly representing daily market information. Note that (6) reduces to a VAR(h) forecast if $NLI_t$ has no predictive power, that is, if $\beta = 0$. As in the benchmark AR model in the main results, we use the parsimonious AR-MF specification for $\phi^h(L)$. Since (6) is no longer in the form of the standard Phillips curve model, in what follows we simply refer to it as the extended Phillips curve model.

For the choice of the additional variable $x_t$, we consider (i) the daily percentage change in the Nikkei Stock Average, (ii) the daily percentage change in the dollar-yen exchange rate, (iii) the daily percentage change in West Texas Intermediate (WTI) crude oil spot prices, and (iv) the daily yield spread between 1- and 10-year Japanese government bonds (JGBs). In addition, we consider the first to third principal components of all four series [(i)–(iv)] as predictor variables. This approach can be viewed as a type of factor-augmented AR forecast similar to the one employed in Shintani (2005), who claimed that the common factor computed from principal component analysis is useful for improving monthly inflation forecasts in Japan (see also Stock and Watson 2002). Finally, we use a parsimonious lag specification for the additional variable by imposing the h-period moving average restriction $\varphi^h(L) x_t = \varphi\, x_t^h$, where $x_t^h = (1/h)\sum_{j=0}^{h-1} x_{t-j}$.
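The principal components in this factor-augmented exercise can be obtained from the SVD of the standardized data matrix (a sketch under the assumption of standardized inputs; names and data are purely illustrative):

```python
import numpy as np

def principal_components(X, k):
    """First k principal components of the daily series in X
    (T x 4: stock returns, FX, WTI, term spread), computed via the
    SVD of the standardized data matrix; columns are orthogonal."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :k] * s[:k]

rng = np.random.default_rng(0)
F = principal_components(rng.normal(size=(500, 4)), 3)
print(F.shape)  # (500, 3)
```

The resulting factor series can then replace $x_t$ in (6), with the same h-period moving average restriction applied.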

Tables 14 and 15 show the results of this additional analysis. Table 14 reports the estimated MSEs of the benchmark VAR model, analogous to Table 10. Table 15 shows the relative MSE given by $\hat{\sigma}_{VAR}^2/\hat{\sigma}_{EPC}^2$, where $\hat{\sigma}_{VAR}^2$ is the estimator of the MSE of the VAR inflation forecast and $\hat{\sigma}_{EPC}^2$ is the estimator of the MSE of the extended Phillips curve inflation forecast, with out-of-sample F-test statistics in parentheses. Note that, since the difference in the number of regressors between the VAR model and the extended Phillips curve model is still one, we can use the critical values from McCracken (2007).

Table 14:

MSEs of vector autoregressive (VAR) forecasts.

Stock price FX WTI Term spread Factor (1) Factor (2) Factor (3)
h = 30 0.06 0.06 0.06 0.06 0.06 0.06 0.06
h = 60 0.09 0.09 0.09 0.09 0.09 0.09 0.09
h = 90 0.12 0.12 0.11 0.11 0.11 0.11 0.11
h = 120 0.12 0.12 0.11 0.11 0.12 0.12 0.11
h = 150 0.13 0.12 0.10 0.10 0.12 0.13 0.11
h = 180 0.13 0.13 0.11 0.11 0.14 0.14 0.13
h = 210 0.15 0.15 0.12 0.12 0.16 0.16 0.18
h = 240 0.17 0.17 0.13 0.13 0.18 0.18 0.23
h = 270 0.19 0.29 0.14 0.14 0.21 0.19 0.26
h = 300 0.22 0.22 0.16 0.16 0.23 0.23 0.23
h = 330 0.24 0.24 0.17 0.17 0.25 0.24 0.23
h = 360 0.26 0.26 0.18 0.18 0.25 0.23 0.23
  1. Note: MSEs are estimated by the rolling scheme and P/R = 0.4. Factor(K) denotes a factor forecast based on K factors.

Table 15:

Comparisons of MSEs of the VAR forecast and the extended Phillips curve forecast.

Stock price FX WTI Term spread Factor (1) Factor (2) Factor (3)
h = 30 0.99 (−41.80) 0.98 (−71.59) 0.99 (−74.22) 0.98 (−33.24) 0.98 (−46.56) 0.99 (−46.01) 0.99 (−29.24)
h = 60 1.00 (0.46) 0.98 (−57.56) 0.98 (−71.19) 0.99 (−43.80) 0.99 (−21.71) 0.99 (−24.47) 0.99 (−29.94)
h = 90 1.01*** (42.43) 1.00 (−10.12) 0.99 (−31.06) 0.99 (−28.63) 1.00*** (10.49) 0.99 (3.56) 0.99 (−31.68)
h = 120 1.02*** (52.94) 1.02*** (57.51) 1.00*** (8.44) 1.00 (−9.37) 1.01*** (37.10) 1.00 (−3.26) 0.99 (−54.61)
h = 150 1.02*** (44.0) 1.06*** (175.62) 1.03*** (93.94) 1.00*** (9.73) 1.04*** (107.17) 0.99 (−36.11) 0.97 (−84.91)
h = 180 1.02*** (71.62) 1.09*** (274.28) 1.05*** (150.08) 1.01*** (23.38) 1.03*** (78.32) 0.98 (−50.11) 0.97 (−83.90)
h = 210 1.04*** (128.80) 1.12*** (346.22) 1.07*** (197.40) 1.02*** (43.48) 1.05*** (122.28) 0.99 (−36.11) 0.99 (−40.11)
h = 240 1.08*** (223.71) 1.14*** (415.21) 1.08*** (230.65) 1.02*** (69.84) 1.09*** (242.83) 1.02*** (53.57) 1.01*** (14.37)
h = 270 1.11*** (323.23) 1.16*** (452.71) 1.10*** (281.24) 1.03*** (94.53) 1.10*** (245.04) 1.06*** (156.07) 1.02*** (62.66)
h = 300 1.14*** (397.77) 1.16*** (455.77) 1.11*** (324.73) 1.04*** (113.49) 1.09*** (227.69) 1.07*** (165.65) 1.03*** (72.06)
h = 330 1.15*** (416.26) 1.16*** (443.78) 1.13*** (377.88) 1.05*** (127.55) 1.10*** (225.62) 1.06*** (147.98) 1.02*** (54.21)
h = 360 1.14*** (389.83) 1.16*** (446.40) 1.16*** (440.85) 1.05*** (147.69) 1.10*** (214.97) 1.08*** (171.46) 1.02*** (39.27)
  1. Note: Factor(K) denotes a factor forecast based on K factors. The ratio is given by $\hat{\sigma}_{VAR}^2/\hat{\sigma}_{EPC}^2$, where $\hat{\sigma}_{VAR}^2$ is the estimator of the MSE of the VAR forecast and $\hat{\sigma}_{EPC}^2$ is the estimator of the MSE of the extended Phillips curve inflation forecast. MSEs are estimated by the rolling scheme and P/R = 0.4. See also the note for Table 12.

The results indicate that the addition of the news-based leading indicator can improve forecasting accuracy even if the benchmark univariate AR model is replaced by the VAR model. This fact suggests that our news-based leading indicator contains information for future inflation that is not included in the additional daily market data.

5 Conclusion

We constructed a news-based business cycle index from daily newspaper articles and examined its informational content for predicting future inflation in Japan. Our analysis suggests that the news-based business cycle index captures daily changes in real economic activity and is highly correlated with other business cycle indicators at lower frequencies. We found that the news-based leading indicator, rather than the news-based coincident indicator, is useful in forecasting the inflation rate in Japan at horizons longer than three months. Our finding that the news-based leading indicator contains valuable information about future inflation is consistent with the fact that the assessment of future conditions in the Economy Watchers Survey refers to the projected economic conditions three months ahead.


Corresponding author: Mototsugu Shintani, The University of Tokyo, Tokyo, Japan; and Bank of Japan, Tokyo, Japan, E-mail:

Funding source: Scientific Research

Award Identifier / Grant number: 17H02510/20H01482

Acknowledgments

The authors would like to thank Shigenori Shiratsuka, Kumiko Tanaka, Tomohiro Tsuruga, Kozo Ueda and seminar and conference participants at the University of Tokyo, the 27th Annual Symposium of the Society for Nonlinear Dynamics and Econometrics in Dallas, the 2019 Summer Workshop on Economic Theory in Otaru, and the 2019 CUR Microeconometrics Conference on Survey Methodology and Data Science at Bank of Canada for useful comments and suggestions. Shintani greatly acknowledges the financial support of RCAST at the University of Tokyo, Grant-in-aid for Scientific Research 17H02510/20H01482 and the Joint Usage and Research Center Programs of IER at Hitotsubashi University. The views expressed in this paper are those of the authors and do not necessarily reflect the official views of any institutions.

  1. Author contribution: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.

  2. Research funding: Grant-in-aid for Scientific Research 17H02510/20H01482.

  3. Conflict of interest statement: The authors declare no conflicts of interest regarding this article.

Appendix: Text analysis using fastText

In this appendix, we describe the details of our text classification model and its settings in our analysis. The model is based on the algorithm in the fastText library from Facebook. However, the official fastText library handles only classification tasks, not regression. We therefore reimplemented the model using Keras, a Python library, instead of using the official library.[19] According to Joulin et al. (2017), fastText is a simple and computationally efficient network architecture that nevertheless performs on a par with classifiers based on well-known neural network models, such as the CNN and the LSTM, in terms of accuracy.

In fastText, there are only three layers: (i) the embedding layer; (ii) the average pooling layer; and (iii) the output layer. Sentences that are represented as N-gram features are converted into fixed length vectors within an embedding layer and an average pooling layer. In the embedding layer, each word is converted into a fixed length vector. Let x w be a V × 1 one-hot vector, where V is the vocabulary size.[20] For the vocabulary, we used all types of words (unigram features) and their bigram features in a training data set. According to Joulin et al. (2017), including the bigram as input features, in addition to the unigram, improves the accuracy of the model.

Embedding layer

The operation of the embedding layer is defined as

(A.1) $\bar{x}_w = E x_w,$

where E is a d × V embedding matrix and $\bar{x}_w$ is a d × 1 embedded vector. The embedding layer turns one-hot vectors into dense vectors of fixed size. Since the number of words differs across sentences, we pad all sentences to the same sequence length, selected as the length of the longest sequence in the training data set. Figure A1 shows how 10-dimensional one-hot vectors $x_w$ are converted into five-dimensional embedded vectors $\bar{x}_w$. To be more specific, the embedding operation applied to a sentence consisting of seven words is shown as $(\bar{x}_{w_1}, \bar{x}_{w_2}, \ldots, \bar{x}_{w_7}) = E\,(x_{w_1}, x_{w_2}, \ldots, x_{w_7})$ in the figure. For illustrative purposes, the original Japanese words in parentheses are translated into English. Padding is represented by the special character <PAD> in the figure.

Figure A1: 
An example of the operation in the embedding layer.

Here, d is a hyperparameter that defines the size of the output vector from the embedding layer for each word. For example, setting d = 100 means that each word is mapped into a 100-dimensional vector. Generally, neural network models become more computationally efficient as d becomes smaller. However, if d is too small, the embedded vectors cannot adequately represent the features of words, and the model may not perform well in terms of accuracy. For the text classification task, Joulin et al. (2017) reported that the model is effective with d = 10. Following their suggestion, we also set d = 10.
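The equivalence between the lookup-table view of the embedding layer and the matrix product in (A.1) can be checked with a toy example (dimensions follow the Figure A1 example; the matrix values are random and purely illustrative):

```python
import numpy as np

# Toy check that the embedding lookup E[:, w] equals the matrix-vector
# product E @ x_w in (A.1); V = 10 and d = 5 as in the Figure A1
# example, with purely illustrative random values.
V, d = 10, 5
rng = np.random.default_rng(0)
E = rng.normal(size=(d, V))            # d x V embedding matrix

w = 3                                  # vocabulary index of some word
x_w = np.zeros(V)
x_w[w] = 1.0                           # V x 1 one-hot vector

assert np.allclose(E @ x_w, E[:, w])   # lookup == matrix product
print("embedding lookup matches (A.1)")
```

In practice, frameworks such as Keras implement the embedding layer as the lookup on the right-hand side of the assertion, which avoids materializing the one-hot vectors.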

Average pooling layer

The word representations are then averaged into a text representation in the average pooling layer. Average pooling is an operation that returns the average of an input tensor of arbitrary rank n. In the case of fastText, it computes the average of the word vectors output by the embedding layer. In other words, fastText maps each sentence into a fixed-length vector using the average of the word vectors in the sentence. Compared with the CNN and the LSTM, fastText achieves high performance in text classification without using word order information. The average pooling operation is defined as

(A.2) $x_s = \frac{1}{L} \sum_{w \in s} \bar{x}_w,$

where $x_s$ is a d × 1 averaged vector. Figure A2 shows how five-dimensional word embedding vectors $\bar{x}_w$ are converted into a five-dimensional sentence embedding vector $x_s$. In (A.2), the window size is fixed at the length of the longest sequence in the training data set, namely, L.

Figure A2: 
An example of the operation in the average-pooling layer.

Output layer

Finally, the text representation is fed to the output layer. The output layer is given by

(A.3) $y = \phi(b + w x_s),$

where y is a scalar output, w is a 1 × d weight vector, b is a scalar bias, and $\phi$ is an activation function (in general, y can be an n × k output vector, where n is the number of outputs and k is the number of classes). The choice of activation function depends on the problem: a linear function is used for regression, while a softmax function is used for multi-class classification. As the activation function in the output layer, we use a linear function for estimating sentiment scores and a sigmoid function for the binary classification of topics into assessments of current and future economic conditions. We used Adam for optimization with a learning rate of 0.001, a mini-batch size of 100, and 200 epochs.[21] We used a random uniform initializer for the embedding layer and the Glorot normal initializer for the output layer.[22]
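Putting the three layers together, the forward pass can be sketched in NumPy (a simplified sketch of the architecture, not the authors' Keras implementation; names, shapes, and values are illustrative):

```python
import numpy as np

def fasttext_forward(word_ids, E, w, b, activation="linear"):
    """Forward pass of the three-layer fastText architecture:
    (i) embedding lookup, (ii) average pooling over the sentence,
    (iii) affine output with a linear (regression, sentiment score)
    or sigmoid (binary topic classification) activation."""
    x_bar = E[:, word_ids]             # d x L embedded word vectors
    x_s = x_bar.mean(axis=1)           # d x 1 sentence vector, as in (A.2)
    y = b + w @ x_s                    # scalar pre-activation, as in (A.3)
    if activation == "sigmoid":
        y = 1.0 / (1.0 + np.exp(-y))
    return y

E = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])        # d = 2, V = 3 toy embedding
print(fasttext_forward([0, 1], E, w=np.array([2.0, 2.0]), b=1.0))  # 3.0
```

Training then amounts to fitting E, w, and b by minimizing the relevant loss by stochastic gradient descent (Adam in the paper's setting).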

The training data is randomly shuffled at each epoch. We divided the training data set into two parts: 90% for training and 10% for validation. Across all epochs, we select the model weights that minimize the MSE (for regression) or the cross entropy (for classification) on the validation data set. In our setting, the two loss functions, the MSE ($L_{mse}$) and the cross entropy ($L_{ce}$), are defined as

$L_{mse} = \frac{1}{N}\sum_{n=1}^{N} (y_n - t_n)^2, \qquad L_{ce} = -\frac{1}{N}\sum_{n=1}^{N} \left[ t_n \ln y_n + (1 - t_n) \ln (1 - y_n) \right],$

where N is the mini-batch size, $t_n$ is the target variable, and $y_n$ is the model's prediction.
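For concreteness, the two mini-batch losses can be written directly (illustrative helper names; the batch averages follow the standard definitions of MSE and binary cross entropy):

```python
import numpy as np

def mse_loss(y, t):
    """Mini-batch mean squared error L_mse for the regression task."""
    y, t = np.asarray(y, float), np.asarray(t, float)
    return np.mean((y - t) ** 2)

def cross_entropy_loss(y, t):
    """Mini-batch binary cross entropy L_ce for the topic
    classification task (note the leading minus sign)."""
    y, t = np.asarray(y, float), np.asarray(t, float)
    return -np.mean(t * np.log(y) + (1.0 - t) * np.log(1.0 - y))

print(mse_loss([1.0, 3.0], [1.0, 1.0]))   # 2.0
print(cross_entropy_loss([0.5], [1.0]))   # ln 2, about 0.6931
```

The validation step described above simply evaluates one of these losses on the held-out 10% split and keeps the epoch with the smallest value.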

References

Abe, N., and A. Tonogi. 2010. "Micro and Macro Price Dynamics in Daily Data." Journal of Monetary Economics 57 (6): 716–28. https://doi.org/10.1016/j.jmoneco.2010.05.016.

Akerlof, G. A. 2002. "Behavioral Macroeconomics and Macroeconomic Behavior." American Economic Review 92 (3): 411–33. https://doi.org/10.1257/00028280260136192.

Atkeson, A., and L. E. Ohanian. 2001. "Are Phillips Curves Useful for Forecasting Inflation?" Federal Reserve Bank of Minneapolis Quarterly Review 25 (1): 2–11. https://doi.org/10.21034/qr.2511.

Baker, S. R., N. Bloom, and S. J. Davis. 2016. "Measuring Economic Policy Uncertainty." Quarterly Journal of Economics 131 (4): 1593–636. https://doi.org/10.1093/qje/qjw024.

Bollen, J., H. Mao, and X. J. Zeng. 2011. "Twitter Mood Predicts the Stock Market." Journal of Computational Science 2 (1): 1–8. https://doi.org/10.1016/j.jocs.2010.12.007.

Cavallo, A., and R. Rigobon. 2016. "The Billion Prices Project: Using Online Prices for Measurement and Research." Journal of Economic Perspectives 30 (2): 151–78. https://doi.org/10.1257/jep.30.2.151.

Chung, J., C. Gulcehre, K. Cho, and Y. Bengio. 2014. "Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling." arXiv:1412.3555. Working paper.

Conneau, A., H. Schwenk, L. Barrault, and Y. LeCun. 2016. "Very Deep Convolutional Networks for Natural Language Processing." In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, 1107–16. https://doi.org/10.18653/v1/E17-1104.

Corsi, F. 2009. "A Simple Approximate Long-Memory Model of Realized Volatility." Journal of Financial Econometrics 7 (2): 174–96. https://doi.org/10.1093/jjfinec/nbp001.

Fatum, R., and M. M. Hutchison. 2010. "Evaluating Foreign Exchange Market Intervention: Self-Selection, Counterfactuals and Average Treatment Effects." Journal of International Money and Finance 29 (3): 570–84. https://doi.org/10.1016/j.jimonfin.2009.12.009.

Faust, J., and J. H. Wright. 2013. "Forecasting Inflation." In Handbook of Economic Forecasting, Vol. 2, edited by G. Elliott, and A. Timmermann, 2–56. Amsterdam: North Holland. https://doi.org/10.1016/B978-0-444-53683-9.00001-3.

Fukuda, S., and M. Keida. 2001. "Prospects for Empirical Analysis on Inflation Forecasts: The Predictive Power of Phillips Curves in Japan." BOJ Research and Statistics Department Working Paper, 01–21. Bank of Japan (in Japanese).

Gentzkow, M., and J. Shapiro. 2010. "What Drives Media Slant? Evidence from U.S. Daily Newspapers." Econometrica 78 (1): 35–71. https://doi.org/10.3982/ecta7195.

Glorot, X., and Y. Bengio. 2010. "Understanding the Difficulty of Training Deep Feedforward Neural Networks." In Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, 249–56.

Hansen, S., and M. McMahon. 2016. "Shocking Language: Understanding the Macroeconomic Effects of Central Bank Communication." Journal of International Economics 99 (1): S114–33. https://doi.org/10.1016/j.jinteco.2015.12.008.

Hansen, S., M. McMahon, and A. Prat. 2018. "Transparency and Deliberation within the FOMC: A Computational Linguistics Approach." Quarterly Journal of Economics 133 (2): 801–70. https://doi.org/10.1093/qje/qjx045.

Hochreiter, S., and J. Schmidhuber. 1997. "Long Short-Term Memory." Neural Computation 9 (8): 1735–80. https://doi.org/10.1162/neco.1997.9.8.1735.

Ito, T., and T. Yabu. 2007. "What Prompts Japan to Intervene in the Forex Market? A New Approach to a Reaction Function." Journal of International Money and Finance 26 (2): 193–212. https://doi.org/10.1016/j.jimonfin.2006.12.001.

Joulin, A., E. Grave, P. Bojanowski, and T. Mikolov. 2017. "Bag of Tricks for Efficient Text Classification." In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, 427–31. https://doi.org/10.18653/v1/E17-2068.

Kim, Y. 2014. "Convolutional Neural Networks for Sentence Classification." In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, 1746–51. https://doi.org/10.3115/v1/D14-1181.

Kingma, D. P., and J. Ba. 2015. “Adam: A Method for Stochastic Optimization.” In Proceedings of the 3rd International Conference on Learning Representations.Search in Google Scholar

Larsen, V. H., and L. A. Thorsrud. 2018. Business Cycle Narratives. Norges Bank Working Paper 2018-03.10.2139/ssrn.3130108Search in Google Scholar

Lin, Z., M. Feng, C. N. Santos, M. Yu, B. Xiang, B. Zhou, and Y. Bengio. 2017. “A Structured Self-Attentive Sentence Embedding.” In Proceedings of the 5th International Conference on Learning Representations.

Loughran, T., and B. McDonald. 2011. “When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10-Ks.” The Journal of Finance 66 (1): 35–65. https://doi.org/10.1111/j.1540-6261.2010.01625.x.

McCracken, M. 2007. “Asymptotics for Out of Sample Tests of Granger Causality.” Journal of Econometrics 140 (2): 719–52. https://doi.org/10.1016/j.jeconom.2006.07.020.

Nishizaki, K., T. Sekine, and Y. Ueno. 2014. “Chronic Deflation in Japan.” Asian Economic Policy Review 9: 20–39. https://doi.org/10.1111/aepr.12041.

Okazaki, Y., and T. Tsuruga. 2015. “On Economic and General Price Analysis Using Big Data: Survey of Research Works and Text Analysis of the Economy Watchers Survey.” BOJ Reports & Research Papers, Bank of Japan (in Japanese).

Schwert, G. W. 1989. “Tests for Unit Roots: A Monte Carlo Investigation.” Journal of Business & Economic Statistics 7: 147–59. https://doi.org/10.2307/1391432.

Shapiro, A. H., M. Sudhof, and D. Wilson. 2018. Measuring News Sentiment. Federal Reserve Bank of San Francisco Working Paper 2017-01. https://doi.org/10.24148/wp2019-02.

Shintani, M. 2005. “Nonlinear Forecasting Analysis Using Diffusion Indexes: An Application to Japan.” Journal of Money, Credit, and Banking 37 (3): 517–38. https://doi.org/10.1353/mcb.2005.0036.

Shintani, M., T. Yabu, and D. Nagakura. 2012. “Spurious Regressions in Technical Trading.” Journal of Econometrics 169 (2): 301–9. https://doi.org/10.1016/j.jeconom.2012.01.019.

Socher, R., A. Perelygin, J. Wu, J. Chuang, C. D. Manning, and A. Y. Ng. 2013. “Recursive Deep Models for Semantic Compositionality over a Sentiment Treebank.” In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, 1631–42.

Stock, J. H., and M. W. Watson. 1999. “Forecasting Inflation.” Journal of Monetary Economics 44: 293–305. https://doi.org/10.1016/s0304-3932(99)00027-6.

Stock, J. H., and M. W. Watson. 2002. “Forecasting Using Principal Components from a Large Number of Predictors.” Journal of the American Statistical Association 97 (460): 1167–79. https://doi.org/10.1198/016214502388618960.

Stock, J. H., and M. W. Watson. 2009. “Phillips Curve Inflation Forecasts.” In Understanding Inflation and the Implications for Monetary Policy, a Phillips Curve Retrospective, edited by J. Fuhrer, Y. K. Kodrzycki, J. S. Little, and G. P. Olivei, 99–184. Cambridge, MA: MIT Press.

Suimon, Y., T. Kinoshita, and Y. Yamamoto. 2015. “Indexation of Business Outlook of Government and BOJ by Artificial Intelligence.” NOMURA Macroeconomic Insight (in Japanese).

Tallman, E. W., and S. Zaman. 2017. “Forecasting Inflation: Phillips Curve Effects on Services Price Measures.” International Journal of Forecasting 33 (2): 442–57. https://doi.org/10.1016/j.ijforecast.2016.10.004.

Tetlock, P. C. 2007. “Giving Content to Investor Sentiment: The Role of Media in the Stock Market.” The Journal of Finance 62 (3): 1139–68. https://doi.org/10.1111/j.1540-6261.2007.01232.x.

Tetlock, P. C., M. Saar-Tsechansky, and S. Macskassy. 2008. “More Than Words: Quantifying Language to Measure Firms’ Fundamentals.” The Journal of Finance 63 (3): 1437–67. https://doi.org/10.1111/j.1540-6261.2008.01362.x.

Thorsrud, L. A. 2018. “Words Are the New Numbers: A Newsy Coincident Index of Business Cycles.” Journal of Business & Economic Statistics 38 (2): 393–409. https://doi.org/10.1080/07350015.2018.1506344.

Watanabe, K., and T. Watanabe. 2018. “Why Has Japan Failed to Escape from Deflation?” Asian Economic Policy Review 13: 23–41. https://doi.org/10.1111/aepr.12197.

Welch, I., and A. Goyal. 2008. “A Comprehensive Look at the Empirical Performance of Equity Premium Prediction.” Review of Financial Studies 21 (4): 1455–508. https://doi.org/10.1093/rfs/hhm014.

West, K. D. 2006. “Forecasting Inflation.” In Handbook of Economic Forecasting, Vol. 1, edited by G. Elliott, C. W. J. Granger, and A. Timmermann, 99–134. Amsterdam: North Holland. https://doi.org/10.1016/S1574-0706(05)01003-7.

Xiao, Y., and K. Cho. 2016. Efficient Character-Level Document Classification by Combining Convolution and Recurrent Layers. arXiv:1602.00367. Working paper.

Zhang, X., J. Zhao, and Y. LeCun. 2015. “Character-Level Convolutional Networks for Text Classification.” In Proceedings of the 28th International Conference on Neural Information Processing Systems, 649–57.


Supplementary Material

The online version of this article offers supplementary material (https://doi.org/10.1515/snde-2019-0117).


Received: 2019-09-27
Accepted: 2020-08-04
Published Online: 2020-09-18

© 2020 Walter de Gruyter GmbH, Berlin/Boston
