A Robust Analysis and Forecasting Framework for the Indian Mid Cap Sector Using a Time Series Decomposition Approach

Prediction of stock prices using econometric and machine learning-based approaches poses significant challenges to the research community, since the movements of stock prices are essentially random in nature. However, the development and rapid evolution of sophisticated algorithms capable of analyzing large volumes of time series data, coupled with the availability of high-performance hardware and parallel computing architectures over the last decade, have made it possible to process and analyze voluminous stock market time series data efficiently in an almost real-time environment. In this paper, we propose a decomposition-based approach for time series analysis of the Indian mid cap sector and present a robust and accurate prediction framework consisting of six forecasting methods for predicting the future values of the time series. Extensive results are presented on the performance of each forecasting method, and the reasons why a particular method performs better than the others are critically analyzed.


Introduction
Developing an accurate and efficient forecasting framework for robust prediction of stock prices has been one of the most exciting yet difficult challenges faced by researchers working in the field of machine learning and analytics. Numerous technical, fundamental and statistical indicators have been proposed by researchers for accurately predicting the prices of stocks. Sen & Datta Chaudhuri (2016a;2016b;2016c;2016d) proposed a novel approach based on time series decomposition and analysis for efficient portfolio diversification and prediction of stock prices. The authors hypothesized that not all sectors of an economy exhibit similar patterns of variation in their stock prices. In fact, it is more usual to find that different sectors exhibit different patterns in their trend, different characteristics in their seasonal behavior, and varying degrees of randomness in their time series values. While the efficient market hypothesis argues for the randomness of stock price movements, there are propositions that counter the hypothesis by delving into various fundamental characteristics of different sectors and of different stocks in those sectors.
We argue that, in addition to the differences in the fundamental characteristics among stocks belonging to different companies, the performance of a stock also depends strongly on the sector to which it belongs. Since the behavior of each sector of the economy is influenced by its own unique set of factors, the patterns of price movement of stocks belonging to different sectors are also determined and influenced by these factors.
In this work, our goal is to study the behavioral pattern exhibited by the time series of the mid cap sector of India, so that the salient properties of that sector can be better understood. By definition, a mid cap company has a market capitalization between Indian Rupees (INR) 50 billion and INR 200 billion. For the purpose of our study, we use the monthly average index values of the mid cap sector for the period January 2010 to December 2016, as per the Bombay Stock Exchange (BSE). The monthly time series data is decomposed into its three components using functions defined in the R programming language. Based on the results of decomposition, we demonstrate how several interesting characteristics of the time series can be extracted to gain useful insights into its behavioral pattern. In particular, we illustrate how a deeper analysis of the trend, seasonal and random components provides useful information about the growth pattern, seasonal properties and randomness exhibited by the time series index values. For predicting future behavior, we also propose an extensive framework for time series forecasting consisting of six methods of predicting the time series index values. The six forecasting methods are critically analyzed in terms of their forecasting accuracy, and we explain why some methods perform better than others.
The organization of the paper is as follows. In Section 2, we present a detailed description of the methodology that we have followed in this work. We discuss in detail the method of decomposition of the mid cap sector time series into its various components. Section 3 presents extensive results of the decomposition of the time series values into their trend, seasonal and random components. The decomposition results are analyzed in depth in order to understand several important characteristics and behaviors revealed by the time series. In Section 4, we propose a set of six forecasting methods for predicting the future values of the time series. Section 5 provides extensive results on the performance of the six forecasting methods on the mid cap sector time series data. Each of the proposed methods is evaluated on the basis of six metrics of forecasting accuracy, and a comparative analysis of the methods is presented. These six metrics are: maximum error, minimum error, mean error, standard deviation of error, the root mean square error (RMSE), and the ratio of the RMSE value to the mean index value. Section 6 presents a brief literature survey of some of the existing work on time series analysis and forecasting. In Section 7, we conclude the paper and highlight some future scope of work.

Methodology
This section presents a brief discussion of the methodology that has been followed in this work. The rich features of the programming language R have been exploited in all three activities of analytics: data management, data analysis and presentation of results. Ihaka & Gentleman (1996) provided a detailed description of various features of the R programming language and its power and capabilities in data management and data analysis work. R is an open source language with a very large and rich collection of libraries with numerous useful built-in functions that make it one of the most powerful tools for complex analytics projects.
In this work, the monthly average index values of the mid cap sector have been taken from the Bombay Stock Exchange (BSE) of India for the period January 2010 to December 2016. The average monthly index values of the mid cap sector time series for the 7 year period are stored in a plain text (.txt) file. The plain text file thus contains 84 records, with each record holding the average index value for one of the 84 months in the 7 year period under investigation. The text file is read into an R data object using the scan( ) function. The resultant R data object is then transformed into an R time series object by invoking the ts( ) function in R with three parameters: (i) the R data object, (ii) a frequency value of 12 and (iii) the starting month of the time series, i.e., January 2010. The frequency value is fixed at 12 so that the monthly seasonality characteristics of the time series can be analyzed. Finally, using the decompose( ) function in R, which is provided by the base stats package, the time series object is decomposed into its three components: trend, seasonal and random. The aggregate time series of the mid cap sector and its three components are plotted using the plot( ) function in R.
Based on the numeric values of the components and the graphs, various important characteristics of the time series are analyzed.
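The data preparation and decomposition steps described above can be sketched in R as follows. This is a minimal illustration under stated assumptions, not the authors' script: the index values are synthetic stand-ins for the BSE data, the file path is a temporary hypothetical one, and decompose( ) is called with its default additive model, which the text does not specify.

```r
# Synthetic stand-in for the 84 monthly average index values (NOT the BSE data).
set.seed(1)
values <- 6000 + 50 * (1:84) + 200 * sin(2 * pi * (1:84) / 12) + rnorm(84, sd = 100)

# Store the values in a plain text file and read them back with scan(),
# mirroring the workflow described in the text. The file path is hypothetical.
path <- tempfile(fileext = ".txt")
writeLines(as.character(values), path)
raw <- scan(path, quiet = TRUE)

# Monthly time series object starting in January 2010 (frequency = 12).
midcap <- ts(raw, frequency = 12, start = c(2010, 1))

# Classical decomposition into trend, seasonal and random components.
# The additive model is decompose()'s default and is an assumption here.
parts <- decompose(midcap)

# Plot the aggregate series together with its three components.
plot(parts)
```

With a frequency of 12, the centered moving average used for the trend leaves the first six and last six months of the trend and random components undefined (NA), which is worth keeping in mind when the component values are tabulated.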
We make a detailed analysis of the decomposition results of the time series of the mid cap sector so that the important properties revealed by the time series can be properly understood. Then, we propose six robust forecasting methods that can be applied to the time series so that its future values and behavior can be forecast efficiently and accurately. In order to verify the forecasting accuracy of each of the six methods, the forecast models are built using the mid cap time series data for the period January 2010 to December 2015.
Once the models are built, they are used to forecast the time series values for each month of the year 2016. Since the actual values of the time series for all months of 2016 are already available to us, the forecasting error of each proposed method is easily computed as the percentage by which the predicted values deviate from the corresponding actual time series values. We also carry out a detailed comparative analysis of the six forecasting methods using various useful metrics proposed by us. Based on the values of the metrics for each forecasting method, we critically analyze why a particular method performs better than the other methods for the mid cap time series.
Sen & Datta Chaudhuri (2016a;2016b)  sector time series and the Indian capital goods sector time series. In yet another work, using the time series decomposition-based approach, Sen & Datta Chaudhuri (2016e) illustrated how time series analysis enables us to check the consistency between the fund style and the actual fund composition of a mutual fund. Sen & Datta Chaudhuri (2017a; also analyzed the characteristics of the Indian healthcare and FMCG sector time series and proposed robust and efficient forecasting techniques to accurately predict the future values of the time series index of the two sectors. Sen (2017a; carried out studies on the realty sector and the metal sector of the Indian economy and identified some interesting characteristics of these two sectors.
In this work, we illustrate how a time series decomposition-based approach can be used for an effective and accurate analysis of the behavior of the mid cap sector time series of the Indian economy. Based on the time series values for the mid cap sector during the period January 2010 to December 2016, we carry out a decomposition exercise on the time series and make a detailed analysis of the decomposition results so that the salient properties of the sector can be understood. In addition, we propose six different approaches to forecasting that can be applied to the mid cap time series for predicting its future values. Based on several metrics, we carry out a detailed comparative analysis of the forecasting methods and examine which forecasting method performs most efficiently for the mid cap time series. We also analyze why some methods perform very well and produce small forecasting errors, while others produce a higher margin of error.
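The six accuracy metrics named in the introduction can be computed with a small helper such as the one below. It is a sketch under assumptions: the function name forecast_metrics is hypothetical, the maximum, minimum, mean and standard deviation are taken over absolute percentage errors, and the RMSE is computed on errors in index units. The text does not state these conventions explicitly. The four input values are made up purely for illustration.

```r
# Hypothetical helper: six accuracy metrics for one forecasting method.
forecast_metrics <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  err <- predicted - actual                  # error in index units
  pct_err <- abs(err) / actual * 100         # absolute percentage error (assumed convention)
  rmse <- sqrt(mean(err^2))                  # root mean square error in index units
  c(max_error    = max(pct_err),
    min_error    = min(pct_err),
    mean_error   = mean(pct_err),
    sd_error     = sd(pct_err),
    rmse         = rmse,
    rmse_to_mean = rmse / mean(actual) * 100) # RMSE as a percentage of the mean index value
}

# Made-up actual and predicted values, for illustration only.
actual    <- c(100, 110, 120, 130)
predicted <- c(102, 108, 123, 130)
metrics <- forecast_metrics(actual, predicted)
print(round(metrics, 3))
```

Expressing the RMSE as a ratio to the mean index value makes the figure comparable across sectors whose indices sit at very different levels.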

Time Series Decomposition Results
We present the decomposition results for the time series of the mid cap sector index values, based on the records of the BSE for the period January 2010 to December 2016. First, we create a plain text (.txt) file that contains the monthly average index values of the mid cap sector for the period January 2010 to December 2016. This file contains 84 records, each representing the average index value for one of the 84 months in the 7 years under study. The scan( ) function in R is used to read the text file and store it in an R data object. The resultant R data object is converted into a time series object using the R time series function ts( ). The value of the frequency parameter of the ts( ) function is set to 12 so that the decomposition of the time series is done on a monthly basis. Once the time series object is created, the graphics function plot( ) in R is used to plot the mid cap sector time series for the period January 2010 to December 2016. Figure 1
The overall conclusion is that the mid cap time series is primarily dominated by its trend component, while the seasonal and random components make no significant contribution to the aggregate time series. However, the series occasionally exhibited high randomness, and the seasonal and random components showed significant variations around their mean values.
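One way to quantify the claim that the trend dominates the aggregate series is to express each component as a percentage of the aggregate value, month by month. The sketch below does this on synthetic data; the series, the variable names and the choice of the mean absolute percentage share as the summary are all assumptions made for illustration and are not taken from the paper.

```r
# Synthetic monthly series (NOT the BSE data): trend + seasonality + noise.
set.seed(2)
month_index <- 1:84
x <- ts(8000 + 60 * month_index + 150 * sin(2 * pi * month_index / 12) +
          rnorm(84, sd = 80),
        frequency = 12, start = c(2010, 1))

parts <- decompose(x)

# Absolute size of each component as a percentage of the aggregate value.
components <- cbind(trend    = as.numeric(parts$trend),
                    seasonal = as.numeric(parts$seasonal),
                    random   = as.numeric(parts$random))
share <- abs(components) / as.numeric(x) * 100

# Average share per component; NA months at both ends of the series are ignored.
avg_share <- colMeans(share, na.rm = TRUE)
print(round(avg_share, 2))

# The twelve seasonal indices, labelled by calendar month (the series starts in January).
seasonal_by_month <- setNames(parts$figure, month.abb)
print(round(seasonal_by_month, 1))
```

The seasonal indices show which months tend to lie above or below the trend, while the range of the random component relative to the aggregate indicates how often the series departs sharply from its trend and seasonal pattern.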

Proposed Forecasting Methods
This section presents a collection of forecasting approaches that can be effectively applied on

Forecasting Results
In this section, we present detailed results on the forecasting accuracy of the six methods that we discussed in Section 4. Table 2
Observations on Method I: It is clearly evident from Table 2 Table 3 Table 4. The error in forecast for each month and an overall RMSE value for this method are also computed and listed in Table 4.
Method IV: We present the results of forecasting for Method IV in Table 5
Figure 7 depicts the first-order difference of the mid cap sector time series. It is evident that the first-order difference of the mid cap sector time series is stationary, as its mean and variance are approximately time-invariant. Thus the value of d = 1 is cross-verified to be correct. Next, the partial autocorrelation function (PACF) and the autocorrelation function (ACF) are plotted to cross-check the values of the parameters p and q respectively. The PACF of the mid cap sector time series is depicted in Figure 8. It is observed that except for lag = 0, the partial correlation values at all lags are statistically insignificant. Hence the value of p = 0 is also cross-verified. We observe in Figure 9 that the minimum integral value of lag beyond which all autocorrelation values are statistically insignificant is 1. Therefore, the value of the parameter q = 1 is also cross-verified. Hence, we confirm that the mid cap sector time series for the period January 2010 to December 2015 can be modeled as an ARIMA (0, 1, 1) model.
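The cross-checks of d, p and q described above can be reproduced along the following lines. This is a hedged sketch on a synthetic series that was generated to behave like an ARIMA (0, 1, 1) process, not on the mid cap data. The significance bound is the usual approximate 95% white-noise bound that acf( ) and pacf( ) draw on their plots.

```r
# Synthetic 72-month training series (Jan 2010 to Dec 2015), NOT the BSE data.
set.seed(3)
e <- rnorm(73)
train <- ts(8000 + cumsum(100 * (e[-1] + 0.4 * e[-73])),
            frequency = 12, start = c(2010, 1))

# d: inspect the first-order difference for an approximately constant mean and variance.
dx <- diff(train)
plot(dx, main = "First-order difference")

# p and q: compare PACF and ACF values with the approximate 95% significance bound.
bound <- qnorm(0.975) / sqrt(length(dx))
acf_vals  <- as.numeric(acf(dx, plot = FALSE)$acf)[-1]   # drop lag 0
pacf_vals <- as.numeric(pacf(dx, plot = FALSE)$acf)
print(which(abs(acf_vals) > bound))    # lags with significant autocorrelation
print(which(abs(pacf_vals) > bound))   # lags with significant partial autocorrelation

# Fit the ARIMA(0, 1, 1) model and forecast the 12 months of 2016.
fit <- arima(train, order = c(0, 1, 1))
fc <- predict(fit, n.ahead = 12)
print(fc$pred)
```

On real data, the lags flagged by the two print( ) calls are what would be read off the ACF and PACF plots when choosing q and p.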

Observations on Method II: The observations in
We build the ARIMA model using the arima( ) function in R with two parameters as: (i)
Method VI: In this approach, we build an ARIMA model using a forecast horizon of one month. The approach that we follow for building the ARIMA model is otherwise almost identical to that used in Method V; the two methods differ only in the forecast horizon. While Method V uses a forecast horizon of 12 months, Method VI uses a forecast horizon of 1 month. Since Method VI forecasts only one month in advance, the training data set used for building the ARIMA model grows by one record in each iteration, and hence we re-evaluate the parameters of the ARIMA model after every iteration of the ARIMA forecasting.
In other words, in each month of 2015, before we make the forecast for the next month, we compute the values of the three parameters of the ARIMA model. The computation of the values of the ARIMA parameters p, d, and q revealed that for the months of January, March, June and August to December 2016, the ARIMA model was (0, 1, 1). The months of February, May and July yielded an ARIMA (1, 1, 1) model, while the month of April yielded an ARIMA (0, 1, 0) model. Table 7 presents the forecasting results for Method VI. Figure 11
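The rolling one-month-ahead procedure of Method VI can be sketched as below. The text does not say how the parameters were re-evaluated in each iteration, so the selection rule used here is purely an assumption: d is fixed at 1 and the pair (p, q) with the lowest AIC among four candidates is chosen. The helper name select_order is hypothetical and the data are synthetic.

```r
# Synthetic 84-month series (Jan 2010 to Dec 2016), NOT the BSE data.
set.seed(4)
e <- rnorm(85)
x <- ts(8000 + cumsum(100 * (e[-1] + 0.4 * e[-85])),
        frequency = 12, start = c(2010, 1))

# Hypothetical selection rule: lowest AIC among (p, 1, q) with p, q in {0, 1}.
select_order <- function(series) {
  best <- NULL
  best_aic <- Inf
  for (p in 0:1) for (q in 0:1) {
    fit <- tryCatch(arima(series, order = c(p, 1, q)), error = function(err) NULL)
    if (!is.null(fit) && AIC(fit) < best_aic) {
      best <- fit
      best_aic <- AIC(fit)
    }
  }
  best
}

predicted <- numeric(12)
orders <- character(12)
for (i in 1:12) {
  # The training window grows by one month in every iteration.
  train <- ts(x[1:(72 + i - 1)], frequency = 12, start = c(2010, 1))
  fit <- select_order(train)
  predicted[i] <- predict(fit, n.ahead = 1)$pred[1]
  # fit$arma stores (p, q, P, Q, period, d, D); report the order as p,d,q.
  orders[i] <- paste(fit$arma[c(1, 6, 2)], collapse = ",")
}

actual <- as.numeric(x[73:84])
pct_error <- abs(predicted - actual) / actual * 100
print(data.frame(month = month.abb, order = orders, pct_error = round(pct_error, 2)))
```

Because the model is refitted every month, the selected order can change from one iteration to the next, which is consistent with the mixture of (0, 1, 1), (1, 1, 1) and (0, 1, 0) models reported above.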

Summary of Forecasting Results
In Table 8, we summarize the performance of the six forecasting methods that we have used.
2000) and Kim (2004) proposed applications of hybrid systems in stock price prediction.
In contrast to the work mentioned above, our approach in this paper is based on structural
We computed the accuracy of each of the forecasting techniques and critically analyzed the situations in which a particular technique performs better than the other techniques.

Conclusion
In this paper, we have presented a time series decomposition-based approach for analyzing