A Time Series Analysis-Based Forecasting Approach for the Indian Realty Sector

Prediction of stock prices using time series analysis is a difficult and challenging task since stock prices usually exhibit random patterns of movement. However, the last decade has witnessed rapid development and evolution of sophisticated algorithms for complex statistical analysis. These algorithms are capable of processing large volumes of time series data while executing on high-performance hardware and parallel computing architectures. Thus, computations which were seemingly impossible to perform a few years back are now quite amenable to real-time processing and effective analysis. Stock market time series data are large in volume, and quite often need real-time processing and analysis. It is therefore natural that the research community has focused on designing and developing robust predictive models for accurately forecasting the stochastic nature of stock price movements. This work presents a time series decomposition-based approach for understanding the past behavior of the realty sector of India and forecasting its behavior in the future. While the forecasting models are built using the time series data of the realty sector for the period January 2010 till December 2015, the prediction is made for the time series index values for the months of the year 2016. A detailed comparative analysis of the methods is presented with respect to their forecasting accuracy, and extensive results are provided to demonstrate the effectiveness of the six proposed forecasting models.


Introduction
Developing an accurate and efficient forecasting model for predicting stock prices has been one of the most exciting challenges confronting the research community working in the fields of machine learning, applied econometrics and artificial intelligence. Various technical, fundamental and statistical indicators have been proposed in the literature for predicting stock prices. However, each method has its own limitations and none has been fully effective in predicting the stochastic movement of stock prices. Sen & Datta Chaudhuri (2016a, 2016b, 2016c) proposed a novel approach towards portfolio diversification and prediction of stock prices. The authors argued that different sectors in an economy do not exhibit identical patterns of variation in their stock prices. Different sectors exhibit different trend patterns and different seasonal characteristics, and also differ in the randomness of their time series. While on one side the efficient market hypothesis has focused on the randomness aspect of stock price movements, on the other side there are propositions that seek to disprove the hypothesis by delving into various fundamental characteristics of different stocks. It may be contended that, besides the differences in the fundamental characteristics among stocks of different companies, the performance of different stocks also has a lot to do with the sectors to which the stocks belong. Since each sector has its own set of factors influencing its behavior, the price movements of stocks belonging to different sectors are guided by these factors. The factors responsible for the phenomenal growth of the information technology (IT) sector in India are different from those which have made the metals sector in the country sluggish, or the realty sector grow at a slow pace.
From the point of view of investors in the stock market, it is critical to identify these factors and analyze them effectively for optimal portfolio choice and also for churning of the portfolio.
In this paper, we focus on the time series pattern of the realty sector in India in order to understand its distinguishing characteristics. We use the monthly time series index values of the Indian realty sector during the period January 2010 till December 2016 as per the Bombay Stock Exchange (BSE). We decompose the time series using the R programming language. We then illustrate how the time series decomposition approach provides us with useful insights into various characteristics and properties of the realty sector time series. It is further demonstrated that a careful and deeper study of the trend, seasonal and random component values of the time series enables one to understand the growth pattern, the seasonal characteristics and the degree of randomness exhibited by the time series index values. We also propose an extensive framework for time series forecasting in which we present six different approaches to the prediction of time series index values. We critically analyze the six approaches and also explain why some methods perform better and produce lower values of forecast error than others.
The rest of the paper is organized as follows. Section 2 describes the detailed methodology used in this work. The method of decomposition of the realty sector time series is explained in detail in this section. Section 3 presents the results of decomposing the time series index values into three components: trend, seasonal and random. The behavior exhibited by the time series is analyzed based on the decomposition results. Section 4 presents a detailed forecasting framework consisting of six different models of forecasting that are applied on the realty sector time series. Section 5 presents extensive results on the performance of the six forecasting methods on the realty sector time series data. A comparative analysis of the techniques is also provided on the basis of six different metrics of the forecasting techniques: maximum error, minimum error, mean error, standard deviation of error, the root mean square error (RMSE), and the ratio of the RMSE value and the mean index value. Section 6 presents a brief discussion on some of the existing work in the literature on time series forecasting with particular focus on the realty sector. Finally, Section 7 concludes the paper.

Methodology
This Section presents a brief discussion on the methodology followed in this work. We use the programming language R for data management, data analysis and pictorial presentation of the results. Ihaka & Gentleman (1996) provide a detailed description of the various capabilities of the R programming language. R is an open source language with a very rich set of libraries having in-built functions that make it one of the most powerful tools for handling data analytics projects. For the current work, we have used the monthly index data from the Bombay Stock Exchange (BSE) of India for the realty sector for the period January 2010 till December 2016. The monthly index values of the realty sector for the 7 years are stored in a plain text (.txt) file. This plain text file contains 84 index values corresponding to the 84 months in the 7-year period under our study. The text file is then read into an R data object using the scan( ) function. The R data object is then converted into a time series object by applying the ts( ) function with a frequency value of 12. The frequency value is chosen to be 12 so that the seasonality characteristics of the time series for each month can be analyzed. The time series data object in R is then decomposed into its three components (trend, seasonal and random) using the decompose( ) function, which is defined in the stats package in the R environment. We plot the graphs of the realty time series data as well as its three components so that further analysis can be made on the behavior of the time series and its three components. After carrying out a comprehensive analysis of the decomposition results of the time series of the realty sector, we propose six different approaches to forecasting the time series index values.
In order to compute the forecast accuracy of each method, we build the forecast models using the realty time series data for the period January 2010 till December 2015, and apply the six models to forecast time series index values for each month of the year 2016. Since the actual values of the time series for all months of 2016 are already available to us, we compute the error in forecasting for each method of forecast that we propose. Comparative analysis of the methods of forecasting is done based on several useful metrics, and the reasons why a particular method performs better than the other methods for the realty sector time series are critically analyzed in detail. The work in this paper follows from several previous works. Sen & Datta Chaudhuri (2016a, 2016b) demonstrated how effectively the time series decomposition approach can be utilized in robust analysis and forecasting of the Indian Auto sector. In another work, Sen & Datta Chaudhuri (2016c) analyzed the behavior of two different sectors of the Indian economy, the small cap sector and the capital goods sector, the former having a dominant random component and the latter exhibiting a significant seasonal component. Following another approach of time series analysis, Sen & Datta Chaudhuri (2016d) studied the behavior of the Indian information technology (IT) sector time series and the Indian capital goods sector time series. In yet another work, using the time series decomposition-based approach, Sen & Datta Chaudhuri (2016e) illustrated how time series analysis enables us to check the consistency between the fund style and actual fund composition of a mutual fund. In two different works, Sen & Datta Chaudhuri (2017a, 2017b) presented detailed analyses of the behavior of the healthcare sector and the fast moving consumer goods (FMCG) sector of India using time series decomposition approaches.
In this work, we demonstrate how a time series decomposition-based approach enables one to analyze and understand the behavior and different properties of the realty time series of the Indian economy based on time series data for the period January 2010 till December 2016. We also investigate which forecasting approach is most effective for the realty time series. For this purpose, we compare several approaches of forecasting and identify the one that produces the minimum value of forecasting error. We critically analyze all the proposed forecasting approaches, and explain why a particular approach has worked most effectively while some others have not done so for the realty time series data.
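The data handling described in this Section (84 monthly values, of which the first 72 are used for model building and the last 12 for evaluation) can be sketched as follows. This is a minimal illustration in Python rather than the R workflow used in the paper. The function name split_train_test and the input values are hypothetical placeholders, not the BSE realty index data.

```python
def split_train_test(values, start_year=2010, holdout_months=12):
    """Label monthly values with (year, month) and hold out the last months."""
    labelled = [((start_year + i // 12, i % 12 + 1), v)
                for i, v in enumerate(values)]
    return labelled[:-holdout_months], labelled[-holdout_months:]


# Synthetic placeholder values (NOT the BSE realty index): 84 months.
values = [3500.0 - 25.0 * i for i in range(84)]
train, test = split_train_test(values)

# First and last (year, month) labels of each part.
print(len(train), train[0][0], train[-1][0])
print(len(test), test[0][0], test[-1][0])
```

With 84 values starting in January 2010, the training part covers January 2010 till December 2015 and the hold-out part covers the twelve months of 2016.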

Time Series Decomposition Results
We present the decomposition results for the time series of the realty sector index values as per the records of the BSE for the period January 2010 till December 2016. First, we create a plain text (.txt) file containing the monthly index values of the realty sector for the period January 2010 till December 2016. This file contains 84 records corresponding to the 84 months in the 7 years under our study. We use the scan( ) function in the R language to read the text file and store it in an R data object. Then, we convert this R data object into a time series object using the R function ts( ). We set the value of the frequency parameter in the ts( ) function to 12 so that the decomposition of the time series is carried out on a monthly basis. After creating the time series data object, we used the plot( ) function in R to draw the graph of the realty sector time series for the period January 2010 till December 2016. Figure 1 depicts the realty sector time series. To obtain further insights into the characteristics of the time series, we decomposed the time series object into its three components: trend, seasonal and random. The decomposition of the time series object is done using the decompose( ) function defined in the stats package in the R programming environment. The decompose( ) function is executed with the realty time series object as its parameter and the three components of the time series are obtained. Figure 2 presents the graphs of the realty sector time series and its three components. Figure 2 consists of four boxes arranged in a stack. The boxes display the overall time series, the trend, the seasonal and the random component respectively, arranged from top to bottom in that order. From Figure 1, it may be seen that the time series of the realty sector has consistently fallen during the period January 2010 till December 2016 with occasional minor upward swings.
The index value for the realty sector in the month of January 2010 was 3500, while in the month of December 2016, the index was found to be 1264. Except for two short periods (August 2012 to January 2013, and January 2014 to June 2014) in which the realty sector index exhibited a modest increase in its value, in all other months during the period of our study, the index showed a downward movement. The three components of the time series are shown separately so that their relative behavior can be visualized.
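The split into trend, seasonal and random components can be illustrated with a small sketch of classical additive decomposition: a centered moving average for the trend, per-month averages of the detrended values for the seasonal component, and the remainder as the random component. The sketch is written in Python for illustration and is not a port of the R decompose( ) function, so details may differ. The function name decompose_additive and the input series are hypothetical, and the series is synthetic rather than the realty index.

```python
def decompose_additive(x, period=12):
    """Classical additive decomposition for an even period (e.g. 12 months)."""
    n = len(x)
    half = period // 2

    # Trend: centered moving average with half weights on the two end points.
    trend = [None] * n
    for t in range(half, n - half):
        window = x[t - half:t + half + 1]  # period + 1 values
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = total / period

    # Seasonal: average detrended value per position in the cycle,
    # shifted so that the seasonal figures sum to zero.
    sums = [0.0] * period
    counts = [0] * period
    for t in range(n):
        if trend[t] is not None:
            sums[t % period] += x[t] - trend[t]
            counts[t % period] += 1
    figure = [s / c for s, c in zip(sums, counts)]
    mean_figure = sum(figure) / period
    figure = [f - mean_figure for f in figure]
    seasonal = [figure[t % period] for t in range(n)]

    # Random: what is left after removing trend and seasonal parts.
    random_part = [None if trend[t] is None else x[t] - trend[t] - seasonal[t]
                   for t in range(n)]
    return trend, seasonal, random_part


# Synthetic series: linear trend plus a zero-sum monthly pattern, 84 months.
pattern = [5, -3, 2, -4, 6, -6, 1, -1, 3, -3, 4, -4]
series = [100 + 2 * t + pattern[t % 12] for t in range(84)]
trend, seasonal, random_part = decompose_additive(series)
print(trend[6], seasonal[:3], random_part[6])
```

Because the synthetic series has no noise, the recovered seasonal figures match the injected pattern and the random component is zero up to floating-point error. The first and last six months have no trend value, since the centered moving average needs six neighbours on each side.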

Proposed Forecasting Methods
This Section presents a set of six forecasting approaches that we propose for predicting the time series index of the realty sector. We discuss the details of the six different methods of forecasting and present the performance of these approaches in predicting the realty sector time series index. For the purpose of comparative analysis of different approaches of forecasting, we use five different metrics and identify the method that yields the lowest value of forecasting error. We also critically analyze the approaches and argue why one method performs better than the others on the given dataset of realty sector time series index for the period January 2010 till December 2016. In the following, we first describe the six approaches, and then provide the detailed results as these forecasting methods are applied on the realty sector dataset.
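The error metrics used for comparing the methods can be computed as in the following Python sketch. It assumes that the error percentage for a month is the absolute difference between actual and forecast values divided by the actual value, and that the standard deviation is the sample standard deviation. Both are assumptions made for illustration, as are the function name forecast_error_metrics and the toy input values.

```python
import math
import statistics


def forecast_error_metrics(actual, forecast):
    """Summarise forecast errors; percentages are relative to actual values."""
    pct_errors = [abs(a - f) / a * 100.0 for a, f in zip(actual, forecast)]
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    rmse = math.sqrt(sum(squared) / len(squared))
    mean_actual = statistics.mean(actual)
    return {
        "max_error_pct": max(pct_errors),
        "min_error_pct": min(pct_errors),
        "mean_error_pct": statistics.mean(pct_errors),
        "sd_error_pct": statistics.stdev(pct_errors),  # sample SD (assumed)
        "rmse": rmse,
        "rmse_to_mean_pct": rmse / mean_actual * 100.0,
    }


# Toy values for illustration only (not taken from the paper's tables).
metrics = forecast_error_metrics([100.0, 200.0], [110.0, 190.0])
for name, value in metrics.items():
    print(name, round(value, 2))
```

The last entry, the ratio of the RMSE to the mean of the actual values, makes RMSE values comparable across series whose index levels differ.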

Method I: In this method, we use the realty sector time series data for the period January 2010 till December 2015 for the purpose of forecasting the monthly index values for each month of the year 2016. The HoltWinters( ) function in R, together with the forecast library, has been used for this purpose. In order to build a robust forecasting framework, the HoltWinters model is used with a changing trend and an additive seasonal component that best fits the realty time series index data. The forecast horizon in the HoltWinters model is chosen to be 12 so that the forecasted values for all months of 2016 can be obtained by using the method at the end of the year 2015. Forecast error is computed for each month of 2016 and an overall RMSE value is also derived for this method.
Method II: In this approach, the realty sector index value for each month of the year 2016 is forecasted using the HoltWinters( ) method with a forecast horizon of 1 month. For example, for the purpose of forecasting the index for the month of March 2016, the index values of the realty sector from January 2010 till February 2016 are used to develop the forecasting model. As in Method I, the HoltWinters model is used with a changing trend and an additive seasonal component. Since the forecast horizon is short, the model is likely to produce higher accuracy in forecasting compared to the approach followed in Method I that used a forecast horizon of 12 months. The forecast error corresponding to each month of 2016 and an overall RMSE value for the model are computed.
Method VI: As in Method V, we use the ARIMA model of forecasting in this method. However, unlike Method V that used a forecast horizon of 12 months, this method uses a short forecast horizon of 1 month. For the purpose of forecasting, the ARIMA model is built using time series data for the period January 2010 till the month previous to the month for which forecasting is being made. For example, for the purpose of prediction of the time series index for the month of May 2016, the time series data from January 2010 till April 2016 is used for building the ARIMA model. Since the training data set for building the ARIMA model constantly changes in this approach, we evaluate the ARIMA parameters (i.e., p, d, and q) before every round of forecasting. In other words, for each month of the year 2016, before we make the forecast for the next month, we compute the values of the three parameters of the ARIMA model.
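The difference between the 12-month horizon and the 1-month horizon can be sketched independently of the underlying model. In the Python sketch below, the forecaster is a naive last-value stand-in, not the HoltWinters or ARIMA models used in the paper. All function names are hypothetical and the input series is a toy example.

```python
def naive_model(history, horizon):
    """Stand-in forecaster: repeat the last observed value."""
    return [history[-1]] * horizon


def fixed_origin_forecast(series, n_test, model):
    """Fit once on the training part and forecast all n_test steps at once."""
    history = series[:-n_test]
    return model(history, n_test)


def rolling_one_step_forecast(series, n_test, model):
    """Refit before each step, using all data up to the previous month."""
    forecasts = []
    for k in range(n_test, 0, -1):
        history = series[:-k]  # grows by one observation per round
        forecasts.append(model(history, 1)[0])
    return forecasts


toy = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(fixed_origin_forecast(toy, 3, naive_model))      # [7, 7, 7]
print(rolling_one_step_forecast(toy, 3, naive_model))  # [7, 8, 9]
```

In the rolling scheme, each forecast is made with knowledge of the most recent actual value, which is why a short horizon can track a changing series more closely than a single long-horizon forecast.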

Forecasting Results
In this Section, we provide results on the performance of the six forecasting approaches.
Method I: The results obtained using this method are presented in Table 2.
Method II: The results of forecasting using Method II are presented in Table 3. In Figure 4, the actual index values and their corresponding predicted values are plotted.
Method III: The results of forecasting using this method are presented in Table 4.
Method IV: Table 5 presents the results of forecasting for Method IV. The percentage of error in forecasting for each month during the period July 2015 till June 2016, and an overall RMSE value are also listed.
Method V: As Figure 7 indicates, the first-order difference time series is a stationary one, as the mean and the variance of the first-order difference time series are approximately constant. Hence, the value of d = 1 is cross-verified. Figure 8 depicts the PACF of the realty sector time series for the period January 2010 till December 2015. It is clear that except for lag = 0, the partial correlation values at all other lags are insignificant. Hence, the value of p = 0 is also verified. Figure 9 shows that the minimum integral value of lag beyond which all autocorrelation values are insignificant is 1. Therefore, q = 1 is also verified. Hence, we have verified that the realty sector time series for the period January 2010 till December 2015 follows an ARIMA (0, 1, 1) model. Using the arima( ) function in R with its two parameters: (i) the realty sector time series R object and (ii) the order (0, 1, 1) of the ARIMA model, we build the ARIMA model. Finally, we use the function forecast.Arima( ) with two parameters: (i) the ARIMA model and (ii) the time horizon of forecast = 12 months, for forecasting the index values of the time series for all the twelve months of the year 2016.
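The checks described for the ARIMA orders rely on the first-order difference of the series and on whether autocorrelations at each lag are significant. The Python sketch below computes the first difference, the sample autocorrelation function (ACF) and the approximate 95 percent significance bound of 1.96 divided by the square root of the series length. It covers only the ACF and not the PACF. The function names are hypothetical and the series is synthetic, not the realty index.

```python
import math


def difference(x):
    """First-order difference: x[t] - x[t-1]."""
    return [b - a for a, b in zip(x, x[1:])]


def sample_acf(x, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


def significance_bound(n):
    """Approximate 95 percent bound for a white-noise autocorrelation."""
    return 1.96 / math.sqrt(n)


# Synthetic falling series for illustration only.
levels = [3500, 3400, 3450, 3300, 3350, 3200, 3250, 3100]
diffs = difference(levels)
acf = sample_acf(diffs, 3)
bound = significance_bound(len(diffs))
for lag, value in enumerate(acf):
    flag = "significant" if abs(value) > bound else "insignificant"
    print(lag, round(value, 3), flag)
```

Lags whose autocorrelation lies inside the bound are treated as insignificant, which is the criterion applied when reading the order q off the ACF plot.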
Table 6 presents the results of forecasting using this method.
Observations on Method V: It is evident from Table 6 that Method V is quite effective in forecasting the realty sector time series. The lowest error percentage was 0.96, which occurred in the month of April 2016, while the highest error percentage was 27.88, observed in the month of February 2016. The RMSE value for this method is found to be 171, which is 12.39 percent of the mean value of the realty sector index during the period January 2016 till December 2016. The mean value of the realty sector index was 1380. Considering the fact that this method uses a long forecast horizon of 12 months, the error percentage values are quite moderate. This is attributed to the fact that the realty sector index experienced very small dispersion in its values during the year 2016. The index started with a value of 1209 in January 2016, fell to its lowest value of 1051 in February, attained its peak value of 1607 in the month of July, and then decreased again, finally reaching a value of 1264 in the month of December 2016. Method V, with a forecast horizon of 12 months and ARIMA parameters (0, 1, 1), forecasted a constant average value of 1344 for the series, so that the average error for all the forecasted values is minimized. Figure 10 presents a graphical depiction of the actual index values and their corresponding predicted values using Method V.
Method VI: In this approach, we build an ARIMA model with a forecast horizon of one month. The methodology used for building the ARIMA model, however, is identical to that used in Method V. The difference between Method V and Method VI lies in the values of the forecast horizon used in these methods. While Method V used a forecast horizon of 12 months, we use a forecast horizon of 1 month in Method VI.
Since, in Method VI, the forecast is made only one month in advance, the training data set used for building the ARIMA model constantly increases in size, and hence, we re-evaluate the parameters of the ARIMA model every time we use it in forecasting. In other words, for every month of 2016, before we make the forecast for the next month, we compute the values of the parameters of the ARIMA model. Computation of the values of the ARIMA parameters p, d, and q showed that for the period January 2016 till July 2016, the ARIMA model was (1, 1, 1), while for the remaining period it was (0, 1, 0). Table 7 presents the forecasting results for Method VI. Figure 11 depicts the actual index values of the realty sector and their corresponding predicted values for this method of forecasting.
Observations on Method VI: From Table 7, it is evident that the error percentage values for all months of the year 2016 are quite low. The lowest value of error percentage was found to be 1.42 in the month of December 2016, while the highest value of error was 21.37 percent in the month of November 2016. The RMSE value for Method VI of forecasting is found to be 128. The mean value of the index of the realty sector for the period January 2016 till December 2016 is 1380. Hence, the RMSE value is 9.28 percent of the mean value of the actual index of the realty sector. This indicates that Method VI has been highly accurate in forecasting the realty sector index values. The high level of accuracy of Method VI may be attributed to its short forecast horizon of one month. The short forecast horizon is able to catch the changing pattern of the time series very effectively. This has resulted in a very small error in forecasting. It is evident from Figure 11 that the forecasted time series values closely followed the pattern of the time series of the actual index values of the realty sector.
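The constant 12-month forecast reported for Method V is the expected behavior of an ARIMA (0, 1, 1) model without a drift term: its one-step forecast coincides with simple exponential smoothing with smoothing weight alpha = 1 + theta, where theta is the moving-average coefficient in the sign convention used by the R arima( ) function, and every later step repeats that value. The Python sketch below illustrates this under stated assumptions. The value of theta and the initialization with the first observation are hypothetical choices, not the fitted model from the paper. The final lines re-derive three percentages quoted in this Section from the reported figures.

```python
def arima011_flat_forecast(history, theta, horizon):
    """Forecast of an ARIMA(0,1,1) model without drift, for a given theta.

    Uses the equivalence with simple exponential smoothing:
    level = alpha * y + (1 - alpha) * level, with alpha = 1 + theta.
    The level is initialised with the first observation (an assumption).
    """
    alpha = 1.0 + theta
    level = history[0]
    for y in history[1:]:
        level = alpha * y + (1.0 - alpha) * level
    return [level] * horizon  # every step ahead repeats the same value


# Toy history and a hypothetical theta, for illustration only.
print(arima011_flat_forecast([10.0, 12.0], theta=-0.5, horizon=3))

# Percentages quoted in the text, recomputed from the reported figures.
print(round(abs(1344 - 1051) / 1051 * 100, 2))  # 27.88 (February 2016, Method V)
print(round(171 / 1380 * 100, 2))               # 12.39 (RMSE ratio, Method V)
print(round(128 / 1380 * 100, 2))               # 9.28  (RMSE ratio, Method VI)
```

A flat forecast of this kind can only match a series that stays near one level, which is consistent with the observation that the index showed little dispersion during 2016.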

Summary of Forecasting Results
Table 8 presents the comparative analysis of the six forecasting methods. It can be seen that Method III, which uses the sum of the forecasted trend values using the HoltWinters( ) function with a horizon of 12 months and the past seasonal values to predict the sum of the future trend values and the new seasonal values, has produced the lowest percentage value of the ratio of RMSE to the mean index value. In fact, Method III has produced the lowest values for all other metrics too. Hence, Method III clearly turns out to be the most accurate among all the six methods. On the other hand, the performance of Method I has been the worst since it has produced the highest values for all the five metrics of error percentages. Method II turns out to be a close second, while Method V and Method III follow it in order of their performance. The performance of Method V turns out to be worse than that of Method III, although it has performed much better than Method I.

Related Work
Several approaches and techniques have been proposed by researchers in the literature for forecasting daily stock prices. Among these approaches, neural network-based approaches are extremely popular. Mostafa (2010) proposed a neural network-based technique for predicting movement of stock prices in Kuwait. Kimoto et al. (1990) presented a technique using a neural network based on historical accounting data and various macroeconomic parameters to forecast variations in stock returns. Leigh et al. (2005), Leung et al. (2000) and Kim (2004) proposed applications of hybrid systems in stock price prediction. In the literature, researchers have also proposed several forecasting techniques which have particularly focused on various issues in the realty sector. Karakozova (2005) carried out a detailed empirical evaluation of alternative econometric methods for modeling and forecasting rents and returns in property markets. The author used the Finnish property market as a case study and observed that the choice of econometric model depends on whether the model is used to test the theory, analyze the policy or make forecasts. The study also revealed that theory-based econometric methods are more effective in evaluating the suitability of theoretical frameworks for modeling rents and returns. On the other hand, in forecasting rents and returns, time series techniques are found to produce better results. an de Meulen et al. (2011) observed that house price fluctuations in Germany are significantly affected by the financial stability and economic development of the country. The authors measured price movements in different real estate markets in Germany and forecasted the short-term price fluctuations in the real estate sector. Using the auto regressive AR(p) model as a base, it was found that vector auto regression (VAR) and auto regressive distributed lag (ARDL) models with additional macroeconomic information improve the forecasting accuracy.
Grinis (2015) observed that the global outlook in the realty sector for the period 2016 to 2018 would be positive, if the macro-economic fundamentals are sound. The author observed that: (i) the asset prices will be positively affected by stable inflationary policy over the long term, (ii) unemployment figures would continue to exhibit a downward trend in most major markets in the globe, which would lead to a boom in the real-estate sector, (iii) commodity prices that affect consumers and the real estate market would not reflect any significant rise in the near future. Dietzel et al. (2014) examined the role of Internet search data in the commercial real estate sector. The authors constructed various forecast models and found that the inclusion of Google search data significantly improves the forecast results of commercial real estate prices for the US market. In other words, the investigation revealed that Google search data are extremely effective in measuring sentiment in the commercial real estate markets. Glaeser et al. (2017) studied the recent boom in the real estate sector in China and analyzed the factors affecting the demand and supply of housing in that country. On the demand side, the authors examined the economic, demographic, cultural, and speculative factors that influence the demand in the real-estate sector. On the supply side, the relation of housing prices to the physical costs of construction and an assessment of the long-run price of land were considered. The study revealed that a housing crash is not inevitable in China in the near future provided suitable policies are adopted by the Chinese government. Booth & Marcato (2003) formulated various methods of constructing real estate indices using the available data of the real estate sector.
The authors also demonstrated how the use of publicly available real-estate data can sometimes lead to imperfect models and how the real estate data can be effectively exploited in designing stochastic investment modeling for actuarial purposes. Gaspareniene et al. (2014) proposed models for identifying and analyzing the factors that influence housing price level formation in the economies of developing countries. The study revealed that the structure of the model of housing price level formation should be an integral multi-stage aggregate of microeconomic, macroeconomic and other elements that effectively describe the price fluctuations in the real estate sector. Gelain & Lansing (2014) analyzed the behavior of the equilibrium price-rent ratio for housing in a standard asset pricing model and compared the model predictions to survey evidence on the returns expectations of real-world housing investors. In contrast to the work mentioned above, our work in this paper deals with a structural decomposition of the time series of the realty sector index in India during the period January 2010 till December 2016. Based on the decomposition results of the time series, we identified several important characteristics of the Indian realty sector. We particularly investigated the nature of the trend, seasonality pattern and degree of randomness exhibited by the time series. After analyzing the nature of the realty time series, we proposed six forecasting techniques for predicting the index values of the sector for each month of the year 2016. We computed the accuracy of each of the forecasting techniques, and critically analyzed under what situations a particular technique performs better than the other techniques.
Since the forecasting methods proposed in this paper are all generic in nature, these methods can be very effectively applied in forecasting the future trends and behavior of time series index values of other sectors of economy of India or other countries in the world.

Conclusion
This paper has presented a time series decomposition-based approach for analyzing the behavior of the time series of the realty sector of the Indian economy during the period January 2010 till December 2016. Algorithms and library-defined functions in the R programming language have been used to decompose the time series index values into three components: trend, seasonal, and random. The decomposition results of the time series provided several important insights into the behavior exhibited by the realty sector time series during the period under our study. Based on the decomposition results, the degree of seasonality and randomness in the time series have been computed. Particularly, it has been possible to identify the months during which the seasonal component in the realty time series plays a major role. After a careful analysis of the decomposition results of the realty sector index time series, we proposed six methods for forecasting the time series index values. The six methods of forecasting involved different algorithms and different lengths of forecast horizon. It was observed that Method III, which used the sum of the forecasted trend values using the HoltWinters( ) function with a horizon of 12 months and the past seasonal values to predict the sum of the future trend values and the new seasonal values, had performed best, yielding the lowest percentage value of the ratio of RMSE to the mean index value. However, Method I, which predicted the trend values using the HoltWinters( ) forecasting approach with a forecast horizon of 12 months, was found to produce the highest value of the ratio of the RMSE to the mean index value, thereby exhibiting the worst performance among the six methods of forecasting. Method VI, which was based on an ARIMA model with a forecast horizon of 1 month, also performed very efficiently with an RMSE to mean index ratio of 9.28 percent.
The other three methods, i.e., Method II (the HoltWinters model with a forecast horizon of 1 month), Method IV (the model that used the sum of the forecasted trend values using a linear regression and the past seasonal values to predict the sum of the actual trend and seasonal values) and Method V (the ARIMA model with a forecast horizon of 12 months) performed moderately well, with none of their RMSE to mean index ratios exceeding the threshold value of 20 percent. The results in this work provide valuable insights into the characteristics of the realty index time series in India, and they also serve as guidelines for choosing an appropriate forecasting framework for predicting the future index values of the time series. In addition, these results can be extremely useful for constructing an optimized portfolio of stocks. Performing a similar exercise on different sectors will enable analysts to understand the individual characteristics of the trend, seasonality and randomness of those sectors. This information can be suitably leveraged by portfolio managers in identifying the timing of buying and selling stocks from different sectors, thereby designing an efficient and optimized portfolio.