A Temporal Forecasting Driven Approach Using Facebook’s Prophet Method for Anomaly Detection in Sewer Air Temperature Sensor System

Smart sensor systems play a decisive role in the condition assessment of concrete sewer pipes undergoing microbial corrosion. A few Australian water utilities have adopted a predictive analytics model for estimating the corrosion. Such models require sensor inputs, such as sewer air temperature data, for corrosion prediction. A sensor system was developed to monitor the daily variation of sewer air temperature under harsh sewer environmental conditions. However, a diagnostic tool to evaluate the streaming sensor data is vital for reliable monitoring. In this context, this paper proposes a temporal forecasting driven approach for anomaly detection in a sewer air temperature sensor system. Several temporal forecasting models were comprehensively evaluated, and Facebook’s Prophet method was adopted as the forecasting basis for the anomaly detection approach. The proposed approach was evaluated with sewer air temperature sensor data, and the results indicate a reasonable anomaly detection performance.


I. INTRODUCTION
Sensor monitoring inside sewer pipes is challenging due to the harsh environmental conditions caused by the high concentration of gaseous hydrogen sulphide and high humidity levels. Such environmental conditions favour microbial activity on the surface of the concrete sewer pipe. The microbes are responsible for producing sulphuric acid on the concrete surface and largely influence concrete corrosion [1], [2]. The water utilities spend millions of dollars each year to repair and rehabilitate the pipes affected by concrete corrosion [3], [4]. If the water utilities fail to address the corrosion problem, it can result in sewer infrastructure breakdown [5].
Smart sensor technologies can play an important role in the condition assessment of concrete sewer pipes. Unfortunately, there are no sensors available to non-invasively measure the concrete corrosion inside the sewer pipe. Traditionally, the sewer operators travel inside the sewer pipe and take core samples to estimate corrosion through laboratory analysis. This practice can cause occupational health hazards to the operators traversing the harsh sewer pipe environment. A smart sensing suite has been reported recently to estimate the thickness of the corroded layer inside the sewer pipe. This sensing technique employs ground penetrating radar [6], electrical resistivity based sensor measurements [7], [8], or a capacitance sensor [9] to identify the location of the rebar. Then, a pulsed eddy current sensing technology [10] is used to estimate the distance to the rebar from the surface of the concrete [11]. Once the location of and distance to the rebar are known, an optimal location is identified to take corrosion measurements through a drilling based sensor technology [12]. Even this smart sensing technology is invasive and needs sewer operators to travel inside the sewer pipe for inspection.
With recent advancements in predictive analytics, researchers in collaboration with Australian water utilities have developed a sensor data-driven model for predicting the corrosion across the sewer pipe [13]. The model primarily incorporates the air temperature, relative humidity, and hydrogen sulphide concentration of sewer air as data inputs [14]. The model also takes surface temperature sensor measurements [15], [16] and surface moisture sensor measurements [17]-[19] as additional inputs to reduce prediction uncertainty. This predictive analytics based corrosion estimation needs long-term sensor inputs. However, sensors can produce a random anomaly or a continuous stream of anomalies in sewer environmental conditions [20], [21]. Hence, it is important to have a diagnostic tool that automatically detects anomalies in sensors such as the sewer air temperature sensor, which provides crucial data inputs to the models predicting corrosion.
This work was supported by the Predictive Analytics for Sewer Corrosion project, in part by the Sydney Water Corporation, in part by the Melbourne Water Corporation, in part by the Water Corporation (WA), and in part by the South Australian Water Corporation.
Corresponding Author: Karthick Thiyagarajan
Time series or temporal forecasting models are widely used to develop anomaly detection approaches. Those approaches rely heavily on the accuracy of the forecast data to statistically detect an individual anomaly or a group of anomalies. The Autoregressive Integrated Moving Average (ARIMA) model is one such advanced model used for anomaly detection [22], [23]. This model integrates the Autoregressive model with the Moving Average model [24]. The seasonality of the ARIMA model can be tuned for optimal forecasting, which is popularly known as the Seasonal Autoregressive Integrated Moving Average (SARIMA) model [25]. The optimization parameters of the SARIMA model can be chosen automatically by using the Hyndman and Khandakar algorithm [26]. This is known as the Auto.Arima model, which automatically fits the seasonal parameters and performs temporal forecasting. Other models such as the Exponential Smoothing State Space (ETS) model [27], the Bagged model [28] and the TBATS model [29] can forecast the univariate data coming from the sewer air temperature sensor system. Recently, Facebook has developed the Prophet method for forecasting, which is based on an additive modeling approach [30]. The forecasting performance of all the aforementioned models is evaluated in this paper for developing an anomaly detection framework.
A sewer air temperature sensor system having a thermistor as an active sensing element was developed and installed inside a sewer pipe located in Sydney city, Australia. The sensor was deployed on 3rd November 2016. Readers are referred to [15] for more details on the sensor deployment. This paper proposes a temporal forecasting driven approach for anomaly detection in the sewer air temperature sensor system. The main contributions of this paper are threefold. Firstly, we investigated the temporal forecasting performance of different time series models, namely Facebook's Prophet model, the Auto.Arima model, the TBATS model, the ETS model and the Bagged model, for forecasting sewer air temperature sensor measurements. Secondly, we statistically studied the forecasting performance of the time series models when forecasting only one day ahead and updating the training data set before forecasting the subsequent day. Thirdly, we developed an anomaly detection approach using the temporal forecasting module and evaluated it using the sewer air temperature sensor data.
This paper is structured as follows: Section II describes the methodology for anomaly detection. Section III evaluates the proposed approach and presents the results with discussion. Finally, Section IV concludes the paper by highlighting the key outcomes and outlining future prospects.

II. PROPOSED APPROACH
This section presents the formulation of the temporal forecasting driven approach for anomaly detection in the sewer air temperature sensor system. Let $A_T, A_{T-1}, A_{T-2}, \ldots$ be the sewer air temperature sensor measurements at one-hour time intervals $T, T-1, T-2, \ldots$. Facebook's Prophet method is employed in this work for forecasting sensor measurements. This method accommodates three components. The first component is a trend function, $g(t)$, which models the non-periodic changes in the sensor measurements taken at equally spaced time intervals. The second component is seasonality, $s(t)$, which represents the periodic variations in sewer air temperature data. The third component, $h(t)$, represents the potential irregular schedules for sensor monitoring. The forecast value of Facebook's Prophet method is denoted as $y(t)$ and is given by equation (1):

$$y(t) = g(t) + s(t) + h(t) + \epsilon_t \tag{1}$$

where $\epsilon_t$ represents the changes that are not modeled by $g(t)$, $s(t)$ and $h(t)$. In this work, we consider the sensor monitoring to be continuous, with no schedules for halting sensor measurements. Therefore, we use $h(t) = 0$, and equation (1) simplifies to equation (2):

$$y(t) = g(t) + s(t) + \epsilon_t \tag{2}$$

The sewer air temperature data trend is captured by using the logistic growth model, which is accommodated in $g(t)$ and can be mathematically expressed as:

$$g(t) = \frac{C}{1 + \exp\left(-k\,(t - m)\right)} \tag{3}$$

where $C$ is the carrying capacity, $k$ is the rate of growth and $m$ is the offset parameter. The seasonality for $y(t)$ is given by equation (4):

$$s(t) = \sum_{n=1}^{N} \left( a_n \cos\left(\frac{2\pi n t}{P}\right) + b_n \sin\left(\frac{2\pi n t}{P}\right) \right) \tag{4}$$

where $P$ is the daily period of the time series sensor data, $a_n$ and $b_n$ are the seasonal constants used to compute seasonality, $n$ is the seasonal parameter indexing the $N$ terms of the series, and $t$ is the instantaneous time.

Once Facebook's Prophet model is trained, it forecasts for one day, which means that $y(t)$ has 24 forecast values. The forecasted $y(t)$ is compared with the sensor measurements $A_T$. A lower diagnostic bound $L_{D_t}$ and an upper diagnostic bound $U_{D_t}$ are used for detecting an anomaly. $L_{D_t}$ and $U_{D_t}$ are defined in equations (5) and (6) respectively:

$$L_{D_t} = y(t) - \beta \tag{5}$$

$$U_{D_t} = y(t) + \beta \tag{6}$$

where $\beta$ is the heuristic threshold value for setting the diagnostic bound levels in the anomaly detection system. Each sewer air temperature sensor measurement is evaluated to check for the presence of anomalous data. If the sensor data $A_T$ satisfies the condition defined in (7), then it is treated as good data (anomaly-free data):

$$L_{D_t} \leq A_T \leq U_{D_t} \tag{7}$$

If $A_T$ is outside the diagnostic bounds, it is considered an anomaly.
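The forecast components and the bound check described above can be illustrated with a minimal Python sketch. It assumes the standard Prophet forms of the logistic trend and Fourier seasonality for equations (3) and (4) and omits the error term; all function names and numeric values are hypothetical and chosen only for illustration. The sketch evaluates a forecast from given parameters and does not fit them, which Prophet itself would do from the training data.

```python
import math


def logistic_trend(t, capacity, rate, offset):
    """Equation (3): g(t) = C / (1 + exp(-k (t - m)))."""
    return capacity / (1.0 + math.exp(-rate * (t - offset)))


def fourier_seasonality(t, period, a, b):
    """Equation (4): sum over n of a_n cos(2 pi n t / P) + b_n sin(2 pi n t / P)."""
    return sum(
        a_n * math.cos(2.0 * math.pi * n * t / period)
        + b_n * math.sin(2.0 * math.pi * n * t / period)
        for n, (a_n, b_n) in enumerate(zip(a, b), start=1)
    )


def forecast(t, capacity, rate, offset, period, a, b):
    """Equation (2) without the error term: y(t) = g(t) + s(t)."""
    return logistic_trend(t, capacity, rate, offset) + fourier_seasonality(t, period, a, b)


def is_good_data(measurement, y_hat, beta):
    """Condition (7): the measurement lies within [y(t) - beta, y(t) + beta]."""
    return (y_hat - beta) <= measurement <= (y_hat + beta)


# Hypothetical parameters: flat trend at 20 degrees C, one daily harmonic.
y_hat = forecast(t=6.0, capacity=40.0, rate=0.0, offset=0.0,
                 period=24.0, a=[0.0], b=[0.5])
within = is_good_data(21.0, y_hat, beta=1.0)    # inside the bounds
outside = is_good_data(22.0, y_hat, beta=1.0)   # above the upper bound
```

With these illustrative parameters the forecast at hour 6 is 20.5, so a reading of 21.0 is accepted and a reading of 22.0 is flagged.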
Upon the detection of an anomaly, it is flagged and replaced with the respective forecast value coming from y(t). Then, the replaced value along with the good data values for that respective day will be pushed into the stack of training data to forecast for the subsequent day. This process is repeated each day.
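The daily flag, replace and retrain cycle can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the forecaster is passed in as a callable, and the default `seasonal_naive_forecast` is a hypothetical stand-in that simply repeats the previous day (it is not Prophet). The data values are invented for the example.

```python
def seasonal_naive_forecast(training, horizon=24):
    """Stand-in forecaster (NOT Prophet): repeat the last `horizon` values."""
    return list(training[-horizon:])


def run_daily_detection(training, days, beta, forecaster=seasonal_naive_forecast):
    """Forecast one day, flag out-of-bound readings, replace them with the
    forecast, and push the corrected day into the training data."""
    training = list(training)
    flagged = []  # (day index, hour index) of each detected anomaly
    for d, day in enumerate(days):
        y_hat = forecaster(training, len(day))
        corrected = []
        for h, (a_t, y_t) in enumerate(zip(day, y_hat)):
            if y_t - beta <= a_t <= y_t + beta:
                corrected.append(a_t)      # good data is kept as measured
            else:
                flagged.append((d, h))     # anomaly is flagged ...
                corrected.append(y_t)      # ... and replaced by the forecast
        training.extend(corrected)         # feedback for the next day's forecast
    return flagged, training


# Hypothetical data: one training day and two monitored days, one spike injected.
history = [20.0] * 24
day_one = [20.0] * 24
day_one[5] = 35.0
day_two = [20.2] * 24
flagged, updated = run_daily_detection(history, [day_one, day_two], beta=1.0)
```

Because the spike is replaced before the day is appended, it does not contaminate the forecast for the following day, which is the purpose of the correction step.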

A. Performance Evaluation of Temporal Forecasting Models
This section evaluates the forecasting performance of time series models by comparing the forecasts of Facebook's Prophet method with the forecasts of other time series models, namely the Auto.Arima model, the TBATS model, the ETS model, and the Bagged model. To evaluate the forecasting performance of each model, the Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) were used as the statistical metrics. The RMSE and MAE were calculated for the forecasted data of each model by using (8) and (9) respectively:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} \left( F_d(t) - S_d(t) \right)^2} \tag{8}$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} \left| F_d(t) - S_d(t) \right| \tag{9}$$

where $F_d$ is the data from the different time series forecasting models, $S_d$ is the sewer air temperature sensor data, $t$ is the instantaneous time and $n$ is the total number of forecast values.

Each model was trained using the sewer air temperature sensor data from 4th November 2016 to 17th November 2016. The training data contains 336 sensor measurement values. Figure 1 shows the training data plot. By using the two weeks of training data, each model forecasted sensor measurements for one week, from 18th November 2016 to 24th November 2016. Figure 2 shows the forecasted data of each model, where it can be observed that the forecasts of Facebook's Prophet method are closer to the actual sewer air temperature sensor data and also follow similar temporal trends. Table I tabulates the computed MAE and RMSE values for each model based on the one-week forecasted data, where it can be observed that Facebook's Prophet method has the lowest MAE and RMSE among the five temporal forecasting models. This shows that the temporal forecasting performance of Facebook's Prophet method is better than that of the other compared models.

From Table II and Table III, it can be observed that Facebook's Prophet method has the lowest MAE and RMSE values for all the days. However, the MAE and RMSE values of subsequent days increased from Day 1. This pattern was observed for all the other models. Therefore, from Table II and Table III, it can be concluded that the MAE and RMSE increase when the number of forecasting days increases. This can be attributed to the training data used for forecasting. Therefore, Facebook's Prophet method is more suitable for forecasting short-term (daily) sewer air temperature sensor data than for forecasting long-term (weekly) data.
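The two error metrics used in this subsection, the mean absolute error and the root mean square error between forecast and sensor data, can be computed with a short Python sketch. The function names and sample values are hypothetical and are not taken from the paper's tables.

```python
import math


def mae(forecast, sensor):
    """Mean absolute error between forecast values and sensor data."""
    if len(forecast) != len(sensor) or not sensor:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(f - s) for f, s in zip(forecast, sensor)) / len(sensor)


def rmse(forecast, sensor):
    """Root mean square error between forecast values and sensor data."""
    if len(forecast) != len(sensor) or not sensor:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((f - s) ** 2 for f, s in zip(forecast, sensor))
    return math.sqrt(squared / len(sensor))


# Hypothetical hourly values in degrees C: three exact forecasts, one 2-degree miss.
forecast_values = [20.0, 21.0, 22.0, 23.0]
sensor_values = [20.0, 21.0, 22.0, 25.0]
example_mae = mae(forecast_values, sensor_values)    # 0.5
example_rmse = rmse(forecast_values, sensor_values)  # 1.0
```

The example also shows why both metrics are reported: a single large miss raises the RMSE more than the MAE, since the errors are squared before averaging.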

B. Temporal Forecasting Performance Evaluation with Daily Feedback
This section evaluates the temporal forecasting performance of each model when forecasting one day at a time. Each model was trained using the sewer air temperature sensor data from 4th November 2016 to 17th November 2016. By using the training data set, each model forecasted sensor measurements for one day. It is assumed in this experiment that the sensor data of the forecasted day is anomaly-free, so the sensor measurements of the forecasted day are pushed into the training data set for forecasting the next day. This forecasting process was iterated for one week. Figure 3 shows the forecasts for each model, where it can be observed that all the models follow the same trend as the actual sewer air temperature sensor data. For statistical analysis, the MAE and RMSE were computed for one week and tabulated in Table IV, where it can be noticed that the forecasts of Facebook's Prophet method have the lowest MAE and RMSE when compared with the other temporal forecasting models. However, there is no significant variation in MAE and RMSE between the models. A comparison of the statistical metrics tabulated in Table I and Table IV shows that the temporal forecasting performance of all the models improved significantly. This is due to the methodology of forecasting sewer air temperature sensor data one day ahead and updating the training data set with the previous day's sensor measurements.

The computed forecast data using the methodology adopted in this experiment were compared with the sensor data for each day. Table V and Table VI

C. Performance Evaluation of the Temporal Forecasting Driven Approach for Anomaly Detection
This section evaluates the developed temporal forecasting driven approach using Facebook's Prophet method for anomaly detection of sewer air temperature sensor measurements. The anomaly detection approach was trained by using the sewer air temperature sensor data from 4th November 2016 to 17th November 2016. During the laboratory testing of the sewer air temperature sensor system, the sensor behaved erratically and produced a stream of anomalies. In this experiment, we manually injected the anomalies produced during lab testing alongside the anomalies produced by the sensor during field testing inside the sewer pipe. Figure 4 shows the evaluation of the proposed approach for anomaly detection. The first plot of Figure 4 shows the sewer air temperature sensor data with anomalies. There were a total of 25 anomalies. The second plot shows the forecasted data with diagnostic bounds. The upper diagnostic bound is y(t) + β whereas the lower diagnostic bound is y(t) − β. For the sewer air temperature anomaly detection, β = 1 °C. The value was heuristically chosen. The third plot shows the corrected sensor data, i.e., when the proposed approach detects a sensor anomaly, the anomaly is corrected by using the respective forecast data. The corrected data then becomes part of the training data set for forecasting the next day's sensor measurements.
The proposed model first forecasts the sewer air temperature data for 18th November 2016. The model compares the forecasted data with the sewer air temperature sensor measurements and checks whether the sensor data is within the diagnostic bounds. If the sensor data is within the diagnostic bounds, then the sensor measurement is treated as good data. Otherwise, the sensor measurement is treated as an anomaly, and the detected anomaly is corrected with the respective forecasted data. Then, all 24 data points are stacked into the training data set to perform forecasting for 19th November 2016. This process continues as long as sensor measurements are available. The proposed anomaly detection approach detected 23 out of 25 anomalies (92%), which shows a reasonable performance of the proposed approach.
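The reported figure of 23 detected anomalies out of 25 corresponds to a detection rate of 0.92. A minimal sketch of how such a summary could be computed from the indices of injected and flagged samples is given below. The index values are hypothetical placeholders; the source does not list which anomalies were missed, nor does it report false alarms.

```python
def detection_summary(injected, flagged):
    """Compare the set of known (injected) anomaly indices with flagged indices."""
    injected, flagged = set(injected), set(flagged)
    detected = injected & flagged
    return {
        "detected": len(detected),
        "missed": len(injected - flagged),
        "false_alarms": len(flagged - injected),
        "detection_rate": len(detected) / len(injected),
    }


# Hypothetical indices: 25 known anomalies, of which the first 23 were flagged.
summary = detection_summary(injected=range(25), flagged=range(23))
```

Tracking false alarms alongside the detection rate would also show how the choice of β trades missed anomalies against good data flagged in error.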

IV. CONCLUSION AND FUTURE WORK
This paper proposes a temporal forecasting driven approach using Facebook's Prophet method for anomaly detection in a sewer air temperature sensor system. The key contributions of this paper are summarized as follows:
• The temporal forecasts of Facebook's Prophet method for one week of sewer air temperature data have an MAE of 0.13 °C and an RMSE of 0.16 °C. This method has the highest prediction accuracy when compared with the other models. Also, it was observed that the prediction accuracy decreases slightly when the number of forecasting days increases. Therefore, Facebook's Prophet method is suggested for forecasting short-term (daily) sewer air temperature sensor data.
• The temporal forecasting performance of all the models improved when they forecasted only one day ahead. Facebook's Prophet method has an MAE of 0.09 °C and an RMSE of 0.12 °C, which were the lowest among all the compared models. Hence, Facebook's Prophet method was chosen to develop the anomaly detection approach by forecasting only one day ahead.
• An anomaly detection approach for the sewer air temperature sensor system was developed and evaluated by using the data sourced from the sewer pipe. The evaluation results indicate a reasonable performance of the proposed approach, as it detected 23 out of 25 anomalies.
The sewer air temperature sensor is part of a multi-sensor suite. The other sensors include a surface temperature sensor and a surface moisture sensor. In the future, we intend to develop an algorithm leveraging Facebook's Prophet method for detecting anomalies in the data streaming from the multi-sensor suite deployed in harsh sewer environmental conditions.