Estimation of monthly Global Horizontal Irradiation pan India using spatial interpolation and comparison of its deviations from a standard dataset

With demand for solar photovoltaic (PV) systems increasing, the need to predict their feasibility and performance is greater than ever. The irradiance at a geographical location is the dominant factor determining the achievable solar generation, so accurate irradiance data are required to assess the value of solar PV systems. To address this need, this paper presents a method for estimating global horizontal irradiance (GHI) using two-dimensional (2-D) spatial interpolation. The proposed model is geo-agnostic: it can estimate irradiance over whatever geographical range the input data cover. The paper also compares the model predictions with an industry-standard irradiation dataset, which provides insight into recent spatio-temporal trends.


Introduction
PV systems are a predominant means of harnessing solar energy. They are cheaper than most other renewable energy technologies, require little periodic maintenance, and are highly durable and easily scalable. Hence, demand for them is growing rapidly worldwide. Since any shortfall in PV generation can result in considerable financial penalties, it is important to predict the achievable generation. Solar irradiance is a key input to yield assessment, and generation depends almost linearly on it; estimating irradiance is therefore an important exercise in PV asset management.
Solar irradiance is measured in various ways, each with a different notion of incident radiation. S.M. Maleki et al. [1] introduce the requisite concepts, formulate them in detail, and explain how irradiance changes with time. D. Young et al. [2] and D. Palmer et al. [3] discuss the methodologies currently used by the industrial and research communities, both to define the scope of the engineering challenges and to solve them. Together, these works give a clear picture of the system and establish the context and motivation for the research presented here.
Spatial interpolation is a well-known and effective technique for modelling distributions and parameters that depend on geography. J.S. Ryu et al. [4] emphasize the challenges of geostatistics and explain how interpolation techniques such as inverse distance weighting and kriging can be leveraged for accurate estimation. B. Bacchi et al. [5] highlight the complexity of numerical weather prediction models and circumvent it by using spatial correlation techniques to explain rainfall trends. The study by D. Perez-Astudillo et al. [6] is similar to the one presented in this paper.
It maps GHI trends across Qatar using only weather station data. This paper goes a step further and performs interpolation across the whole of India, producing gap-filled estimates at every point of a grid with 0.1° latitude/longitude spacing (roughly 10 km). The paper is organized as follows. Section 2 explains the methodology behind the estimation model. Section 3 discusses the results obtained and their accuracy, and compares the model predictions with the standard dataset. Section 4 summarizes the work and gives concluding remarks.
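The "0.1° ≈ 10 km" equivalence is only approximate: a degree of latitude spans about 111 km everywhere, while a degree of longitude shrinks with the cosine of latitude. The short Python sketch below, which assumes a spherical Earth with a mean radius of 6371 km (the helper names are illustrative), computes the size of a 0.1° cell at a few latitudes within India.

```python
import math

# Mean Earth radius in km; a spherical-Earth approximation is assumed here.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def cell_size_km(lat, step_deg=0.1):
    """North-south and east-west extent (km) of a step_deg grid cell at latitude lat."""
    north_south = haversine_km(lat, 80.0, lat + step_deg, 80.0)
    east_west = haversine_km(lat, 80.0, lat, 80.0 + step_deg)
    return north_south, east_west


for lat in (8.0, 20.0, 35.0):
    ns, ew = cell_size_km(lat)
    print(f"lat {lat:4.1f} N: N-S {ns:5.2f} km, E-W {ew:5.2f} km")
```

The north-south extent is about 11.1 km at every latitude, while the east-west extent falls from about 11.0 km at 8°N to about 10.4 km at 20°N and 9.1 km at 35°N, so "10 km" is a fair round figure for the grid spacing.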

Methodology
Most PV projects are installed at orientations determined by design optimization, client-side requirements, or both. Thus, the irradiation sensors at these sites measure Global Tilted Irradiance (GTI). For plant monitoring, GTI is the more practical quantity because it captures the irradiance incident on the tilted modules. However, discerning inherent trends from GTI data alone is difficult because GTI depends strongly on tilt and azimuth. GHI, by contrast, is known to vary more gradually and smoothly. The idea is therefore to transform GTI to GHI, which removes the model's dependence on sensor orientation, reducing model complexity and improving the accuracy of the GHI estimates. This transform-domain approach is summarized as follows. First, GTI data are transformed into GHI using the corresponding transposition factor (TF). The model is then built from the geographical parameters and the input GHI values. Its output captures spatial trends in GHI and uses them to produce GHI estimates across India. Finally, these GHI estimates can be transformed back to GTI for any orientation by applying the inverse transformation with the respective TF.
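The transformation can be written compactly. The definition below, TF as the ratio of tilted to horizontal irradiation for a given orientation and site, is the usual one in PV yield assessment and is assumed here rather than taken from the study; i indexes the sensors and the hat denotes a model estimate at grid point (x, y).

```latex
\mathrm{TF} = \frac{\mathrm{GTI}}{\mathrm{GHI}}
\quad\Rightarrow\quad
\mathrm{GHI}_{i} = \frac{\mathrm{GTI}_{i}}{\mathrm{TF}_{i}},
\qquad
\widehat{\mathrm{GTI}}(x, y) = \mathrm{TF}(x, y)\,\widehat{\mathrm{GHI}}(x, y)
```

Under this definition, the forward step divides each sensor's GTI by its own TF, and the return step multiplies the kriged GHI by the TF of whatever orientation is of interest at (x, y).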

Training dataset
As part of asset management, solar PV plants are equipped with irradiation sensors whose measurements are sent to servers in real time and fed into the analytics portal used for operations and monitoring. The data are therefore organized per plant, with its geographical parameters (latitude and longitude) and the GTI measured by on-ground sensors. The transposition factors (TFs) corresponding to each sensor orientation are also known and are used to obtain GHI values.
The snippet above shows sample training data: ground sensor readings at the listed coordinates. The number of rows in the training set (i.e., the number of sensors) varies with the month and year in question, because the data are sourced from PV systems commissioned across India, and this set changes over time. The training data for August 2019 include more than 130 sensors across India.
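A minimal Python sketch of how such training rows might be assembled, assuming each sensor record carries its coordinates, month, monthly GTI and TF. The class `SensorReading`, the function `build_training_rows`, the field names and all numbers are hypothetical illustrations, not the study's code or data; the conversion assumes TF is defined as the ratio GTI/GHI, so that GHI = GTI / TF.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SensorReading:
    """One on-ground sensor's monthly record (hypothetical field names)."""
    latitude: float    # degrees north
    longitude: float   # degrees east
    month: str         # e.g. "2019-08"
    gti_kwh_m2: float  # measured monthly Global Tilted Irradiation
    tf: float          # transposition factor for this sensor's orientation (assumed GTI / GHI)


def build_training_rows(readings, month):
    """Return {Latitude, Longitude, Monthly GHI, Month} rows for one month."""
    rows = []
    for r in readings:
        if r.month != month or r.tf <= 0:
            continue  # skip other months and invalid transposition factors
        rows.append({
            "Latitude": r.latitude,
            "Longitude": r.longitude,
            "Monthly GHI": r.gti_kwh_m2 / r.tf,  # GTI -> GHI via the TF
            "Month": r.month,
        })
    return rows


# Illustrative, made-up readings; not data from the study.
readings = [
    SensorReading(12.97, 77.59, "2019-08", 140.0, 0.98),
    SensorReading(26.91, 75.79, "2019-08", 150.0, 1.02),
    SensorReading(26.91, 75.79, "2019-07", 145.0, 1.01),
]
training = build_training_rows(readings, "2019-08")
print(len(training), round(training[0]["Monthly GHI"], 2))
```

Keeping the month as a column lets rows from several months share one table while the interpolation is still run one month at a time.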
Formulate a training dataset with the parameters {Latitude, Longitude, Monthly GHI, Month}.
Use the 2-D spatial interpolation modules of the PyKrige Python library to perform gap-filling; the official PyKrige documentation [7] describes the library's various modules.
Define a suitable range and resolution for the geographical grid on which interpolation is performed, so that inherent trends become apparent.
Define the optimization problem as minimizing the total absolute error of estimation (J) with respect to the kriging variogram parameters.
The output dataset has the same format as the input, except that it contains GHI estimates for every coordinate within the grid.
Feed the error from the error-minimization block back to the variogram parameters until J reaches its minimum.
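The steps above can be sketched end to end. The study uses PyKrige, whose `pykrige.ok.OrdinaryKriging` class accepts `variogram_model` and `variogram_parameters` arguments; the block below instead spells out ordinary kriging in plain NumPy so the mechanics are visible, and it is a sketch under several assumptions not stated in the source: a spherical variogram, Euclidean distances in degree space, J defined as the leave-one-out total absolute error as a percentage of total GHI, and a simple grid search as the optimizer. Leave-one-out is used because ordinary kriging, as set up here, reproduces the measured value at every sensor, so the error at the sensors themselves would be zero. All function names and data are hypothetical.

```python
import itertools

import numpy as np


def spherical(h, psill, range_, nugget):
    """Spherical semivariogram; gamma(0) is forced to 0 so kriging honours the data."""
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / range_, 1.0)
    gamma = nugget + psill * (1.5 * r - 0.5 * r ** 3)
    return np.where(h > 0, gamma, 0.0)


def ok_predict(xy, z, targets, params):
    """Ordinary kriging estimates at `targets` from samples (xy, z).

    xy and targets are (n, 2) arrays of (longitude, latitude) in degrees; distances
    are plain Euclidean distances in degree space (a simplifying assumption).
    """
    n = len(z)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    # Kriging matrix augmented with the unbiasedness constraint (weights sum to 1).
    a = np.ones((n + 1, n + 1))
    a[:n, :n] = spherical(d, *params)
    a[n, n] = 0.0
    d0 = np.linalg.norm(xy[:, None, :] - targets[None, :, :], axis=-1)  # (n, m)
    b = np.ones((n + 1, len(targets)))
    b[:n, :] = spherical(d0, *params)
    weights = np.linalg.solve(a, b)[:n, :]  # drop the Lagrange multiplier row
    return weights.T @ z


def total_abs_error_pct(xy, z, params):
    """Objective J (assumed definition): leave-one-out total absolute error, % of total GHI."""
    errors = []
    for i in range(len(z)):
        keep = np.arange(len(z)) != i
        est = ok_predict(xy[keep], z[keep], xy[i:i + 1], params)[0]
        errors.append(abs(est - z[i]))
    return 100.0 * sum(errors) / float(np.sum(z))


def fit_variogram(xy, z, psills, ranges, nuggets):
    """Grid search for the (psill, range, nugget) triple that minimises J."""
    best = min(itertools.product(psills, ranges, nuggets),
               key=lambda p: total_abs_error_pct(xy, z, p))
    return best, total_abs_error_pct(xy, z, best)


# Synthetic, made-up monthly GHI field (kWh/m^2) at 20 hypothetical sensor sites.
rng = np.random.default_rng(0)
xy = np.column_stack([rng.uniform(68, 97, 20), rng.uniform(6, 37, 20)])
z = 160.0 - 1.5 * np.abs(xy[:, 1] - 22.0) + rng.normal(0, 2, 20)
params, j = fit_variogram(xy, z, psills=[50, 200], ranges=[5, 15, 30], nuggets=[0, 5])

# 0.1-degree target grid over part of the domain; the full grid works the same way.
lon = np.arange(75.0, 80.0 + 1e-9, 0.1)
lat = np.arange(20.0, 25.0 + 1e-9, 0.1)
glon, glat = np.meshgrid(lon, lat)
targets = np.column_stack([glon.ravel(), glat.ravel()])
ghi_grid = ok_predict(xy, z, targets, params).reshape(glat.shape)
print(params, round(j, 2), ghi_grid.shape)
```

A grid search is the simplest form of the feedback loop; a gradient-free optimizer over continuous variogram parameters would serve the same role, and PyKrige's `coordinates_type="geographic"` option would replace the degree-space distance assumption with great-circle distances.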

Flowchart of data flow
The results presented henceforth for August 2019 were obtained using 2-D ordinary kriging over the geographical range corresponding to India (+5.6° to +37.4° latitude and +67.8° to +97.6° longitude) with a 0.1° grid resolution (approximately 10 km × 10 km).
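The size of the estimation grid follows directly from these bounds. The short Python sketch below (the helper `grid_axis` is hypothetical) counts the grid nodes; building each axis from an integer step count avoids the floating-point drift that repeatedly adding 0.1 would introduce.

```python
def grid_axis(start, stop, step):
    """Inclusive axis values from start to stop at the given step, free of float drift."""
    count = round((stop - start) / step) + 1
    return [round(start + k * step, 6) for k in range(count)]


lats = grid_axis(5.6, 37.4, 0.1)   # latitude range used for India
lons = grid_axis(67.8, 97.6, 0.1)  # longitude range used for India
print(len(lats), len(lons), len(lats) * len(lons))
```

This gives 319 latitude values and 299 longitude values, i.e. 95,381 grid nodes at which GHI is estimated each month.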
The nomenclature below is used throughout the results section to aid visual understanding and maintain consistency.

Discussion
The black dots on the heat maps denote the locations of the irradiance sensors. Their considerable spread across India helps the model learn most, if not all, of the spatial GHI trends.
The irradiation heat map for August 2019 is consistent with actual weather data reported by Indian weather agencies. For instance, the excessive rainfall reported in central India in August 2019 corresponds closely to the considerably lower irradiation shown for that region.
The model optimizes its variogram parameters by minimizing the total absolute error (%), as reflected in the reported training-error statistics.
The standard dataset was obtained from a reputed industrial vendor whose data points combine long-term averaged values and satellite imagery. It serves as a reference for understanding temporal GHI trends.
As shown above, the reference GHI deviates considerably from ground reality. In the sample results for August 2019, the error distribution has a negative mean deviation and a considerable standard deviation. This is consistent with the severe monsoon across India in August 2019 and the resulting irradiation shortfall.
The major bottleneck in improving estimation accuracy is the quality and quantity of training data. With a wider geographical spread of sensors and more accurate measurements, the model's ability to capture GHI trends would improve further.
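The deviation statistics discussed above can be computed as in the Python sketch below. The sign convention, deviation = 100 × (model - reference) / reference so that a negative mean indicates ground-based estimates falling short of the reference, is an assumption; the function name and the four value pairs are made up for illustration.

```python
import statistics


def deviation_stats(model_ghi, reference_ghi):
    """Mean and sample standard deviation of the % deviation of model GHI from reference GHI.

    Assumed sign convention: negative values mean the model (ground-based) estimate
    is below the reference dataset, i.e. an irradiation shortfall.
    """
    if len(model_ghi) != len(reference_ghi):
        raise ValueError("model and reference series must have the same length")
    deviations = [100.0 * (m - r) / r for m, r in zip(model_ghi, reference_ghi)]
    return statistics.mean(deviations), statistics.stdev(deviations)


# Made-up monthly GHI values (kWh/m^2) at four grid points, for illustration only.
model = [120.0, 135.0, 150.0, 110.0]
reference = [150.0, 140.0, 150.0, 125.0]
mean_dev, sd_dev = deviation_stats(model, reference)
print(f"mean {mean_dev:.1f} %, std {sd_dev:.1f} %")
```

Reporting the deviation relative to the reference keeps regions with different absolute irradiation levels comparable on one scale.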

Conclusion
This study began by discussing the increasing prominence of solar PV as a renewable energy source, which has focused attention on the need for quality irradiation data. The research above is an endeavour to address this need with a data-driven approach, and it is hoped that it showcases the power of data-intensive techniques for solving the many challenges in the energy industry, especially in solar. The model was built from irradiation sensor data across India and uses kriging, an effective spatial interpolation technique, to produce gap-filled estimates. The reported error statistics indicate high estimation accuracy. Heat maps have also been produced for the respective months to visualize GHI trends. An independent standard dataset has been compared with the estimates to better understand temporal GHI trends relative to long-term averaged values. The broader potential of this work is for the industrial community to assess, as the approach could serve various use cases of considerable business value.