loading page

  • Nwamaka Okafor ,
  • Declan Delaney
Nwamaka Okafor
University College Dublin, University College Dublin

Corresponding Author:[email protected]

Author Profile
Declan Delaney
Author Profile


IoT sensors are gaining more popularity in the environmental monitoring space due to their relatively small size, cost of acquisition and ease of installation and operation. They are becoming increasingly important
supplement to traditional monitoring systems, particularly for in-situ based monitoring. However, data collection based on IoT sensors are often plagued with missing values usually occurring as a result of sensor faults, network failures, drifts and other operational issues. Several imputation strategies have been proposed for handling missing values in various application domains. This paper examines the performance of different imputation techniques including Multiple Imputation by Chain Equations (MICE), Random forest based imputation (missForest) and K-Nearest Neighbour (KNN) for handling missing values on sensor networks deployed for the quantification of Green House Gases(GHGs). Two tasks were conducted: first, Ozone (O3) and NO2/O3 concentration data collected using Aeroqual and Cairclip sensors respectively over a six months data collection period were corrupted by removing data intervals at different missing periods (p) where p 2 f1day; 1week; 2weeks; 1monthg and also at random points on the dataset at varying proportion (r) where r 2 f5%; 10%; 30%; 50%; 70%g. The missing data were then filled using the different imputation strategies and their imputation accuracy calculated. Second, the performance of sensor calibration by different regression models including Multi Linear Regression (MLR), Decision Tree (DT), Random Forest (RF) and XGBoost (XGB) trained on the different imputed datasets were evaluated. The analysis showed the MICE technique to outperform the others in imputing the missing values on both the O3 and NO2/O3 datasets when missingness was introduced over periods p. MissForest, however, outperformed the rest when missingness was introduced as randomly occuring point errors. While the analysis demonstrated the effects of missing and imputed data on sensor calibration, experimental results showed that a simple model on the imputed dataset can achieve state of-the-art result on in-situ sensor calibration, improving the data quality of the sensor.
15 Oct 2021Published in IEEE Sensors Journal volume 21 issue 20 on pages 22833-22845. 10.1109/JSEN.2021.3105442