A Spatial Feature Engineering Algorithm for Creating Air Pollution Health Datasets
Abstract: Air pollution is a significant cause of mortality and morbidity every year. In recent years, many researchers have studied the association between air pollution and health. These studies rely on two types of data: air pollution data and health data. Feature engineering is used to create and optimize air quality and health features. To merge the two datasets, studies use the residential address, the community, county, block, or city, or a hospital or school address. Using a residential address or any other location becomes a spatial problem when the Air Quality Monitoring (AQM) stations are concentrated in urban areas and their coverage areas overlap, which raises the question of how to associate patients with the relevant AQM station. In addition, most studies do not take the distance between patients and AQM stations into account. In this study, we propose a four-part spatial feature engineering algorithm that finds the coordinates for health data, calculates distances to AQM stations, and associates each health record with the nearest AQM station, thereby addressing these limitations of current air pollution health datasets. The proposed algorithm is applied as a case study in Klang Valley, Malaysia. The results show that the algorithm can generate an air pollution health dataset efficiently and that it provides a radius option to exclude patients who are located far from the stations.
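The distance and nearest-station steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which distance measure is used, so the haversine great-circle distance is an assumption here, and the function names (`haversine_km`, `nearest_station`), the `radius_km` parameter, and the station identifiers and coordinates are all hypothetical. The geocoding step (finding coordinates for a health record) is assumed to have been done already.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_station(lat, lon, stations, radius_km=None):
    """Return (station_id, distance_km) for the closest station.

    `stations` maps a station id to a (lat, lon) pair in degrees.
    If `radius_km` is given and the closest station is farther away than
    that, return None so the record can be excluded from the dataset.
    """
    best_id, best_dist = None, math.inf
    for station_id, (s_lat, s_lon) in stations.items():
        dist = haversine_km(lat, lon, s_lat, s_lon)
        if dist < best_dist:
            best_id, best_dist = station_id, dist
    if best_id is None:
        return None  # no stations supplied
    if radius_km is not None and best_dist > radius_km:
        return None  # record lies outside the allowed radius
    return best_id, best_dist


# Hypothetical stations and one geocoded health record (made-up coordinates).
stations = {"station_a": (3.00, 101.50), "station_b": (3.20, 101.70)}
match = nearest_station(3.05, 101.52, stations, radius_km=10.0)
too_far = nearest_station(3.05, 101.52, stations, radius_km=1.0)
```

In this sketch the record is matched to `station_a` when the radius is 10 km, and `too_far` is `None` when the radius is 1 km, which mirrors the radius option for excluding patients located far from every station.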