Drive-by Air Pollution Sensing Systems: Challenges and Future Directions

Air pollution has become a significant health, environmental, and economic problem worldwide. The conventional approach of deploying fixed high-end air quality monitoring stations provides accurate measurements but can be expensive to deploy and maintain. As a result, the stations are typically deployed in a few strategic locations, and various spatial interpolation or prediction models are used to estimate the air quality values at unsampled points. Recently, drive-by air quality sensing has emerged as a popular approach due to its dynamic nature, high spatial coverage, and low operational costs while providing high-resolution data. At the same time, drive-by sensing (DS) has introduced a range of novel research challenges in terms of spatial and temporal coverage, mobile sensor (MS) calibration, and deployment strategies. This article provides a systematic review and analysis of the recent work in this area, focusing on vehicular platforms, deployment strategies, primary challenges, and promising research directions.


I. INTRODUCTION
Air pollution has emerged as a global concern due to the rapid increase in urbanization and industrialization. It causes severe health issues, such as respiratory disorders and cardiovascular diseases, and can increase the mortality risk [1], [2], [3]. As air pollution sources, such as emissions from burning fossil fuels for transportation, power generation, and heating, are usually spread across extensive geographical areas, the conventional monitoring approach involves deploying multiple fixed air quality monitoring stations throughout the urban area. However, due to high equipment and maintenance costs, the monitoring stations are typically deployed in limited quantities, and consequently, spatial interpolation models are employed to approximate air quality values at unsampled locations [4], [5].
Drive-by sensing (DS) has emerged as a popular approach for air quality monitoring due to its dynamic nature, extensive spatial coverage, and reduced operational costs while providing high-resolution data [6], [7]. Different vehicles equipped with low-cost sensors (LCSs) have been proposed as mobile platforms for air quality monitoring.
Messier et al. [8] used data from sensor-equipped Google Street View cars for mapping air quality in Oakland, CA, USA. Biondi et al. [9] used sensors deployed on buses in Catania, Italy, to acquire air quality data and produce a high-resolution air quality map. Gómez-Suárez et al. [10] mounted a low-cost device with optical and electrochemical (EC) sensors on bicycles to monitor air quality in urban environments. At the same time, vehicular-based sensing has introduced a range of novel research challenges in terms of sensor deployment [11], [12], spatio-temporal coverage [13], data collection strategies [14], [15], [16], calibration models [10], [17], and data analysis [18], [19]. For example, the predictable nature of bus routes and schedules presents new opportunities that could be exploited for optimizing spatial coverage.
Hence, strategies have been proposed to maximize spatial coverage with a limited number of sensors [20]. Similarly, calibration models can be adapted to the mobility and specifics of public transport, as certain public transit types have predefined and overlapping routes. Despite recent advances, DS for air pollution monitoring has not been systematically reviewed. Ji et al. [21] summarized recent work on DS systems, discussing sensor deployment and assessing the sensing power of various fleets. However, that work focuses on the optimization perspective without addressing the specific aspects of air pollution monitoring. In this work, we provide a systematic review of DS systems, concentrating on air quality monitoring using the Internet of Things (IoT). The survey starts with a review of air quality standards and gas sensor characteristics, and continues with a categorization of existing work, discussing open challenges and potential research directions. The survey also categorizes the recent work by vehicular platform, such as public transportation, taxis, bicycles, and unmanned aerial vehicles (UAVs), discussing relevant deployment projects.
Our methodology consisted of a systematic search and analysis of literature pertaining to low-cost air quality sensors, DS, data management, and spatio-temporal coverage. The main databases used for gathering materials included IEEE Xplore, PubMed, Google Scholar, and ScienceDirect. Our research also included conference proceedings, technical reports, and relevant book chapters. We shortlisted papers based on their relevance to sensor technology for air quality monitoring, calibration techniques, data management strategies, spatio-temporal coverage, and the inclusion of DS approaches. Each shortlisted paper was reviewed for its applied methodologies, applications, findings, and conclusions. From this, we compiled a synthesis of the current advancements and trends in low-cost air quality sensor technology, DS platforms, data management, and spatio-temporal coverage. The overarching aim of this methodology is to provide a thorough and focused review of the current state of the art in these areas.
The rest of the article is structured as follows. Section II presents various air pollutant types and provides their description in table form. Furthermore, it describes the various low-cost air pollutant sensing technologies and the challenges facing wireless sensor accuracy and mobile sensor (MS) calibration. Section III discusses the benefits of employing vehicular platforms as sensor nodes and their implementation protocols; in addition, it describes their limitations and reviews prior related work. It also reviews the literature on deployment strategies that address the challenges raised in Sections II and III. Section IV discusses the impact of DS on the spatio-temporal resolution of data and the associated trade-offs. Section V describes strategies to process wireless sensor data, overcome various data issues, and remove sensor drift errors. Section VI discusses strategies for, and the challenges facing, sensor communications and storage in a dynamic sensing environment. Finally, Section VII summarizes the work and gives our final thoughts.

II. AIR POLLUTANTS, STANDARDS, AND SENSORS
A. Air Pollutants
Common air pollutants identified by researchers include oxides of sulfur (SOx), oxides of nitrogen (NOx), carbon monoxide (CO), carbon dioxide (CO2), ozone (O3), fine particulate matter (e.g., PM10 and PM2.5), and volatile organic compounds (VOCs) [22]. These air pollutants are emitted from various sources and cause health-related issues (see Table I). Artificial sources include emissions from transportation, industrial processes (e.g., factories and power generation), and land use, such as agriculture and urban development [23]. Transportation-related air pollutants include NOx, CO, hydrocarbons, and fine particulate matter, which are produced by the complete and incomplete combustion of fuel in vehicle engines [24].

B. Air Pollutants Concentration Standards
An air quality index (AQI) is a numerical index developed as an indicator of current air pollution levels; it specifies the impact on public health and provides cautionary statements [31]. Governments and agencies have set limits on air pollutants to identify their risk factor, and these limits differ between agencies. To illustrate these differences, we provide air pollutant concentration limits for three different agencies (see Table II). The air pollution data are reported over averaging times such as hourly, annual, or peak season, as shown in Table II.
AQI values and levels also differ between agencies and institutions. As an example, the U.K. Department for Environment, Food and Rural Affairs and the U.S. Environmental Protection Agency each publish an AQI (the Daily Air Quality Index, DAQI, in the U.K.), which informs the public of air pollution levels together with advice and recommendations (see Table III). Air pollutant concentration data can be converted to AQI levels using different functions.
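One widely used family of such functions maps a concentration to an index by piecewise-linear interpolation between agency-defined breakpoints and reports the highest per-pollutant sub-index as the overall index. The following is a minimal sketch of that idea. The breakpoint values and the names `ILLUSTRATIVE_BREAKPOINTS`, `sub_index`, and `overall_index` are hypothetical and chosen only for illustration; real breakpoints must be taken from the relevant agency's published tables.

```python
# Hypothetical breakpoint table, NOT taken from any agency:
# each row is (conc_low, conc_high, index_low, index_high).
ILLUSTRATIVE_BREAKPOINTS = [
    (0.0, 10.0, 0, 50),
    (10.0, 30.0, 50, 100),
    (30.0, 60.0, 100, 150),
]


def sub_index(conc, breakpoints):
    """Piecewise-linear mapping from a concentration to a sub-index."""
    for c_lo, c_hi, i_lo, i_hi in breakpoints:
        if c_lo <= conc <= c_hi:
            # Linear interpolation inside the matching concentration band.
            return i_lo + (i_hi - i_lo) * (conc - c_lo) / (c_hi - c_lo)
    raise ValueError(f"concentration {conc} is outside the breakpoint table")


def overall_index(concentrations, tables):
    """Overall index as the maximum sub-index over all pollutants.

    concentrations: dict pollutant -> concentration
    tables: dict pollutant -> breakpoint table for that pollutant
    """
    return max(sub_index(c, tables[p]) for p, c in concentrations.items())


if __name__ == "__main__":
    tables = {"PM2.5": ILLUSTRATIVE_BREAKPOINTS, "NO2": ILLUSTRATIVE_BREAKPOINTS}
    print(overall_index({"PM2.5": 20.0, "NO2": 5.0}, tables))
```

Taking the maximum means the pollutant with the worst sub-index determines the reported level, which is why a single elevated pollutant is enough to raise the overall index.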

C. Low-Cost Sensor Technologies
Low-cost sensor technologies have emerged as an effective solution for various monitoring and detection applications, providing critical data for diverse fields ranging from environmental monitoring to healthcare. Their popularity can be attributed to their cost-effectiveness, the ability to be deployed in large numbers due to their small size, and in many cases, their real-time data collection capabilities. These characteristics make them suitable for applications where broad coverage or dense sampling is necessary. However, these sensors also present certain challenges. The trade-off for their low cost can often be limitations in their performance compared to more expensive, professional-grade equipment. Issues can include lower accuracy, reduced reliability, less linearity, and slower response times. Furthermore, they can be susceptible to environmental conditions and may require frequent calibration. As a result, while LCSs offer promising possibilities for many applications, careful consideration must be given to their selection and deployment to ensure the quality and reliability of the data they provide. Below we provide a summary of key gas sensor types used for air pollution monitoring.
1) Electrochemical Sensors: Electrochemical (EC) gas sensors react with the target gas, producing a measurable potential difference between two electrodes that is proportional to the gas concentration [36]. They are simple and easy to manufacture, have sufficient sensitivity, require less power, and are less affected by environmental factors such as temperature and pressure [37]. EC sensors have a short response time [37], which is defined as the time for a sensor to go from the baseline signal to a certain percentage of its entire response after being exposed to the target gas [38]. In the literature, this percentage is usually 90% of the entire response, and recovery time is the time required, after the removal of the target gas, to restore to 90% of the original baseline signal [39]. EC sensors also have sufficient selectivity [37], which refers to the ability of a low-cost gas sensor to discriminate between the target gas and interfering gas molecules [40]. Cross-sensitivity to other gases might occur, in which the electrical change produced by the target air pollutant is similar to that produced by another gas [41]. Aging has an impact on EC gas sensor sensitivity, causing signal drift.
2) Semiconductor or Metal Oxide Sensors: Semiconductor or metal oxide (MO) gas sensors contain a surface layer of one or more MOs, a sensing chip, and a heater for heating the membrane; when the MO reacts with the target gas, its conductivity changes, which is then measured by the sensing chip [42]. MO sensors' small size and low cost make them very suitable for portable and remote monitoring applications. Advantages include long-term stability and lifetime, and adequate sensitivity [43]. Semiconductor gas sensors suffer from interference from other gases in the surrounding atmosphere, temperature fluctuation, and humidity change [44]. The MO sensor conductivity response is nonlinear with respect to the target air pollutant concentration, which is a challenge that is compounded by sensitivity to changes in atmospheric temperature and humidity [45]. The selectivity problem can be mitigated using various strategies. For example, physical and chemical gas filters delay or prevent the interfering gas from reaching the sensor's surface [46]. Baseline drift is a critical issue of MO sensors, which impacts their long-term stability [47]. Excessive heating temperatures also affect the stability of the sensor materials, which can be addressed by implementing activation methods other than heating [48].
3) Nondispersive Infrared (NDIR) Sensors: A nondispersive infrared (NDIR) sensor emits IR radiation through the sampled gas, and the target gas is identified based on its absorption characteristics [49]. NDIR sensor components include an IR source, a sample chamber or gas cell, an optical or light filter, and an IR detector. The advantages of NDIR sensors include robustness, high selectivity, and a long life span [50], which contribute to more accurate readings. Challenges include high detection limits and spectral interference; in addition, exposure to moisture and nontarget gases can cause interference and reduce sensitivity [51].
4) Optical Particulate-Matter (PM) Sensors: Optical particulate-matter (PM) sensors use the light scattering method: laser light is scattered by the particles in the sampled air and collected at a certain angle by a photodetector, which allows measurement of particle size and concentration [52]. The sensor also includes a set of focusing lenses and a fan that draws air carrying the particles through the chamber [53]. Optical PM sensors are popular due to their low power consumption, low cost, and quick response [52]. However, the sensor's accuracy can be affected by nontarget particles (creating noise), interference from ambient sources, the reliability of the parts used, and factors affecting the airflow [54]. Furthermore, the performance of low-cost PM sensors can be affected under conditions of high relative humidity (RH) [55]. Low-cost particle sensors can overestimate the particle mass concentrations during high RH, as the size of hygroscopic particles is dependent on RH [56]. Several existing survey and review papers assess the performance and capabilities of LCSs (listed in Table IV); based on these papers, we present a comparative analysis of LCS attributes (see Table V), highlighting the strengths and issues of the different LCS types.
The response time of sensors significantly impacts mobile air quality measurements, particularly in dynamic urban environments where air quality fluctuates rapidly. The response time of a sensor determines the temporal resolution of the measurements: a faster response provides more data points in a given period, allowing for a more detailed and accurate representation of changes over time. For example, if a sensor only updates its readings every few minutes, it might associate a high pollution level with a location the monitoring vehicle passed some time ago. Fast-response sensors, however, are generally more sensitive and prone to noise and environmental interference such as temperature, humidity, and pressure changes. These factors may lead to signal fluctuations and inaccurate readings. In addition, fast-response sensors often consume more power, a limitation for mobile applications where battery life is critical, and they may require more sophisticated calibration processes and therefore entail higher costs. Addressing these issues requires optimizing sensor designs, creating efficient calibration methodologies, and developing advanced data analysis techniques to filter out noise and correct environmental influences.
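The effect of response time on spatial attribution can be estimated with simple arithmetic: the distance a vehicle travels while the sensor is still responding is the product of speed and response time. The sketch below assumes a constant vehicle speed and treats the response time (for example, the 90% response time) as a single lag; the function name `smearing_distance_m` is a hypothetical helper, not taken from the reviewed work.

```python
def smearing_distance_m(speed_kmh, response_time_s):
    """Distance (in meters) traveled while the sensor is still responding.

    Assumes constant speed and a single lumped response time.
    """
    speed_ms = speed_kmh / 3.6  # convert km/h to m/s
    return speed_ms * response_time_s


if __name__ == "__main__":
    # A vehicle at 36 km/h with a 30 s response time.
    print(smearing_distance_m(36.0, 30.0))
```

At 36 km/h (10 m/s), a 30 s response time corresponds to roughly 300 m of travel, so a reading may reflect conditions several street blocks behind the vehicle's reported position.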

III. VEHICULAR PLATFORM
A. Public Transit
Public transportation, or transit, is a mass transport system within the urban area that is used by the public and typically follows scheduled routes and timings. Some public transport modes are available in significant numbers and cover a large part of the urban area. We describe three types of public transit modes: city buses, trains or trams, and taxis.
1) City Bus: Buses have received considerable attention as mobile sensing platforms because of their availability in significant numbers, high spatial coverage, and reliable operations [74]. Buses have predefined routes and schedules that are typically available publicly, which makes their trajectory predictable [75]. City buses repeat the same route multiple times throughout the day, providing high temporal resolution. As equipping an entire fleet of buses with sensors increases deployment and operation costs, significant work has been dedicated to maximizing spatial or spatio-temporal coverage with a limited number of vehicles for DS [7], [20], [76]. Ali and Dyo [20] analyzed spatial coverage using a real bus route dataset and proposed a greedy optimization approach for route selection to increase sensing coverage in London, U.K. Caminha et al. [7] developed a similar optimization approach to analyze spatio-temporal coverage in Rio de Janeiro, Brazil. Later, Agarwal et al. [77] used a greedy heuristic to optimize sensor deployment on San Francisco and Rome bus datasets. Fan et al. [128] proposed to exploit the spatial and temporal correlation of air quality data to select buses that sample a diverse and representative set of locations, and evaluated the approach using general transit feed specification (GTFS) data from Delhi, India.
Finally, Wang et al. [75] and Tonekaboni et al. [78] selected optimal sets of buses by analyzing historical trajectories to maximize spatio-temporal coverage. Bus routes often overlap, which can result in redundant data collection. On the other hand, the overlapping nature of bus routes can be used for cross-checking individual bus readings and for mutual sensor calibration [79]. However, the work on the calibration of bus-mounted sensors is currently limited [80]. Interesting open problems include identifying the optimal locations for reference stations given real bus routes and schedules, evaluating calibration performance, and analyzing the trade-offs between calibration and coverage. The predefined routes and schedules of bus transit make its mobility less flexible, and sensing campaigns can only be carried out along the fixed routes. In cases where flexibility is critical, other vehicular platforms, such as taxis or UAVs, can be used; these are discussed in Sections III-A3 and III-C3, respectively.
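The greedy route selection idea mentioned above can be illustrated with the standard greedy heuristic for the maximum coverage problem: repeatedly pick the route that adds the most not-yet-covered grid cells until the sensor budget is exhausted. This is a generic sketch, not the exact algorithm of any cited work; the function name `greedy_route_selection`, the route identifiers, and the representation of coverage as sets of grid-cell IDs are assumptions made for illustration.

```python
def greedy_route_selection(route_cells, budget):
    """Greedy maximum-coverage selection of bus routes.

    route_cells: dict mapping route ID -> set of grid-cell IDs the route covers
    budget: maximum number of routes (sensors) to select
    Returns (chosen route IDs in selection order, set of covered cells).
    """
    covered = set()
    chosen = []
    remaining = dict(route_cells)
    for _ in range(budget):
        # Route with the largest marginal gain in newly covered cells.
        best = max(remaining,
                   key=lambda r: len(remaining[r] - covered),
                   default=None)
        if best is None or not (remaining[best] - covered):
            break  # no route adds new coverage
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered


if __name__ == "__main__":
    # Hypothetical routes over a grid with cells numbered 1..6.
    routes = {
        "R1": {1, 2, 3, 4},
        "R2": {3, 4, 5},
        "R3": {5, 6},
        "R4": {1, 2},
    }
    print(greedy_route_selection(routes, budget=2))
```

In this toy input, the route that overlaps heavily with an already selected route is skipped in favor of one covering new cells, which mirrors the observation that overlapping bus routes yield redundant coverage.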
2) Trains and Trams: Trains and trams also have predefined routes and schedules with predictable trajectories and repeat their routes multiple times throughout the day, providing high temporal resolution. As trains have separate infrastructure, air pollution sensing is not affected by road traffic-related delays and other issues. The well-known OpenSense project used sensors on trams in Zurich and buses in Lausanne for monitoring air quality in real-time in Switzerland [81].
Trains and trams, by following fixed routes, can introduce a spatial bias in air quality monitoring. This bias arises because sampling is limited to the areas along their routes, potentially overlooking air quality variations in other regions. This spatial bias should be considered when interpreting the data obtained from such transportation-based monitoring systems. Deploying supplementary sensors on other vehicles, in addition to the sensors mounted on trams and trains, can potentially mitigate the spatial bias issue.
3) Taxi: Taxis can cover large urban areas with high resolution given a sufficient fleet size. Apart from mobility, an advantage of using vehicles as sensing platforms is that sensors may have access to vehicular power systems and are therefore less constrained in energy and physical dimensions compared to MSs in other contexts [82].
Works that used taxis as mobile platforms include [83], [84], [85], [86]. Yeom [84] deployed low-cost CO, NO2, NH3, O3, CH4, SO2, and PM sensors on taxi rooftops and sport utility vehicles (SUVs) for real-time air quality sensing and visualization with high spatial resolution in Daejeon, Korea. Wang et al. [83] leveraged LCSs and novel calibration algorithms, deploying low-cost CO, NO2, and O3 sensors on a taxi fleet in Nanjing to examine urban air quality over a year and study the impact of COVID lockdowns. The authors used geographic information system (GIS) technology to create high-resolution (50 × 50 m) spatial distribution maps of major pollutants (CO, NO2, and O3) to identify pollution sources and analyze traffic-related emission patterns.
Taxis create a spatial bias as they tend to concentrate around areas with high human activity (e.g., shopping areas and airports), and their behavior is partially irregular [85]. In addition, taxi mobility depends on road traffic conditions, the routing decisions of taxi drivers, who normally opt for quicker routes, and the clients' routing requests, making taxi trajectories random and unpredictable [86]. This random mobility leaves some parts of an urban region unsampled or less sampled, creating sparse data collection and time-varying data coverage problems [87].
The random mobility of taxis is a considerable challenge in relation to spatial and temporal coverage. Chen et al. [87] designed an adaptive hybrid model-enabled sensing system (HMSS) to achieve optimal sensing coverage quality and fine-grained air pollution estimation, addressing the challenge of sparse and time-varying data coverage. Around 53.5 million data samples, collected over a period of 14 days from 47 sensor devices on taxis and at fixed locations, were used for system performance assessment. The devices were deployed in two cities to conduct both controlled and uncontrolled tests. An alternative approach to overcome the mobility issues of taxis is to deploy sensors on both taxis and supplementary vehicles.

B. Municipal Transport
Municipal transport comprises vehicles typically used for maintenance and public service purposes, such as dump trucks for solid waste transportation, vans for deliveries, and ambulances for patient transport. Spatial coverage depends mostly on the type of public service vehicle; for example, police patrol vehicles provide good coverage, while emergency vehicles such as fire trucks and ambulances do not, as they are only operational in emergency situations. Delivery vans cover most of the commercial and residential areas along their routes. Public service vehicles introduce a sampling bias, as their routes are not predefined and depend on the driver's decisions. Their operations are not continuous: once their services are completed, their operations, and therefore sensing, are suspended. Their low numbers provide low spatial coverage and leave large gaps in the sensing data.
Qin et al. [88] proposed a model for fine-grained urban air quality mapping from sparse NO2 measurements. The model was validated on data collected from LCSs mounted on 17 postal vans in Antwerp, Belgium. Their sampling routes were relatively random, and the sampling campaign was generally conducted from 6:00 to 23:00 on weekdays and Saturdays. During the daytime, the sampling interval was 10 s, and at night-time, when the vans were parked, it was 10 min.

C. Private Transport
Private transportation, as opposed to public transport, refers to transportation for personal or individual use, such as cars, motorbikes, and bicycles. There are several types of private vehicles or conveyances to choose from for air quality sensor placement.
1) Dedicated Vehicles: Dedicated private vehicles are vehicles that are modified for air pollution measurements. For example, two Google Street View cars have been equipped with fast-response research-grade instruments to monitor nitric oxide (NO) and black carbon in Oakland, CA, USA [8], [89], [90]. The system was also used to collect NO2, NOx, and CO2 at 1 s intervals in London, U.K., between September 2018 and October 2019 as part of the Breathe London project. Chiesa et al. [91] deployed a low-cost optical particulate matter sensor on a laboratory van, installed through the cable hole on the roof of the vehicle, for monitoring urban air pollutants. The system has been used in an experimental field campaign to measure air quality in Rome, Italy. Such vehicles may follow specially computed routes that are optimized for air pollution measurements.
2) Personal Vehicles: Personal vehicles, such as cars, SUVs, and bicycles, have flexible mobility, offering high spatio-temporal resolution and relatively low deployment and operation costs. As personal vehicles are owned and used by private individuals, the routes and schedules depend on the owner's behavior, which can create spatial bias. HazeWatch [92], one of the first projects using personal vehicles, used cars equipped with low-cost CO, NO2, and O3 gas sensors to measure air quality in Sydney. Wesseling et al. [93] used measurements from 500 sensors mounted on bicycles in Utrecht, The Netherlands, to estimate the PM2.5 levels that cyclists are typically exposed to. Gómez-Suárez et al. [10] mounted low-cost NO2, O3, and particulate matter sensors on eight bicycles for monitoring air quality in the city of Badajoz, Spain, over several days.
3) Unmanned Aerial Vehicles (UAVs): Compared to ground vehicles, which are constrained by the road topology and are affected by road traffic, UAVs can travel to their destination directly or take any arbitrary path required for data collection. UAVs can be used for measuring air pollution in areas that are inaccessible by ground vehicles, such as landfill sites [94]. UAVs can also be used for high-altitude air quality sensing and in locations that are hazardous or dangerous to humans.
Measurements from sensors deployed on UAVs or drones can be affected by the airflow generated by the rotors [95]. This problem has been addressed in [96] by analyzing the structure of a UAV. Arroyo et al. [97] developed an electronic system for air quality monitoring integrated in an unmanned aerial vehicle. The sampling system is designed to avoid interference from the motors and downwash. Field calibration was performed against certified reference equipment, and measurements were taken both while stationary and in motion. The results indicated that the monitoring device was not affected at any time by the movement of the drone, the downwash effect, or electromagnetic disturbances. Finally, the communication range between the UAV and the controller presents another challenge: once the UAV is out of range, the controller loses control of it. This can potentially be addressed using satellite communication links.

IV. SPATIO-TEMPORAL RESOLUTION
Deploying drive-by air pollution sensing systems presents a trade-off between spatial and temporal resolution. While stationary sensors provide high temporal resolution with continuous, real-time pollutant measurements, their fixed location limits spatial resolution. Balancing the temporal detail offered by stationary sensors against the need for comprehensive spatial coverage can be challenging in air quality monitoring. Spatial interpolation techniques can help address this issue by using mathematical models to estimate pollutant concentrations in areas between sensors, effectively increasing the perceived spatial resolution. For example, Joseph et al. [98] applied various simple spatial interpolation techniques to 8-hourly ozone data from monitoring stations in two urban areas. Tong et al. [99] evaluated the performance of different Kriging interpolation methods based on the AQI in Wuhan. Some recent studies have adopted machine learning and neural networks as an effective alternative to traditional mathematical and statistical models. For example, Wahid et al. [100] adopted radial basis function network metamodeling to estimate the spatial distribution of ozone concentrations in the Sydney basin, Australia. Pfeiffer et al. [101] developed a new approach utilizing diffusive sampling measurements and an artificial neural network (ANN) for evaluating the average spatial distribution of air pollutant levels. Environmental factors also need to be considered for accurate estimation; for example, Nicoletta et al. [102] considered meteorological parameters such as wind direction and intensity. Korunoski et al. [103] presented an IoT-based system for intelligent air pollution prediction and visualization, whose model also considers meteorological parameters and uses a spatial interpolation technique.
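As a concrete illustration of simple spatial interpolation, the sketch below implements inverse distance weighting (IDW), in which the estimate at an unsampled point is a weighted average of station values with weights decreasing with distance. IDW is used here only as a compact example and is not claimed to be the method of any cited study; the function name `idw_estimate`, the planar coordinates, and the default power of 2 are assumptions.

```python
import math


def idw_estimate(stations, x, y, power=2.0):
    """Inverse distance weighted estimate at point (x, y).

    stations: iterable of (sx, sy, value) tuples in planar coordinates
    power: distance exponent; larger values emphasize nearby stations
    """
    numerator = 0.0
    denominator = 0.0
    for sx, sy, value in stations:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            return value  # query point coincides with a station
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    if denominator == 0.0:
        raise ValueError("at least one station is required")
    return numerator / denominator


if __name__ == "__main__":
    # Two hypothetical stations 2 km apart; estimate at the midpoint.
    stations = [(0.0, 0.0, 10.0), (2.0, 0.0, 30.0)]
    print(idw_estimate(stations, 1.0, 0.0))
```

Unlike Kriging, IDW does not model the spatial covariance of the pollutant field, so it provides no uncertainty estimate; it is shown here because it needs no fitted parameters beyond the distance exponent.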
On the other hand, DS systems offer high spatial resolution due to their mobility, enabling comprehensive mapping of pollutant distribution across various locations. They are valuable for identifying pollution hotspots and spatial trends. Alvear et al. [104] proposed an architecture for high-resolution air pollutant monitoring using LCSs mounted on a bike and used it to monitor a university campus. The authors applied Kriging spatial interpolation to obtain a more detailed distribution. Wang et al. [83] deployed LCSs on a taxi fleet, providing urban air quality data at high spatial resolution for the city of Nanjing. However, the mobile nature of DS systems limits their temporal resolution, making continuous monitoring at a single location challenging. One way to address this challenge is by increasing the number of MSs, which can help to improve temporal resolution by ensuring more frequent coverage of each location [93]. However, this will increase the cost of deployment and operations. Another way is to combine MS data with data from stationary sensors (reference-grade or low-cost), obtaining high temporal resolution from the stationary sensors and high spatial resolution from the MSs [105].
More recently, Wang et al. [83] created high-resolution (50 × 50 m) spatial distribution maps of major pollutants (CO, NO2, and O3), identified pollution sources, and analyzed traffic-related emission patterns using data collected from a taxi fleet in Nanjing, China, over a year. The study found significant variations in pollutant levels across different types of roads, with COVID lockdowns substantially impacting these levels. Yeom [84] used LCSs on taxi rooftops and SUVs for real-time air quality sensing and visualization across large cities with high spatial resolution.

V. DATA PROCESSING AND CALIBRATION
A. Data Processing
Processing data from an LCS network involves a series of steps to clean, organize, and analyze the data generated by the sensors. We focus on the cleaning and organizing of raw data, which is essential for data analysis. One of the challenges is the estimation of missing values, and different techniques have been developed to handle this problem effectively. Al-Janabi and Alkaim [106] developed a novel tool based on random forest and local least squares (DRFLLS) to estimate missing values in various datasets. Wardana et al. [107] exploited both temporal and spatial data and developed an autoencoder model for estimating missing values in air quality data. Yen et al. [108] analyzed the performance of existing interpolation methods, including conventional and deep learning models. The authors used linear regression, support vector regression, ANNs, and long short-term memory to make time-series predictions for missing values. Interpolation techniques can be used to fill in missing values in an air quality dataset; however, the filled-in values are estimates rather than measurements and can introduce inaccuracies that affect the quality of the data [109].
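The simplest form of missing-value estimation is linear interpolation in time between the nearest valid neighbors. The sketch below is a baseline illustration only, far simpler than the learning-based methods cited above. It assumes evenly spaced samples with missing readings marked as `None`, and it deliberately leaves long gaps unfilled, since interpolating across them is where inaccuracies become most likely; `fill_gaps_linear` and `max_gap` are hypothetical names.

```python
def fill_gaps_linear(series, max_gap=3):
    """Fill short runs of None by linear interpolation.

    series: list of floats or None, assumed evenly spaced in time
    max_gap: longest run of consecutive missing samples that is filled
    Leading/trailing gaps and gaps longer than max_gap stay as None.
    """
    filled = list(series)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        end = i  # index of the first valid sample after the gap
        gap = end - start
        if start > 0 and end < n and gap <= max_gap:
            left, right = filled[start - 1], filled[end]
            for k in range(start, end):
                fraction = (k - start + 1) / (gap + 1)
                filled[k] = left + fraction * (right - left)
    return filled


if __name__ == "__main__":
    print(fill_gaps_linear([1.0, None, None, 4.0, None]))
```

Keeping unfilled gaps explicit lets downstream analysis distinguish measured values from estimated ones instead of silently treating both as observations.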
Data from malfunctioning sensors might need to be discarded. Outliers, which could represent unusual environmental events or sensor errors, might need to be investigated and handled appropriately. Mendez and Labrador [110] proposed a new hybrid algorithm for spatial outlier detection and removal. The model considers aspects that could be found in real participatory sensing (PS) systems, such as the uneven spatial density of the users, malicious users, and inaccurate or malfunctioning sensors. Removing erroneous readings can also reduce unnecessary data and save storage space. Another challenge is that the sensor network may include different types of sensors or sensors from different manufacturers, so the data may need to be integrated into a common format for analysis. This could involve adjusting for differences in measurement units, resolution, or data formats. The problem can be avoided by using sensors of the same type from the same manufacturer. However, sensor datasets from a secondary source may not share a common format and will therefore require data integration before analysis.
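As a simple illustration of the outlier handling discussed above, the sketch below flags readings using the modified z-score based on the median absolute deviation (MAD). This is a generic robust statistic substituted here for brevity, not the hybrid spatial algorithm of [110]; the function name `flag_outliers_mad` is hypothetical, and the threshold of 3.5 is a commonly used default that would need tuning for real sensor data.

```python
import statistics


def flag_outliers_mad(values, threshold=3.5):
    """Flag values whose modified z-score exceeds the threshold.

    Uses the median and the median absolute deviation (MAD), which are
    less distorted by extreme readings than the mean and standard deviation.
    Returns a list of booleans aligned with the input values.
    """
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        # More than half of the values are identical; no scale estimate.
        return [False] * len(values)
    # 0.6745 rescales the MAD to be comparable with a standard deviation.
    return [abs(0.6745 * (v - median) / mad) > threshold for v in values]


if __name__ == "__main__":
    readings = [10, 11, 9, 10, 12, 95]
    print(flag_outliers_mad(readings))
```

Flagged readings should be investigated rather than deleted automatically, since a genuine pollution event can look the same as a sensor fault in a single time series.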
Another issue is dealing with redundant data, which can include duplicate data and data that exist in different formats. Redundant data consume unnecessary storage space [111]. This is particularly problematic when dealing with the high-volume, high-velocity data produced by a network of air pollution sensors.

B. Calibration of Mobile Sensors
Errors from internal sources, such as temporal drift, and external sources, such as changes in environmental conditions (e.g., temperature and humidity), present a major challenge for LCSs [112]. Therefore, calibration is required, which is the process of identifying and correcting systematic bias in sensor readings [113]. The dynamic nature of vehicular-based air quality sensing makes laboratory calibration difficult, as it requires suspending operations. Common practice for field calibration of an MS is single-hop calibration, where the MS is placed near a stationary reference-grade sensor to compare and calibrate its sensor readings (see Fig. 1). This proximity, called rendezvous, requires two or more sensors in the same spatial and temporal vicinity to measure the same phenomena [114]. This simple approach only calibrates sensors that pass through the reference station location and may require a large number of reference stations to calibrate an entire fleet.
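A minimal sketch of single-hop calibration is given below, assuming that the systematic bias can be modeled as a linear gain and offset and that paired readings were collected while the MS was in rendezvous with a reference station. The linear model, the function names, and the example values are illustrative assumptions and are not taken from the cited works; practical calibration models often also include temperature and humidity as covariates.

```python
from typing import Sequence, Tuple


def fit_linear_calibration(
    raw: Sequence[float], ref: Sequence[float]
) -> Tuple[float, float]:
    """Ordinary least squares fit of ref ~ gain * raw + offset.

    `raw` holds MS readings and `ref` the co-located reference readings
    taken during the same rendezvous.
    """
    if len(raw) != len(ref) or len(raw) < 2:
        raise ValueError("need at least two paired readings")
    n = len(raw)
    mean_raw = sum(raw) / n
    mean_ref = sum(ref) / n
    sxx = sum((x - mean_raw) ** 2 for x in raw)
    if sxx == 0:
        raise ValueError("raw readings are constant; gain is undefined")
    sxy = sum((x - mean_raw) * (y - mean_ref) for x, y in zip(raw, ref))
    gain = sxy / sxx
    offset = mean_ref - gain * mean_raw
    return gain, offset


def apply_calibration(value: float, gain: float, offset: float) -> float:
    """Correct a raw MS reading with the fitted parameters."""
    return gain * value + offset


# Hypothetical paired readings (ug/m3) from one rendezvous.
raw_ms = [8.0, 15.0, 22.0, 30.0]
reference = [10.1, 17.9, 26.2, 34.8]
gain, offset = fit_linear_calibration(raw_ms, reference)
print(round(apply_calibration(18.0, gain, offset), 2))
```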
Multihop calibration can calibrate sensors over multiple hops [115], as illustrated in Fig. 1. This allows an entire fleet to be calibrated using fewer reference stations, consequently reducing deployment and operational costs. However, sensor error accumulation over multiple hops in large-scale MS networks is a challenge. Ensuring rendezvous in urban traffic can also be challenging, as a freshly calibrated MS may not be able to reach an uncalibrated MS. Any resulting gap in the calibrated MS network increases uncertainty.
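The two difficulties above, error accumulation and gaps in the calibration chain, can be illustrated with a small Python sketch. It treats rendezvous events as edges of an undirected graph and uses breadth-first search to count the minimum number of hops from any reference station to each MS. Sensors that never appear in the result form the uncalibrated gap. The error model is a deliberately simple assumption of ours: each hop adds an independent zero-mean error with the same standard deviation, so the accumulated standard deviation grows with the square root of the hop count. The sketch also ignores the time ordering of rendezvous, which a real scheme would have to respect. All names are hypothetical.

```python
import math
from collections import defaultdict, deque
from typing import Dict, Iterable, Tuple


def calibration_hops(
    rendezvous: Iterable[Tuple[str, str]], references: Iterable[str]
) -> Dict[str, int]:
    """Minimum hop count from any reference station to each reachable node."""
    graph = defaultdict(set)
    for a, b in rendezvous:
        graph[a].add(b)
        graph[b].add(a)
    hops = {ref: 0 for ref in references}
    queue = deque(hops)  # start the search from every reference station
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    return hops


def accumulated_sigma(hop_count: int, sigma_per_hop: float) -> float:
    """Std. dev. after k hops, assuming independent, equal per-hop errors."""
    return sigma_per_hop * math.sqrt(hop_count)


# Hypothetical rendezvous log: bus9 and taxi5 only ever meet each other.
meetings = [("REF", "bus1"), ("bus1", "taxi3"), ("taxi3", "bus7"),
            ("bus9", "taxi5")]
hops = calibration_hops(meetings, ["REF"])
print(hops)  # {'REF': 0, 'bus1': 1, 'taxi3': 2, 'bus7': 3}
print(sorted({"bus9", "taxi5"} - hops.keys()))  # ['bus9', 'taxi5']
```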
Ensuring communication between two MS nodes at a rendezvous point requires a communication technology that provides low-latency data transfer. Wi-Fi can be suitable; however, signal interference can occur in areas with dense Wi-Fi networks. Alternatively, the calibration can be done post facto in a centralized manner, after the data from all sensors have been collected.

VI. DATA COMMUNICATION AND STORAGE
A. Data Transmission
An efficient data transmission method is essential, especially when the application or system requires real-time monitoring. Several communication protocols are used to transfer air quality data for analysis or to communicate between sensors. Typical communication technologies include cellular network services [16], [116], Wi-Fi [117], [118], [119], and Bluetooth [92], [120]. When selecting a communication technology for sensor data transfer, several aspects need to be considered, such as geographical coverage, cost, energy consumption, data transfer speed, and latency.
There are different types of cellular networks (2G, 3G, 4G, 5G, and LTE), which differ in technology, speed, bandwidth, latency, capacity, scalability, and applications. As the technology advances, capabilities such as data transmission speed and capacity also improve. Modern cellular networks provide large geographical coverage, scalability, and high throughput, making them a suitable choice for DS applications.
However, cellular network services have higher operational costs and energy consumption compared to Wi-Fi, Bluetooth, and LPWAN systems. Nguyen et al. [16] proposed an offloading protocol to reduce 4G costs while maintaining data latency by investigating an opportunistic communication model in which air quality data are transferred either via a 4G network or via Wi-Fi to adjacent devices deployed along the road.
LoRa is increasingly in demand due to its long range, low cost, and low power consumption, which make it suitable for IoT-based systems and smart city applications [119]. Pal et al. [117] designed an IoT-based air quality monitoring system using UAVs and LoRa technology for sending sensor data. Twahirwa et al. [121] also developed and deployed a LoRa-enabled IoT framework for air pollution monitoring. However, cross-interference from coexisting technologies [Wi-Fi, Bluetooth, and Bluetooth low energy (BLE)] working in the same frequency band and having overlapping channels can limit its performance [122]. The performance of LoRa can also be affected by the speed of the mobile platform. High-speed mobile platforms may cause Doppler shifts that could affect data transmission [123]. Finally, LoRa has a relatively low data rate and small packet size compared to some other wireless technologies [124], [125]. If sensors generate large amounts of data, this could exceed LoRa's capacity, affecting data quality. This could be mitigated using adaptive data rate schemes, which may differ in computational complexity [126].
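One practical consequence of the small packet size is that readings are usually sent in a compact binary form rather than as text. The following Python sketch packs one reading into 14 bytes using fixed-point integers. The field layout, the scaling factors, and the 51-byte payload budget are illustrative assumptions of ours; the actual payload limit depends on the region and data rate in use.

```python
import struct
from typing import Tuple

# Little-endian, no padding: uint32 timestamp, int32 lat, int32 lon, uint16 PM2.5.
_FORMAT = "<IiiH"
READING_BYTES = struct.calcsize(_FORMAT)  # 14 bytes per reading
MAX_PAYLOAD_BYTES = 51                    # assumed payload budget


def encode_reading(
    timestamp_s: int, lat_deg: float, lon_deg: float, pm25_ugm3: float
) -> bytes:
    """Pack one reading using fixed-point scaling instead of floats or text."""
    return struct.pack(
        _FORMAT,
        timestamp_s,
        round(lat_deg * 1e5),    # 1e-5 degree resolution (about 1 m)
        round(lon_deg * 1e5),
        round(pm25_ugm3 * 10),   # 0.1 ug/m3 resolution, max 6553.5
    )


def decode_reading(payload: bytes) -> Tuple[int, float, float, float]:
    """Inverse of encode_reading, up to the fixed-point resolution."""
    ts, lat, lon, pm = struct.unpack(_FORMAT, payload)
    return ts, lat / 1e5, lon / 1e5, pm / 10


payload = encode_reading(1700000000, 59.8586, 17.6389, 12.3)
print(len(payload))                        # 14
print(MAX_PAYLOAD_BYTES // READING_BYTES)  # 3 readings fit in one packet
```

For comparison, the same reading as a JSON string would take several times as many bytes, so batching fixed-point records is one simple way to stay within a small payload.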
With the dynamic trajectories of mobile platforms, the network topology changes continuously, causing communication links to disconnect [127]. Deploying vehicle-to-vehicle (V2V), vehicle-to-network (V2N), and vehicle-to-infrastructure (V2I) architectures can improve vehicular-based sensor networks. For instance, several works [128], [129], [130] have developed approaches based on V2V and V2I communications. These platforms can be used as a wireless multihop network in which sensor data are relayed between vehicles (V2V) and then passed on to infrastructure (V2I) or to the network (V2N) (see Fig. 2 for V2V, V2N, and V2I communications). This can potentially reduce cost and power consumption while providing low data latency. Finally, advances in satellite technology have given rise to satellite-driven applications in areas such as broadcasting, telecommunications, and internet service provision, and satellite links could potentially be used for drive-by air quality sensing. In addition, satellite communication remains active even when land-based communications are down or unreachable. This could be useful, for example, when streaming data from a UAV.

B. Data Storage
There are different strategies for storing sensed data. Typically, the wireless LCS communicates with an IoT cloud for data storage, processing, and visualization. For example, Kaivonen and Ngai [131] measured real-time air pollution levels by utilizing IoT and deploying wireless LCSs on public transport in the city of Uppsala. Nguyen et al. [16] proposed a real-time air quality data offloading scheme that leverages Q-learning, in which data are transmitted to a cloud server. Fioccola et al. [132] utilized a cloud-based platform for managing air quality data, which also includes data storage. Offsite or cloud servers offer large capacity for big data storage, and cloud platforms also provide processing services. Local storage devices can also provide sufficient space for datasets at reasonably low cost and can be deployed onboard mobile platforms. For example, Zhuang et al. [133] introduced a portable personal air quality monitoring device in the Airsense project. The battery-powered sensor node includes several sensors and a GPS module, all integrated on a single printed circuit board (PCB) with a microcontroller. The device does not use wireless communication technology; instead, it records the measurements on an SD card that is accessible to the user.
Adaptive sampling, data compression, and data aggregation can help reduce the amount of air quality data. Zeng and Xiang [134] proposed an adaptive sampling scheme for urban air quality that saves energy and memory space by turning on the sampling module when sensing different events. Ghose and Rehena [135] proposed a lossless data compression algorithm to reduce air quality data. Khedo et al. [136] developed a data aggregation algorithm to significantly reduce the amount of air pollution data. The algorithm merges data to eliminate duplicates, filters out invalid readings, and summarizes them into a simpler form. Reducing the amount of data can save storage space; however, it can make data processing more complex.
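The three aggregation steps described above (eliminating duplicates, filtering invalid readings, and summarizing) can be sketched in a few lines of Python. This is a generic illustration under our own assumptions, not the algorithm of [136]: readings are `(sensor_id, timestamp_s, value)` tuples, validity is a simple range check, and the summary is a per-sensor mean over fixed time windows. The function name, the window length, and the valid range are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

Reading = Tuple[str, int, float]  # (sensor_id, timestamp in s, value)


def aggregate(
    readings: Iterable[Reading],
    window_s: int = 60,                               # assumed window length
    valid_range: Tuple[float, float] = (0.0, 1000.0), # assumed plausible range
) -> Dict[Tuple[str, int], Tuple[float, int]]:
    """Map (sensor_id, window_start) to (mean value, number of readings)."""
    low, high = valid_range
    seen = set()
    buckets = defaultdict(list)
    for sensor_id, ts, value in readings:
        key = (sensor_id, ts, value)
        if key in seen:
            continue  # step 1: drop exact duplicates
        seen.add(key)
        if not low <= value <= high:
            continue  # step 2: drop physically implausible readings
        window_start = ts // window_s * window_s
        buckets[(sensor_id, window_start)].append(value)
    # step 3: summarize each window by its mean and sample count
    return {k: (mean(v), len(v)) for k, v in sorted(buckets.items())}


raw = [
    ("bus1", 0, 10.0),
    ("bus1", 0, 10.0),   # duplicate transmission
    ("bus1", 30, 14.0),
    ("bus1", 45, -5.0),  # invalid negative concentration
    ("bus1", 70, 20.0),
]
print(aggregate(raw))  # {('bus1', 0): (12.0, 2), ('bus1', 60): (20.0, 1)}
```

Keeping the sample count alongside the mean lets later processing weight windows by how many readings support them, which partly offsets the information lost in summarization.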
Blockchain data storage can be used to deliver higher levels of security, reliability, durability, and transparency. In a blockchain network, data are replicated and distributed across multiple nodes in the network. In the context of air quality monitoring, blockchain-based solutions have been proposed to prevent forgery and tampering of sensor data. However, using blockchain for sensor data requires overcoming overhead and data duplication, which lead to storage, performance, and scalability problems [137]. Moreover, blockchain and compression schemes can increase processing requirements, which may include higher power consumption, processing time, and other costs.
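The tamper-evidence property that motivates blockchain storage can be illustrated with a plain hash chain, in which each record stores the hash of its predecessor. The Python sketch below is only this core mechanism under our own assumptions: it has no replication, consensus, or distribution across nodes, so it is not a blockchain and does not represent any cited system. The block layout and the function names are hypothetical. It also shows the overhead noted above, since every record carries two 64-character hashes in addition to the sensor payload.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS_HASH = "0" * 64  # placeholder predecessor for the first block


def _digest(prev_hash: str, payload: Dict[str, Any]) -> str:
    """SHA-256 over a canonical JSON encoding of the block contents."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain: List[Dict[str, Any]], payload: Dict[str, Any]) -> None:
    """Append a reading, linking it to the hash of the previous block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    chain.append({"prev": prev_hash, "payload": payload,
                  "hash": _digest(prev_hash, payload)})


def verify_chain(chain: List[Dict[str, Any]]) -> bool:
    """Return False if any block was altered or the links are broken."""
    prev_hash = GENESIS_HASH
    for block in chain:
        if block["prev"] != prev_hash:
            return False
        if block["hash"] != _digest(block["prev"], block["payload"]):
            return False
        prev_hash = block["hash"]
    return True


chain: List[Dict[str, Any]] = []
append_block(chain, {"sensor": "bus1", "t": 0, "pm25": 12.3})
append_block(chain, {"sensor": "bus1", "t": 60, "pm25": 13.1})
print(verify_chain(chain))          # True
chain[0]["payload"]["pm25"] = 5.0   # simulate tampering with a stored reading
print(verify_chain(chain))          # False
```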

VII. CONCLUSION AND FUTURE WORK
Vehicular-based air pollution monitoring has gained significant attention over recent years. Due to mobility, a relatively small number of sensor devices can monitor air pollution over vast urban areas. However, several challenges need to be resolved in terms of deployment strategies, calibration, communication, and other issues. In this study, we provided a systematic analysis and taxonomy of recent work on this topic. We first presented a summary of major air pollutants, gas sensor characteristics, and relevant air pollution standards. We then analyzed the relevant work on mobile air pollution monitoring, categorized by major urban transport modalities such as buses, taxis, and utility vehicles. We highlighted the benefits and limitations of each transport mode and the challenges and lessons learned in those projects. This was followed by a review of relevant work on calibration and data communication for vehicular-based air monitoring.
We highlighted key open problems related to the calibration of mobile LCSs, data communication, vehicle selection for maximizing spatio-temporal coverage, and robustness of route planning to traffic congestion (see Fig. 3).
However, the review is not exhaustive, and issues such as security, privacy, datasets, and energy consumption are not included in this study. We hope that the review will help future research toward more robust, accurate, and secure vehicular-based air pollution monitoring systems.

Fig. 3. Taxonomy for challenges when deploying DS air quality systems.

TABLE V. ILLUSTRATION OF PROS AND CONS OF VARIOUS LCSs