Data-Driven Single Image Deraining: A Comprehensive Review and New Perspectives



Introduction
Single image deraining (SID), or single image rain removal, has emerged as an important task in image processing and computer vision, with a wide range of applications in single image processing and restoration. However, the quality of images captured on rainy days by outdoor vision systems, such as autonomous driving, person/vehicle tracking and surveillance, is usually degraded by rain streaks, raindrops or rain accumulation (see Fig. 1). This degradation directly harms subsequent high-level vision tasks, e.g., object detection [68,89], image recognition [97,163] and saliency detection [32,132]. In addition, due to the irregular and complex rain patterns encountered in practice and the ill-posed nature of the problem, the task of SID, which aims to estimate the rain-free background from a degraded image, remains challenging to date. Unlike video deraining, which can leverage temporal redundancy and the dynamics of rain, SID mainly exploits the spatial information of neighboring pixels and the visual properties of rain and background scenes. As such, SID usually confronts more difficulties than the video deraining task [4,50,69,70].
To study the SID task, existing methods can be generally divided into two basic categories, i.e., traditional methods (such as filter-based, prior-based and model-based ones) [9,12,16,66,115,130,131] and deep learning-based data-driven methods [23,42,60,64,84,88,113,121,136,153,154]. The filter-based methods aim to filter the rain image to obtain the rain-removed image [130,131]. Classical prior- and model-based methods include sparse representation [16,115], low-rank representation [9,12] and the Gaussian mixture model [66], etc. Sparse coding finds the sparsest representation of the input as a linear combination of basic elements (or dictionary atoms), and can also learn the basic elements themselves. Low-rank representation is a rank minimization problem, in which the cost function measures the fit between the given data matrix and an approximating matrix (i.e., the optimization variable), subject to the constraint that the approximating matrix has reduced rank.
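To make the low-rank idea concrete, the following self-contained NumPy sketch (illustrative only; the toy data, sizes and rank are our own choices, not taken from any cited method) shows why a matrix built from similar rain-streak patches is well approximated by a truncated SVD, which is the mechanism low-rank deraining methods exploit:

```python
import numpy as np

def low_rank_approx(patch_matrix, rank):
    """Best rank-r approximation via truncated SVD (Eckart-Young theorem)."""
    U, s, Vt = np.linalg.svd(patch_matrix, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

# Toy illustration: a matrix whose columns are near-identical rain-streak
# patches is approximately rank one, so truncating its SVD keeps the shared
# streak structure and discards the per-patch noise.
rng = np.random.default_rng(0)
streak = rng.random((64, 1))                        # one streak pattern
patches = streak @ np.ones((1, 50))                 # 50 similar patches
patches += 0.01 * rng.standard_normal(patches.shape)
approx = low_rank_approx(patches, rank=1)
err = np.linalg.norm(patches - approx) / np.linalg.norm(patches)
```

The relative error `err` stays small because almost all of the matrix's energy lies in its first singular component, mirroring the "repeated streak patterns are low-rank" prior.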
With the rapid development of deep learning [39,44,97], data-driven deraining methods have achieved impressive performance improvements over traditional deraining methods. In this field, the Rain1400 [23] and Rain100H/Rain100L [136] datasets were first proposed in 2017 for the task of data-driven SID. After that, more and more synthetic SID datasets were constructed, and the study of data-driven SID entered a new period. In addition to synthesizing rain image data, different kinds of side information have been derived from the original data, such as rain streaks, rain masks, rain density, and image depth. By designing specific SID methods [42,60,84,136,153] that can take advantage of this side information, state-of-the-art deraining performance has been obtained. It is noteworthy that three important influencing factors have to be discussed for the problem of SID, i.e., data, rain model and network architecture:
• Data. Data are the core factor of data-driven SID methods. Massive training data can usually enable deraining networks to obtain better performance. However, due to the lack of real-world image pairs, most current SID methods use synthetic data for both training and testing.
• Rain model. The rain model mainly addresses two questions: how to describe and model rain mathematically, and how to synthesize rain images. Rain is a complex optical phenomenon and the rain removal problem is usually ill-posed, so the rain model encodes how we view the phenomenon and how we intend to solve the problem.
• Network architecture. The network architecture is the most variable and actively studied factor in the SID problem. Current studies have designed many new modules and network architectures with different properties to improve deraining performance.
In recent studies, the performance of SID keeps improving; however, this usually holds for synthetic image data rather than real-world data. Specifically, the generalization ability of current SID methods is still limited in real-world scenarios. Furthermore, the rain removal step still cannot effectively improve subsequent high-level vision tasks (such as object recognition and detection [5,89]). To this end, several researchers have conducted experiments to give SID methods better application potential. For example, a large-scale benchmark dataset, MPID [61], which contains both synthetic and real rain images with various rain types, was proposed to evaluate the performance of existing SID methods. The experimental evaluation and result analysis reveal, to some extent, the performance gap between synthetic and real-world data.
It is noteworthy that current studies and analyses on SID mainly focus on measuring the effectiveness of SID methods [62,107,109,137], and discuss the three influencing factors independently without clearly examining their relationships and mutual influence on the deraining task. In this paper, we therefore re-define the classification of the three factors, figure out their intrinsic relations, and moreover reveal different solving paradigms to address the SID task. Furthermore, we examine the effectiveness of SID data based on new evaluation criteria. Overall, the main contributions of this paper are summarized as follows:
1. New divisions of the three factors: We provide new, more reasonable and easily understandable divisions of data (general vs. specific), rain model (synthetical vs. mathematical) and network architecture (black-box vs. white-box) by re-examining and rethinking the three influencing factors of the SID task.
2. In-depth analysis: We figure out the relations among the three factors and reveal two different solving paradigms (explicit vs. implicit). Besides, we analyze and categorize 97 current mainstream SID methods according to different properties and perspectives (i.e., training strategy, network pipeline, domain knowledge, data preprocessing and objective function).
3. Novel experimental design: We design new forms of experimental configurations to evaluate the effectiveness of SID data with novel quantitative evaluation criteria. To the best of our knowledge, this is the first work to evaluate and quantify the existing mainstream SID datasets.
4. Instructive conclusions: Based on the in-depth analysis and evaluations, we derive some practical and instructive conclusions on how to choose data appropriately, which can help researchers who are uncertain about the choice of SID data.
The rest of this paper is organized as follows. In Section 2, we introduce the background of single image deraining and related survey papers on the rain removal task, and describe the differences from our work. Section 3 illustrates the three factors, i.e., data, rain model and network architecture, under the new divisions. Section 4 explores the relationships among the three factors and discusses the solving paradigms of SID methods. In Section 5, we summarize recent data-driven SID methods from different aspects. In Section 6, we design novel forms of experiments to rank the public paired datasets and provide a detailed analysis. Finally, Section 7 provides some valuable conclusions and discusses future directions.

Single Image Deraining
Single image deraining technology has developed rapidly in recent years, and existing SID methods can be roughly divided into two categories: traditional model optimization-based methods (i.e., non-deep learning methods) and deep learning-based data-driven methods. Model optimization-based SID methods include sparse coding [26,75], low-rank representation [9,12] and the Gaussian mixture model [66]. According to the training mode, data-driven methods can be further divided into fully-supervised [42,110,136], unsupervised [53,158] and semi-supervised deep methods [121,124]. Since 2017, the study of SID has entered the deep learning period and achieved significant performance improvements. Model optimization-based SID methods [9,12,26,66,75] usually cannot accurately separate the rain layer from the background, and have gradually been replaced by deep SID methods [23,88,136] with stronger representation capabilities. Compared to traditional methods, the performance improvements of deep SID methods can be mainly attributed to two facts: (1) deep neural networks have powerful feature mapping capability; (2) large amounts of training data provide sufficient information for deep networks. Note that deep SID methods can use different deep neural networks, such as convolutional neural networks (CNNs) [97], recurrent neural networks (RNNs) [18] and generative adversarial networks (GANs) [33], to extract hierarchical features and depth information of rain streaks and obtain a direct mapping from the rainy image to the clear image. To train a better deep model, prior knowledge related to the rain layer or background layer can also be injected into the network, such as the rain mask [136], rain density [153] and scene depth [42].
Different basic modules or units can be designed for feature extraction, such as the residual dense block [64], dual path hybrid block [126] and spatial attentive module [113]. In addition, different deep network architectures have been proposed, such as the recurrent framework [136] and the recursive framework [88].
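The residual-learning idea behind many of these networks — predict the rain component and subtract it from the input — can be illustrated without any deep learning machinery. The following toy NumPy sketch (purely illustrative; the synthetic data, sizes and linear regressor are hypothetical stand-ins for a trained network) fits a regressor to the rain residual under the additive model and recovers the background by subtraction:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data under the additive rain streak model: O = B + S.
n, d = 200, 32
B = rng.random((n, d))                               # clean "images"
S = 0.3 * np.abs(np.sin(np.linspace(0.0, 6.28, d)))  # shared rain pattern
O = B + S                                            # rainy observations

# Residual learning: fit a bias-augmented linear map from O to the rain
# residual S, then recover B_hat = O - predicted_S. The residual is far
# simpler than the full clean image, which is the intuition behind
# residual-based deraining networks.
X = np.hstack([O, np.ones((n, 1))])
W, *_ = np.linalg.lstsq(X, np.tile(S, (n, 1)), rcond=None)
B_hat = O - X @ W
mae = np.abs(B_hat - B).mean()
```

Because the rain pattern here is shared across samples, the regressor reproduces it almost exactly and the mean absolute reconstruction error is near zero; real networks replace the linear map with a deep CNN.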

Related Survey Papers
So far, there are four survey papers in recent years most relevant to our work, i.e., [62,107,109,137]. These works discuss and analyze SID methods from different perspectives, and we summarize the four closely related survey papers together with ours in Table 1. From the comparison, we can conclude the following: 1) Wang et al. [107] (SCIS'22) reviewed video deraining and SID simultaneously. For the SID part, they mainly divide existing methods into three categories according to the way of problem-solving, i.e., filter-based, prior-based, and deep learning-based data-driven methods. However, data-driven SID methods occupy only a relatively small proportion: the analysis part does not leave much space to introduce the data-driven methods, and the experimental part evaluates only seven data-driven methods, trained on three datasets and tested on five datasets. Based on the evaluation results, they summarized the deficiencies of current deraining methods and presented some remarks to illuminate meaningful future research directions along this line.
2) Li et al. [62] (IJCV'21) presented a comprehensive study and evaluation of six existing SID algorithms and introduced a new large-scale benchmark dataset (MPID) that contains both synthetic and real-world rain images of various rain types. Specifically, the MPID dataset covers three types of synthetic rain models (rain streak, raindrop, and rain mist), as well as a rich variety of evaluation criteria (two full- and three no-reference objective evaluations, subjective evaluation, and task-specific evaluation). The evaluation and analysis indicate, to some extent, the performance gap between synthetic and real-world rain images.
3) Wang et al. [109] (IJMLC'20) provided a non-exhaustive review of current SID techniques and mainly categorized them into three classes, that is, early filter-based, conventional prior-based, and recent deep learning-based data-driven approaches. Furthermore, inspired by the rationality of the deep learning-based method PReNet [88] and insightful characteristics underlying rain shapes, they also built a specific coarse-to-fine deraining network architecture, which can finely model rain structures and progressively remove rain streaks from input images.
4) Yang et al. [137] (TPAMI'20) mainly focus on the SID problem. Specifically, they divided existing SID methods into model-based and data-driven methods, and further divided the deep learning-based data-driven methods into four sub-categories, i.e., deep convolutional neural network (CNN)-based, generative adversarial network (GAN)-based, semi-/un-supervised learning-based methods and benchmarks. They then described the methods in the four sub-categories in detail, including network architectures, basic blocks, loss functions, and datasets. To evaluate the performance of each method, they selected eight deep learning-based data-driven methods and evaluated them on three synthetic datasets. However, this work provided only a performance graph rather than detailed evaluations. In addition, they also proposed some future directions, including the integration of physics models and real images, rain modeling, evaluation methodology, and more related tasks and real-world applications.
Remarks. After this brief introduction to the four related survey papers, we present the differences between the existing works and ours. Wang et al. [109] propose a new SID method and leave less space for reviewing current methods, so we do not compare against this work in detail here. We mainly compare our work to the other three survey papers [62,107,137]. Firstly, the experiments in [62] focused on fairly evaluating the performance of SID methods on a new benchmark dataset. They reveal the shortcomings of the current rain removal field, including the high complexity of the rain removal task, the lack of appropriate evaluation metrics, poor generalization to real images, and little benefit for downstream tasks. Strictly speaking, [109] is not a true survey paper, since it is primarily original work that only touches on some new aspects of the SID task. Inspired by [62], we have designed more experiments to investigate the two key issues mentioned above, which have received less attention. From the contents of [107,137], it is clear that both lack an investigation and analysis of the relationship between data and network structures. In particular, their experiments cannot address the following issues: (1) the superiority of different data, since appropriate training data can enhance the representation and generalization abilities of the networks and thus better serve real scenarios and practical applications; (2) the pros and cons of different solving paradigms, which we investigate carefully via experiments on side information, different operations, and explicit or implicit paradigms. Specifically, different from [107,137], which also cover video deraining and traditional SID methods, we focus on the data-driven SID task, where the related methods are categorized, from the perspective of data, into general and specific ones based on the relationship between data and networks. Note that the conclusions for the two categories are derived from a large number of experimental verifications, and some discussions and suggestions are also provided on this basis. We believe this survey will provide a more insightful view for understanding data-driven SID methods from the new perspective of data.
One of the most important motivations for exploring the SID task is its potential use in various realistic applications, so the deraining results on real rain images (without ground truth) determine the actual rain removal ability of each method. However, most existing data-driven SID methods are supervised and require paired data for training, so they cannot be trained directly on real images. Supervised methods are thus usually trained on synthetic datasets and tested on real images, i.e., the test result on real images represents the generalization ability of each method. Note that this result is directly determined by the properties of the synthetic datasets, including the rain direction, density, shapes, etc. In other words, to solve the real rain removal task well, the characteristics of the synthetic data should be as close as possible to those of real rain. However, most existing data-driven SID methods focus on performance evaluation on synthetic datasets to reflect the deraining ability of the proposed models, and are only tested on some real rain images to evaluate the so-called generalization ability. Clearly, this evaluation process is understandable and easy to perform; however, the key and most difficult issue is how to define a so-called "best" synthetic dataset. That is, how to appropriately select data and rain models for the task, and then design the corresponding network structures and optimization strategies, is worthy of discussion. To the best of our knowledge, there is no prior study or discussion on these topics yet.
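Evaluation on synthetic benchmarks typically reports full-reference metrics such as PSNR between the derained output and the ground truth. A minimal sketch of PSNR (the standard definition; the toy images below are made up solely for illustration):

```python
import numpy as np

def psnr(ref, test, data_range=1.0):
    """Peak signal-to-noise ratio between a ground truth and a restored image."""
    ref = np.asarray(ref, dtype=float)
    test = np.asarray(test, dtype=float)
    mse = np.mean((ref - test) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)

# Toy usage: compare a clean image against a lightly corrupted version.
rng = np.random.default_rng(0)
gt = rng.random((64, 64))
degraded = np.clip(gt + 0.05 * rng.standard_normal(gt.shape), 0.0, 1.0)
score = psnr(gt, degraded)
```

Note that such full-reference metrics are only computable on synthetic pairs; on real rain images without ground truth, no-reference metrics or downstream-task performance must be used instead, which is exactly the evaluation gap discussed above.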

The Three Factors
In this section, we discuss the divisions of the three factors, i.e., data (general vs. specific), rain model (synthetical vs. mathematical) and network architecture (black-box vs. white-box), according to new and more reasonable criteria.

Data
To the best of our knowledge, more than 30 paired datasets were proposed in studies from 2017 to 2022. These datasets contain image pairs (i.e., a rain image (O) and a clean ground-truth image (B)), as well as, in some cases, side information such as the rain streak (S), rain mask (M), image depth (F), rain density (D), transmission map (T), and atmospheric light (A). According to whether the data contain extra side information, we divide SID data into general data and specific data. Fig. 2 and Fig. 3 illustrate some representative general and specific datasets. Fig. 4 shows an example of paired images with side information for SID. Table 2 summarizes these mainstream paired datasets.
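The general/specific split can be mirrored directly in code. The following hedged, stdlib-only sketch (field names follow the symbols above; the class is hypothetical and implies no existing dataset loader) simply makes every side-information field optional, so a "general" sample carries only O and B:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SIDSample:
    """One paired deraining sample. General datasets fill only O and B;
    specific datasets additionally carry one or more of S, M, F, D, T, A."""
    O: object                   # rain image
    B: object                   # clean ground-truth image
    S: Optional[object] = None  # rain streak layer
    M: Optional[object] = None  # binary rain mask
    F: Optional[object] = None  # image/scene depth
    D: Optional[object] = None  # rain density label
    T: Optional[object] = None  # transmission map
    A: Optional[object] = None  # atmospheric light

    def is_specific(self) -> bool:
        """A sample is 'specific' iff it carries any side information."""
        return any(v is not None
                   for v in (self.S, self.M, self.F, self.D, self.T, self.A))
```

For example, a Rain800-style pair would construct `SIDSample(O=rain_img, B=clean_img)`, while a Rain100H-style pair would also pass `M=rain_mask`.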

General data
The datasets of this kind only contain image pairs (i.e., a rain image and the corresponding ground-truth image), without any side information. Some representative general datasets are described below:
• Rain800 [154]: 800 clean images are randomly chosen from the UCID [93] and BSD [1] datasets. The authors add rain streaks to these clean images using Photoshop [81]. After that, 700 pairs are chosen for training and 100 pairs for testing.
• Rain1400 [23]: 1,000 clean images are randomly chosen from UCID [93], BSD [1] and Google. With the Photoshop software [81], each clean image is used to generate 14 rain images with different streak orientations and magnitudes. The authors randomly select 12,600 image pairs for training and 1,400 for testing.
• Rain12 [66]: It is only a test dataset that contains 12 rain images with one type of rain streak.
• SPA-Data [113]: 170 real rain videos are captured by a mobile phone or collected from the Internet. Using the video deraining method [113], 29,500 image pairs are generated, split into 28,500 for training and 1,000 for testing.
• Auto100L/Auto800 [123]: These two datasets are generated by a GAN-based model on top of Rain100L [136] and Rain800 [154]. Because the rain streaks are added automatically in an unsupervised mode, they exhibit more shapes and directions, and are better adapted to natural rain, than those in the original datasets created manually with Photoshop.
• RainDrop [84]: Using a camera with two pieces of glass, the authors construct 1,119 pairs of raindrop images with various background scenes, of which 861 images are employed for training.
• RainDS [85]: The first real-world deraining dataset that includes different types of rain (rain streaks and raindrops) captured under various lighting conditions and scenes. It contains 250 real-world and 1,200 synthetic rain image pairs.
• QSMD-Data [116]: This dataset follows [63] to prepare a training set containing 20,800 pairs. The synthetic rain images are composed from rendered rain layers and the ground truth using the screen blend mode.
• Rain-II [117]: 400 images are synthesized following [81], and the synthetic rain images possess apparent vapor; the result is called Rain-II. This dataset is used only for testing.
• Rain20 [121]: Following Wei et al. [122], 20 rain images with complex and diverse rain streaks are synthesized. These testing images cover two different scenarios, i.e., sparse rain streaks and dense rain streaks, with 10 test images per scenario.
• GT-RAIN [2]: This dataset includes 26,124 training frames, 3,300 validation frames and 2,100 testing frames. The frames cover a large variety of background scenes, from urban locations to natural scenery, span a wide range of geographic locations, and include varying degrees of illumination from different times of day as well as rain of varying densities, streak lengths, shapes and sizes.

Specific data
Besides the corresponding image pairs, specific datasets also contain side information, e.g., the rain streak, rain mask, image depth, rain density, transmission map, and atmospheric light. Some representative specific datasets include:
• Rain100H [136]: 1,900 clear images are selected from BSD [1]. The rain streaks are synthesized by photorealistic rendering techniques [28] or by adding simulated sharp line streaks along a certain direction. Rain100H has 1,800 image pairs with three to five layers of rain streaks for training and 100 image pairs for testing.
• Rain100L [136]: 300 clear images are selected from BSD [1]. Rain100L has 200 image pairs with only one type of rain streak for training and 100 image pairs for testing.
• Rain200H/Rain200L [136]: Rain200H and Rain200L are two updated versions of the original Rain100H/Rain100L datasets, whose clean backgrounds are re-selected so that the training and test sets do not overlap.
• Rain1200 [153]: Using the Photoshop software [81], each of 4,400 clear images is used to synthesize rain images at three rain density levels (i.e., light, medium, and heavy). After that, 12,000 image pairs are used for training, while 1,200 image pairs are used for testing.
• RainCityscapes [42]: 295 clear images are selected from Cityscapes [13] to synthesize rain images using the camera parameters and scene depth information. The training set has 262 images, while the test set has 33 images.
• NYU-Rain [60]: A total of 16,200 clear images are chosen from NYU-Depth v2 [96] to render synthetic rain streaks and rain accumulation effects based on the provided depth information. 13,500 image pairs are used for training and 2,700 for testing.
• Outdoor-Rain [60]: Clear images are chosen from Qian et al. [84] to synthesize rain images by the same method as NYU-Rain. The Outdoor-Rain dataset contains 9,000 training samples and 1,500 testing samples.
• DDC-Data [63]: The clean images are chosen from BSD [1], and the rain streak layers are generated following the Photoshop procedure [81] with varying intensities, orientations, and overlaps. Finally, a dataset containing 10,400 image pairs is composed via screen blending.
• RainDirection [73]: The rain images in RainDirection are obtained by combining clean images from the Flickr2K and DIV2K datasets [99] with synthetic labeled rain maps according to the rain model in Eqn. (1). Each rain image is assigned a direction label. The training and test sets of RainDirection contain 2,920 and 430 images, respectively.
• RainLevel5 [78]: The clean images are selected from the Cityscapes dataset [13], and rain images at 5 levels (25, 50, 75, 100, and 200 mm/hr) with corresponding fog rendering are synthesized. The dataset includes 26,870 pairs for training and 2,880 pairs for testing.
• Cityscapes_syn [127]: Cityscapes [13] images are used as clean images, and a physics-based rendering method [36] is used to build this dataset at two rain intensities, i.e., 100 mm/hr (light rain) and 200 mm/hr (heavy rain). Each sub-dataset has the same image numbers as Cityscapes (i.e., 2,975, 500, and 1,525 for training, validation, and testing, respectively).
• Cityscapes_real [127]: Cityscapes [13] images are used as clean images, and real rain streaks from SPA-Data [113] are used to synthesize a real-rain-streak dataset. The dataset includes 2,975 pairs for training and 500 pairs for testing.
• RainKITTI2012 [155]: Photoshop is used to create the synthetic RainKITTI2012 dataset based on the public KITTI stereo 2012 dataset [30]. The training set contains 4,062 image pairs from various scenarios, and the testing set contains 4,085 image pairs.
• RainKITTI2015 [155]: Similar to RainKITTI2012, RainKITTI2015 is built from the KITTI stereo 2015 dataset [30]. Its training and testing sets contain 4,200 and 4,189 pairs of images, respectively.
• OxfordRaindrop [83]: The authors present the hardware used to record their narrow-baseline stereo dataset, which allows one lens to be affected by real water droplets while keeping the other lens clear. They collected approximately 50,000 pairs of images by driving in and around the city of Oxford, and selected 4,818 image pairs to form the training, validation, and testing datasets. From the testing partition, they created ground-truth road marking segmentations for 500 images.
• RH [35]: A dataset of 1,619 images composed of two parts: the dataset captured by Qian et al. [84] and 500 clean/corrupted pairs of images captured by the authors. They use a Nikon D5300 to capture various background scenes that include both raindrops and haze. The thickness of the glass slabs is 3 mm. To minimize the reflective effect of the glass, the distance between the glass slabs and the camera lens is set between 2 and 8 cm to generate diverse raindrop images.
• RaindropCityscapes [37]: The first photo-realistic adherent raindrop dataset with pixel-level masks in an autonomous driving setting, based on the Cityscapes dataset [13]. For each background image, the authors generate 50 to 70 raindrops. Finally, they construct a dataset containing about 30,000 images based on the Cityscapes training set for training and 1,525 images based on the Cityscapes test set for testing.

Rain Model
A rain model mainly addresses two issues: 1) how to model rain mathematically, and 2) how to synthesize rain images. However, only some rain models can be used to synthesize rain images, while the others are purely mathematical models of rain. According to whether it can generate data, we divide current rain models into synthetical rain models and mathematical rain models.

Synthetical rain model
Both the synthetical and mathematical rain models can model rain, while only the synthetical rain model can be used to synthesize rain images. In what follows, we introduce some representative synthetical rain models in detail:
• Rain Streak Model (RSM). This is the most fundamental rain model, formulated as
O = B + S, (1)
where the rain image O is decomposed into a rain streak layer S and a clean background B. RSM is widely used in data-driven SID methods [49,103,113,149,153,154]. Datasets such as Rain800, Rain1400, and Rain1200 are synthesized based on this model. Note that, from Eqn. (1) to Eqn. (16), we use uniform mathematical characters.
• Screen Blend Model (SBM). Different from the RSM, SBM is a non-linear composite model formulated as
O = B + S − B ⊙ S, (2)
where ⊙ is element-wise multiplication. Unlike the additive composition in Eqn. (1), the background and the rain streak influence each other's appearance. Li et al. [63] use SBM as the rain model to synthesize DDC-Data.
• Image Depth Model (IDM). According to [29], the visual intensity of a rain streak depends on the scene depth from the camera to the underlying scene objects behind the rain. The model can be formulated as
O = (B + S) ⊙ (1 − F) + A ⊙ F, (3)
where A is the atmospheric light, which is assumed to be a global constant [92], F denotes the fog layer with a range of [0,1], and 1 is a matrix of ones. Hu et al. [42] use IDM as the rain model to synthesize the RainCityscapes dataset.
• Heavy Rain Model (HRM). Heavy rain often causes rain accumulation and haze-like visual effects. Rain accumulation, or veiling, results from water particles in the atmosphere and distant rain streaks that cannot be seen individually. The model can be expressed as
O = T ⊙ (B + Σ_{i=1}^{n} S̃_i) + (1 − T) ⊙ A, (4)
where each S̃_i is a layer of rain streaks with the same direction, i is the index of the rain-streak layers, and n is the maximum number of rain-streak layers. T denotes the transmission map introduced by the scattering of tiny water particles, and A is the global atmospheric light of the scene. Li et al. [60] use HRM as the rain model to synthesize the NYU-Rain and Outdoor-Rain datasets.
• Rain Accumulation Model (RAM). RAM considers the rain accumulation effect like the Heavy Rain Model, but additionally introduces rain mask information:
O = T ⊙ (B + Σ_{i=1}^{n} S_i ⊙ M) + (1 − T) ⊙ A, (5)
where M is a rain mask, i.e., a binary image in which 1 marks rain pixels and 0 marks rain-free pixels. Yang et al. [136] use RAM as the rain model to synthesize the Rain100H and Rain100L datasets.
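The two simplest synthetical compositions — the additive model of Eqn. (1) and the screen blend used by SBM — can be sketched in a few lines of NumPy. The streak layer below is crude thresholded noise, purely for illustration; real datasets use Photoshop or photorealistic rendering:

```python
import numpy as np

def add_rain_rsm(B, S):
    """Additive Rain Streak Model: O = B + S, clipped to valid range."""
    return np.clip(B + S, 0.0, 1.0)

def add_rain_sbm(B, S):
    """Screen Blend Model: O = B + S - B*S, i.e. 1 - (1-B)(1-S),
    Photoshop's 'screen' blend mode (never darker than B)."""
    return B + S - B * S

rng = np.random.default_rng(0)
B = rng.random((32, 32))                     # toy clean background
# Crude streak layer: sparse bright pixels smeared along one axis.
S = (rng.random((32, 32)) > 0.97).astype(float)
S = np.clip(S + np.roll(S, 1, axis=0) + np.roll(S, 2, axis=0), 0.0, 0.8)

O_rsm = add_rain_rsm(B, S)
O_sbm = add_rain_sbm(B, S)
```

Note the qualitative difference the text describes: under RSM the streak intensity is independent of the background, while under SBM the streak contribution S(1 − B) shrinks on bright backgrounds, so background and rain influence each other's appearance.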

Mathematical rain model
Unlike the synthetical rain models, mathematical rain models are designed to model and solve the physical phenomenon of rain from a mathematical point of view. However, the authors do not (or cannot) accurately synthesize rain images according to these models, so we refer to them as mathematical rain models.
• Rain Residual Model (RRM). This model is similar to RSM. The advantage of using the rain residual is that training on it yields a cleaner background than training on the rain streaks directly, but the formula cannot be used to synthesize rain images:
B = O + R, (6)
where R indicates the residual of the rain streak, which is a negative matrix. RRM is widely used in [23,25,88,152].
• Base Detail Model (BDM). Instead of decomposing the rain image into a rain streak layer and a background layer, BDM decomposes the rain image into the sum of a base layer and a detail layer obtained with a low-pass filter:
O = O_base + O_detail, (7)
where the detail layer O_detail contains the structure information, and the base layer O_base is similar to a clear image. BDM is used in the SID method [22].
• Raindrop Mask Model (RMM). This model formulates the raindrop-degraded image as a combination of a background image and the effect of raindrops:
O = (1 − M) ⊙ B + R, (8)
where M is a binary mask in which M(x) = 1 means that pixel x belongs to a raindrop region, and otherwise it belongs to the background. R is the effect brought by the raindrops, representing the complex mixture of the background information and the light reflected by the environment that passes through the raindrops adhered to a lens or windscreen. The SID method [84] uses RMM as the rain model.
• Raindrop Transparency Model (RTM). RTM considers a simple linear model with transparency:
O = (1 − α) ⊙ B + α ⊙ R, (9)
where α ∈ [0, 1]^{h×w×c} denotes the transparency matrix and R denotes the raindrop layer. Each entry of α represents the percentage of the light path covered by raindrops at the corresponding pixel. The study in [86] adopts RTM as the rain model.
• Rain Convolutional Dictionary (RCD). RCD approaches the problem with the conventional prior-based methodology by exploiting prior knowledge to represent the rain streaks, formulated as
S^c = Σ_{k=1}^{K} C_k^c ⊗ M_k, (10)
where S^c denotes the c-th color channel of the rain streak S, {C_k} ⊂ ℝ^{p×p} is a set of rain kernels describing the repetitive local patterns of rain streaks, and {M_k} ⊂ ℝ^{h×w} are the corresponding rain maps representing the locations where the local patterns repeatedly appear. K is the number of kernels and ⊗ is the 2-dimensional (2D) convolution operation. RCD-Net [110] uses RCD as the rain model.
• Two Transmissions Model (TTM). Because rain streaks and vapors are entangled with each other, this mathematical model disentangles them from the transmission-medium perspective:
O = T_s ⊙ T_v ⊙ B + (1 − T_s ⊙ T_v) ⊙ A, (11)
where T_s and T_v are the transmission maps of rain streaks and vapors, respectively. TTM is adopted in the SID method [117] as the rain model.
• Streak Drop Model (SDM). In real-world rainy weather, rain streaks and raindrops may co-occur in outdoor image capture. As such, this rain model considers both the rain streak and the raindrop in one formula, where A is the global atmospheric lighting coefficient. SDM is adopted in the SID method [85] as the rain model.
• Haze-Like Model (HLM). The authors handle the haze-like rain effect by modeling its influence directly with a new variable H in their general model for rain images, where H denotes the degradation caused by the haze-like effect and ⊙ denotes pixel-wise multiplication. HLM is adopted in the SID method [114] as the rain model.
• Rain Disentangled Model (RDM). This model solves the deraining problem by distinguishing the special characteristics of two components, α and β, to alleviate the over-smoothness and artifacts in the final derained results, where θ_α and θ_β denote two sets of learnable parameters for α and β, respectively. α represents the weighting factors of atmospheric reflected light and transparency in different luma and chroma channels under rain conditions, and β represents a kind of mixed characteristic of the rain streaks (i.e., the region and amplitude of rain streaks). RDM is adopted in the SID method [51] as the rain model.
• Uncertainty Raindrop Model (URM). This model is based on the Raindrop Mask Model (Eqn. (8)) and further considers the blur level of raindrops, where U ∈ [−1, 1] is a blurry confidence map, which the authors call the uncertainty map. The larger the value of U(x), the more blurred the pixel x is, which means that it should be regarded as a raindrop and the raindrop effect U ⊙ R will increase accordingly. The sign of U determines whether a raindrop reduces or increases the image brightness. URM is adopted in the SID method [95] as the rain model.
• Motion Blur Model (MBM). Because of the high velocity of raindrops, Wang et al. model the generation of rain streaks as a motion blur process, where θ and l are the angle and length of the motion blur kernel k(θ, l), respectively, M is the three-channel raindrop mask, and ⊗ denotes the spatial convolution operator. The two important factors, i.e., the length l and the angle θ, essentially encode the motion blur kernel and can easily be inferred from rainy images by exploiting the repeatability of the rain streaks. MBM is adopted in the SID method [118] as the rain model.
• Rain&Haze Model (RHM). This model is similar to the Heavy Rain Model (Eqn. (4)) and considers joint raindrop and haze removal, where A indicates the global atmospheric light and T denotes the transmission map. I is a unit matrix (all-ones matrix), and (I − M) indicates the locations of individually visible raindrops. Here, the elements in M are binary values, where 0 indicates raindrop regions and 1 indicates non-raindrop regions. RHM is used in the method [35] as the rain model.
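Several of these additive formulations are easy to illustrate in code. The sketch below (in NumPy, where the symbol names and the mean-filter stand-in for the low-pass filter are our own assumptions, not the surveyed papers' exact operators) shows a BDM-style base/detail split and an RMM-style masked composition:

```python
import numpy as np

def base_detail_split(image, ksize=3):
    """BDM sketch: split an image into a base (low-pass) layer and a
    residual detail layer.  A mean filter stands in for the low-pass
    filter; guided filters are common in practice (an assumption)."""
    pad = ksize // 2
    padded = np.pad(image, pad, mode="edge")
    base = np.empty_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            base[i, j] = padded[i:i + ksize, j:j + ksize].mean()
    return base, image - base  # base + detail reconstructs the input exactly

def raindrop_mask_compose(background, effect, mask):
    """RMM-style composition (illustrative form): keep the background
    outside the binary mask and add the raindrop effect inside it."""
    return (1.0 - mask) * background + mask * effect
```

Because the detail layer is defined as the residual, the decomposition is lossless by construction, which is what makes the base/detail route attractive for deraining.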

Network Architecture
The network architecture is always the most critical and most-discussed factor for SID. By designing new modules, researchers can construct a pipeline to solve the proposed rain model. According to whether the network derives all the rain model information, we divide current SID network architectures into white-box and black-box network architectures. Fig. 6 shows examples of the two network architectures learning information in different ways.

White-box network architecture
The term "white box" here comes from the well-known "white-box testing", one of the main software testing methods. The main feature of white-box testing is that the examiner understands the internal structure of the program. Similarly, a white-box network architecture means that one can use the network to derive all the rain model information. This kind of network structure often needs ground truth or prior knowledge. For example, Li et al. [60] predict the derained image by a formula in which the estimated rain streak S̃, transmission map T̃, and atmospheric light Ã are obtained by the neural network in supervised mode. A network of this kind, which can derive all the rain model information appearing in Eqn. (4) to obtain the final derained result, is called a white-box network architecture. Many networks in current SID methods are white-box architectures, such as [60,103,153,154].
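As a toy illustration of the white-box idea, assume a heavy-rain style composition O = T ⊙ (B + S) + (1 − T) ⊙ A (our reading of the Eqn. (4) model; the actual formulation may differ). If a network predicts every component, the background can be recovered analytically:

```python
import numpy as np

def whitebox_derain(O, T_hat, S_hat, A_hat, eps=1e-6):
    """White-box sketch: invert an assumed composition
    O = T * (B + S) + (1 - T) * A using all predicted components.
    eps guards against division by a near-zero transmission."""
    return (O - (1.0 - T_hat) * A_hat) / np.maximum(T_hat, eps) - S_hat
```

With perfect component predictions this inversion is exact, which is precisely why white-box architectures need (and benefit from) supervision on every rain model variable.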

Black-box network architecture
Compared to the white-box network architecture, a black-box network architecture does not learn all the rain model information but instead relies on the robust fitting capability of the deep neural network for direct training. The term "black box" means that the internal structure of the program is unexplainable to the examiner. For example, the framework in [136] predicts the derained result in a black-box way, where the rain feature maps F are extracted by a previous CNN, the rain mask M̃ is predicted from the convolutional processing of F, and the rain streak S̃ is predicted from the convolutional processing of the concatenation [F, M̃]. In Eqn. (19), only the rain mask M̃, the rain streak S̃, and the background B̃ are derived in the supervised training process, while the transmission map and atmospheric light in Eqn. (5) are not. Note that the existing SID methods [42,64,84,88,136] have adopted the black-box network architecture.

In-depth Analysis of Data-driven SID
We analyze the intrinsic relationships of the three influencing factors and then discuss the solving paradigms of existing SID methods based on the preceding studies.

Relationship of The Three Factors
To facilitate the analysis, we denote the set of data, rain models, and network architectures as ℍ, 𝔾, and 𝔽, respectively. The image pairs and side information are represented as [x, y, z_i] ∈ ℍ, where x and y represent the rain image and clear ground-truth image, and z_i (i = 1, ..., n) denotes side information. Then, we can mathematically formulate the processes of rainmaking and deraining (Eqn. (20)), where g(⋅) ∈ 𝔾 is a rain model for rainmaking or rain generation, by which the rain image is generated, and f(⋅) ∈ 𝔽 is a network architecture for deraining, by which the derained image ŷ and side information ẑ are predicted; θ are the learnable parameters of the network. From Eqn. (20), we can see that the rain model g(⋅) makes or generates rain, while the network architecture f(⋅) is an inverse function that removes rain. The three factors and the loss function can be taken into account at the same time, and the deep learning-based rain removal mechanism can be described as in Eqn. (21), where s.t. denotes the condition that the previous equation needs to satisfy, ℓ(⋅) ∈ 𝕃 denotes the loss function, and θ̂ denotes the value of θ that makes ℓ(y, ŷ) attain its minimum value. The relationship of the three factors is illustrated in Fig. 7, from which we see that data is the core factor, while the rain model and the network architecture are closely associated with the data. Specifically, the rain model, defined by some prior assumptions, can be used to generate data. Then, the network architecture uses the data to solve the rain model reversely by maximum a posteriori probability. That is, rainmaking and deraining are inverse processes in the SID task. Since data play a core role among the three factors, we mainly study the effectiveness of SID data in Section 6.

Table 3
Examples of current data-driven SID methods that adopt the explicit solving paradigm (columns as in Table 4): MSPFN [49] (Gen., Syn., W); JDNet [103] (Gen., Syn., W); GraNet [149] (Gen., Syn., W); Fu et al. [23] (Gen., Math., W); MPRNet [152] (Gen., Math., W); DTDN [120] (Gen., Math., W); UMAN [95] (Spe., Math., W); KGCNN [118] (Spe., Math., W); AI-GAN [51] (Spe., Math., W); AMPE-Net [114] (Spe., Math., W); Liu et al. [73] (Spe., Syn., W); Hao et al. [37] (Spe., Math., W).

Solving Paradigms of SID
Due to the complexity of the optical characteristics of rain [29], the uncertainty of prior knowledge and the flexibility of deep learning solutions, the relationship of the three factors does not always follow Eqn. (21). The relationships between the three factors are often asymmetrical or inconsistent, and this complexity leads to different solving paradigms for SID methods. According to whether the solving paradigm follows Eqn. (21), we divide existing SID methods into explicit and implicit solving paradigms.

Table 4
Examples of current data-driven SID methods that adopt the implicit solving paradigm, where Gen. and Spe. in the data column denote general and specific data, respectively; Syn. and Math. in the model column denote the synthetical and mathematical rain models, respectively; and W and B in the network column denote the white-box and black-box network architectures, respectively.

Explicit solving paradigm
We classify a solving paradigm as "explicit" based on one judgment only, i.e., the relationship of the three factors in the method is consistent with Eqn. (21). As described in Table 3, existing SID methods with white-box network architectures are ascribed to the explicit solving paradigm. For example, DID-MDN [153] uses the RSM to synthesize the Rain1200 dataset and then uses a network architecture to solve it in white-box mode. Similarly, the method [60] builds the NYU-Rain and Outdoor-Rain data while using Eqn. (4) to solve RAM.

Implicit solving paradigm
We classify the solving paradigm of a method as "implicit" based on two judgments: 1) the network architecture is not consistent with the rain model. For example, the method [136] uses RAM (Eqn. (5)) but neglects to predict the transmission maps and atmospheric light in its network architecture, while [42,84,88,113,154] cannot predict the rain streak information; 2) the training data are not consistent with the rain model. For example, the method [25] adopts the RRM (Eqn. (6)) as its rain model, which only contains the clean image and the rain residual layer, but also uses Rain100H and Rain100L as the training data. As shown in Table 4, existing SID methods with black-box network architectures are ascribed to the implicit solving paradigm.

Division of Recent SID Methods
Based on the above analysis, we can categorize current data-driven SID methods along the six aspects of the three factors (i.e., general vs. specific, synthetical vs. mathematical, black-box vs. white-box). Fig. 8 summarizes the division of partial data-driven methods based on the three factors. From the classification, we can conclude that: 1) Current methods usually use general data (i.e., Rain800, Rain1400, SPA-Data) and the synthetical rain model (i.e., RSM), which is the most basic and simplest way to solve the SID task. The main difference between those methods is the design of domain knowledge, which will be discussed in Section 5; 2) Among the 65 reviewed methods in Fig. 8, there are 37 black-box and 28 white-box SID methods. More researchers tend to solve the rain model inexactly, relying on the strong fitting ability of neural networks; 3) More and more methods use specific data (i.e., RainCityscapes, NYU-Rain, DDC-Data) and mathematical rain models (i.e., RCD, TTM, SDM) to solve the SID problem [42,60,85,110,117,135], which results in better performance.

Further Analysis on SID Methods
In addition to the analysis of the three factors, SID methods still have some other aspects that can be further explored, such as training strategy, network pipeline, domain knowledge, data preprocessing, and objective function. However, these issues have not been fully investigated in previous studies. This section summarizes existing representative SID methods to describe these features. The main information is summarized in Tables 5 and 6, and the statistical results are illustrated in Fig. 10.

Training Strategy
To improve the generalization ability of SID methods to real-world scenarios, some methods also try to use real rain images as training data. According to whether unlabeled or unpaired data are involved during training, the training strategies can be divided into three learning modes, i.e., supervised, unsupervised and semi-supervised.
• Supervised. In supervised mode, the network is optimized by calculating the loss between the predicted image and the labeled ground-truth image. For example, JORDER [136] can predict the rain mask, rain streak, and background information based on a fully-supervised training strategy.
• Unsupervised. There are no ground-truth images in unsupervised mode, and the deraining network is usually optimized by using prior knowledge or self-supervision. For example, RR-GAN [158] uses consistency regularization to learn the rain distribution without using label information.
• Semi-supervised. By using both paired and unpaired images, the semi-supervised model can be optimized in the case of transfer learning, domain adaptation, and sharing of network parameters. For example, SIRR [121] shares a CNN to learn both the synthetic and real-world rain distributions by regularizing the KL distance.
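The KL-distance regularization mentioned for SIRR can be sketched in closed form for one-dimensional Gaussians, a simplification of how a distance between synthetic and real rain-residual distributions might be computed (SIRR's actual formulation differs):

```python
import math

def gaussian_kl(mu1, var1, mu2, var2):
    """KL(N(mu1, var1) || N(mu2, var2)) for 1-D Gaussians.  A
    semi-supervised method can penalize this distance between the
    residual statistics of synthetic and real rain images (sketch)."""
    return 0.5 * (math.log(var2 / var1)
                  + (var1 + (mu1 - mu2) ** 2) / var2
                  - 1.0)
```

The divergence is zero only when the two distributions match, so minimizing it pulls the network's behavior on unpaired real images toward its behavior on the paired synthetic ones.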
Remarks. We count the training strategies of the methods in Table 5 and 6, and list the statistical results in Table 7, from which we can conclude that: 1) There are only 8 semi-supervised and 3 unsupervised SID methods, against 86 supervised ones. The main reason is that the SID problem is ill-posed and difficult to solve via the weak constraints of the semi-/un-supervised modes; 2) Most current methods use the supervised mode. A supervised method can perform very well on test data with the same distribution by involving paired data. However, these pre-trained methods often fail on real-world rain images due to poor generalization ability; 3) Among those 11 semi-/un-supervised SID methods, 4 semi-supervised and 3 unsupervised methods are based on GANs [33,160], since GANs can provide better image generation and restoration in weakly supervised mode. Besides, 8 semi-/un-supervised SID methods use the parallel pipeline to construct a steady network architecture. Note that the parallel pipeline is introduced below.

Network Pipeline
Due to different training strategies and domain knowledge, data-driven SID methods have different network pipelines. According to the flow of data in the network, we can divide the network pipelines into three structures, i.e., sequential, parallel and hybrid, as illustrated in Fig. 9.
• Sequential. This is the most basic and common network pipeline, where data flows linearly. For example, Fu et al. [23] adopt a linear way to build an end-to-end network. An architecture with multiple streams handling the same task can also be considered a sequential mode. For example, MH-DerainNet [126] uses different kernel sizes to get features with different receptive fields.
• Parallel. Instead of simply using a single neural network, the parallel pipeline can handle equally important tasks. For example, DerainCycleGAN [123] uses parallel networks to process both rain and clean ground-truth images. An architecture with multiple streams handling different tasks can also be considered a parallel mode. For example, DerainNet [22] uses two streams to process the detail and base layers, respectively.
• Hybrid. A hybrid pipeline has both sequential and parallel architectures inside. The modules can be assembled in a more complex way when more side information needs to be processed simultaneously. For example, DAF-Net [42] uses a hybrid network to handle the tasks of learning the depth map and image features at the same time; JORDER [136] first employs sequential multi-streams to extract feature maps with different receptive fields, and then uses three parallel convolutions to learn the rain mask, rain streak, and background, respectively.
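The three pipeline structures can be mimicked with plain function composition (the toy modules below are purely illustrative, not taken from any surveyed codebase):

```python
def run_sequential(modules, x):
    """Sequential pipeline: data flows linearly through a chain."""
    for m in modules:
        x = m(x)
    return x

def run_parallel(modules, x):
    """Parallel pipeline: independent branches process the same input."""
    return [m(x) for m in modules]

def run_hybrid(stem, branches, x):
    """Hybrid pipeline: a sequential stem followed by parallel task
    branches (e.g., rain mask / rain streak / background heads)."""
    return run_parallel(branches, run_sequential(stem, x))
```

The hybrid variant makes clear why it suits side information: the stem computes a shared representation once, and each branch specializes on one prediction target.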
Remarks. We count the network pipelines of the methods in Table 5 and 6, and list the statistical results in Table 8, from which we can conclude that: 1) There are 32 sequential, 21 parallel and 44 hybrid pipelines used in current methods. The hybrid structure is more popular than the sequential and parallel structures; 2) More SID methods chose the sequential pipeline from 2017 to 2018, while choosing the parallel and hybrid pipelines to consider more side information from 2019 onward. Note that SID methods needing side information (i.e., rain density, image depth) prefer parallel and hybrid pipelines, because additional networks need to be designed for learning the side information; 3) From 2020 to 2022, 30 of 63 existing methods used hybrid pipelines, compared to 14 of the 34 SID methods proposed between 2017 and 2019. More and more hybrid pipelines are used, which means researchers have begun to design more complicated architectures; 4) The pipelines of GAN-based deraining methods, such as RR-GAN [158] and Qian et al. [84], can be regarded as parallel, since the generator and discriminator are generally two independent networks.

Domain Knowledge
Due to the complexity of the SID problem, researchers have explored different domain knowledge to extract feature information and obtain better deraining performance. Generally, there are two main types of domain knowledge. One is prior knowledge based on digital signal/image processing and probability statistics, e.g., the guided filter [38], mutual information [56] and Kullback-Leibler divergence [40]. The other is deep-learning knowledge based on deep feature extraction, e.g., GAN [33], LSTM [18], high-level perceptual features [54] and U-net [90]. Besides, some other domain knowledge, e.g., parameter initialization and learning-rate strategies, is also used as knowledge.
• Prior Knowledge (PK). This knowledge is based on digital signal/image processing and probability statistics. For the SID task, researchers used a guided filter to decompose the original rain image into two parts: a high-frequency component (or detail layer) and a low-frequency component (or base layer). The former contains more rain information, while the latter contains more background information. For example, Fu et al. [23] decompose the rain image into a base layer and a detail layer, while Yang et al. [134] decompose the rain image into high-frequency and low-frequency components. These decomposition operations benefit the deraining process while preserving an accurate background. Another example observes that the log histograms of filtered natural images lie below the straight line connecting their minimal and maximal values, and names this property the sparsity prior. Therefore, sparsity has great potential to separate rain and non-rain textures during deraining. For example, Wang et al. [116] develop a quasi-sparse distribution to approximate this sparsity and obtain a feasible loss function.
• Deep-learning Knowledge (DK). It uses deep neural networks to extract feature information. For example, many methods use perceptual features to preserve the image contents. VGG-16/VGG-19 [97] pre-trained on ImageNet [15] can obtain higher-level features. The perceptual loss [54], measuring the difference of high-level features, has delivered better visual performance than the per-pixel losses used in traditional methods [23,136]. In addition, by utilizing unique modules, SID methods can obtain more representational features. Some of these modules are universal, such as dilated convolution [148] used in [25,136], LSTM [18] used in [84,88], U-net [90] used in [63,84], and ShuffleNet [156] used in QSMD [116]. Others are based on attention mechanisms and specifically designed, such as the channel attention used in [17,64], the spatial attention used in SPANet [113], and the confidence map network used in UMRL [143]. Recently, the transformer [101] designs a self-attention mechanism to capture global interactions between contexts and has shown promising performance in SID methods [119,129,151].
• Other Knowledge (OK). In addition to the above two kinds of domain knowledge, there are still many operations based on the understanding of SID. For example, parameter initialization makes the initial network converge quickly and can avoid falling into a local saddle point or gradient cliff at the beginning. A common approach is to use Gaussian distributions to initialize the network parameters, which is superior to random initialization. Another approach is to use the Xavier [31] method to initialize the network parameters, as in ResGuideNet [21]. Some networks, such as the VGG network pre-trained on ImageNet, can initialize the parameters of networks with the same structure, as in DAF-Net [42]. In addition, some SID methods with complicated network architectures train in different phases. For example, Li et al. [60] train the network in a physics-based stage and a model-free stage, respectively; Wang et al. [117] first pre-train the ANet and SNet, and jointly train the whole network at last. The learning rate is also considered to obtain better deraining performance. For example, Quan et al. [85] set the learning rate to 0.001 and adopt a cosine scheduler for training. CNN-based restoration models are usually trained on fixed-size image patches. However, training a Transformer model on small cropped patches may not encode the global image statistics, thereby providing suboptimal performance on full-resolution images at test time. For example, Zamir et al. [151] perform progressive learning, where the network is trained on smaller image patches in the early epochs and on gradually larger patches in the later training epochs.
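A cosine learning-rate scheduler of the kind Quan et al. mention can be sketched as follows (lr_max = 0.001 mirrors the cited setting; lr_min and the exact decay form are our assumptions):

```python
import math

def cosine_lr(step, total_steps, lr_max=1e-3, lr_min=0.0):
    """Cosine learning-rate schedule: starts at lr_max and decays
    smoothly to lr_min over total_steps (illustrative sketch)."""
    w = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return lr_min + (lr_max - lr_min) * w
```

The smooth decay avoids the abrupt drops of step schedules, which is one reason cosine schedules are popular for restoration training.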
Remarks. We count the domain knowledge of the methods in Table 5 and 6. Besides, we also summarize the representative deep-learning knowledge used in Table 10: 1) Deep-learning knowledge is widely used in most current data-driven SID methods due to its strong representation ability. In addition, prior knowledge is only used in 39 methods, which means we can still explore more effective prior knowledge to help the data-driven SID task; 2) Other kinds of knowledge are also important for the data-driven SID task, since almost every method uses these tricks to obtain more stable and better performance; 3) Among the deep-learning knowledge, GAN [33], attention [101], encoder-decoder [3], perceptual features [54], dilated convolution [148] and multi-stage designs are used more frequently; 4) LSTM [18], multi-scale, two-branch, multi-stream, multi-task, Squeeze-and-Excitation Network [41] and ShuffleNet [156] are often adopted in data-driven SID methods; 5) Neural Architecture Search [162] and Graph Convolutional Network [55] were used in CCN [85] and Fu et al. [25] in 2021, respectively. We suggest researchers pay more attention to these new technologies, which have not been applied frequently in SID tasks.

Data Preprocessing
Data preprocessing is also worthy of investigation for SID. Most methods read image data in PNG or JPG format and then convert it to tensor form. We divide the data preprocessing methods into fixed and random sampling according to the pattern of image clipping and augmentation.
• Fixed Sampling (large resolution). This sampling method crops a fixed-size image from the middle or a random position of the input. For example, Zhang et al. [153] randomly crop a 512 × 512 image from the input (or its horizontal flip) of size 586 × 586. Due to the large proportion of cropping, data augmentation is rarely used. Fixed sampling can obtain an image with a larger receptive field and is suitable for methods with downsampling.
• Random Sampling (small resolution). Different from fixed sampling, images of small size (namely, patches) are randomly or regularly cropped from the original image. For example, Fu et al. [23] randomly selected 9,100 images, generating three million 64 × 64 rain/clean patch pairs. Random sampling has two advantages: 1) it can augment the data and therefore reduce the under-fitting caused by insufficient data; 2) it can greatly reduce GPU consumption and speed up the training process. A larger batch size can be set in training to reduce over-fitting during the optimization process.
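A minimal sketch of random sampling, assuming paired rain/clean images of the same size (the function name and API are ours, not any surveyed codebase's):

```python
import numpy as np

def crop_aligned_pair(rain, clean, patch, rng):
    """Random-sampling preprocessing sketch: crop the same random
    patch from a rain/clean pair so supervision stays pixel-aligned."""
    h, w = rain.shape[:2]
    i = int(rng.integers(0, h - patch + 1))
    j = int(rng.integers(0, w - patch + 1))
    return (rain[i:i + patch, j:j + patch],
            clean[i:i + patch, j:j + patch])
```

Sampling both images with the same coordinates is the essential detail: misaligned crops would silently corrupt the per-pixel supervision signal.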
Remarks. We count the data preprocessing of the methods in Table 5 and 6, and list the statistical results in Table 11, from which we can conclude that: 1) Most SID methods (56 in total) chose random sampling, while only 27 methods used fixed sampling, from which we can infer that both data augmentation and a large batch size are effective for the SID task; 2) Among the 11 SID methods [42,60,63,78,80,84,134,135,136,153,159] using side information, 5 have used fixed sampling and 5 have used random sampling. We speculate that the correlation between data preprocessing and side information is not strong.

Objective Function
The objective function is also an important part of data-driven SID methods. The parameters of neural networks can be learned via the objective function in the training process. According to whether the constraint contains ground truth, current objective functions can generally be divided into Fidelity-Driven Metrics (with ground truth) and Probability&Statistics Models (without ground truth).
where y and ŷ are the ground-truth and derained images, respectively, in Eqns. (22), (23), (24) and (25). In Eqn. (24), ε is a tiny constant, and this Charbonnier loss is used in the SID methods [48,119]. In Eqn. (25), μ_y and μ_ŷ denote the averages of y and ŷ, and σ²(y) and σ²(ŷ) denote the variances of y and ŷ, respectively. c_1 and c_2 are two constants used to stabilize the division with a weak denominator.
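The Charbonnier loss referenced for Eqn. (24) is simple to state concretely (the ε value below is a common default, not necessarily the one used in [48,119]):

```python
import numpy as np

def charbonnier_loss(y, y_hat, eps=1e-3):
    """Charbonnier loss: a smooth, everywhere-differentiable
    approximation of the MAE, sqrt(diff^2 + eps^2) averaged over pixels."""
    return float(np.mean(np.sqrt((y - y_hat) ** 2 + eps ** 2)))
```

For large errors it behaves like the MAE, while near zero it stays smooth (no kink at diff = 0), which tends to stabilize gradient-based training.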
• Probability&Statistics Model. Many objective functions are based on informatics, probability theory, and mathematical statistics, e.g., Adversarial loss (Adv), Cross-Entropy loss (CE), Kullback-Leibler divergence (K-L), Gaussian Mixture Model (GMM), Gaussian Process (GP), Quasi-Sparsity loss (QS), Autocorrelation loss (AC), Total Variation (TV), Triplet loss, Margin loss, Contrastive loss, Knowledge Distillation loss, and Dark Channel loss (DC). These objective functions are defined from Eqn. (26) to Eqn. (38). Eqn. (26) is the adversarial loss used in GAN-based SID methods, such as [84,123,158], where z is noise, and D and G are the discriminator and generator, respectively. In Eqn. (27), the first and second expressions are the dichotomous and multi-class classification expressions, respectively. In Eqn. (28), p(x) and q(x) are two probability distributions over a random variable, and Ni et al. [78] use the K-L divergence to calculate the distance between the ground-truth and predicted images. In Eqn. (29), S is the rain streak, K is the number of mixture components, and π_k, μ_k and Σ_k are the mixture coefficients, Gaussian means, and variances, respectively. Wei et al. [121] use the negative log-likelihood function to constrain the unsupervised samples, where Π = {π_1, ..., π_K}, Σ = {Σ_1, ..., Σ_K}, K is the number of mixture components, and N is the number of samples. In Eqn. (30), x, x′ ∈ 𝒳 denote the possible inputs that index the GP. In Eqn. (31), O is the rain image, f_{i,j} is the j-th filter centered at the i-th pixel, and * is the convolution operation. Wang et al. [116] use it to construct a QS loss, where 𝒩(⋅) denotes the network inference. In Eqn. (32), the max term represents the index of the top k-th coefficient, and the mean terms denote the mean values of the corresponding autocorrelation maps. In Eqn. (33), the total variation function TV(x) encourages the images to include piece-wise constant patches, where the images are discrete (x ∈ ℝ^{M×N}) on a unit-spaced grid. Huang et al. [46] adopt a Total Variation regularizer term to smooth the recovered background image. In Eqn. (34), the input is a triplet, including the anchor example x_a, the positive example x_p, and the negative example x_n. Deng et al. [27] use a triplet loss: by constraining the distance between the anchor and the positive example to be smaller than the distance between the anchor and the negative example, the similarity calculation between the samples is realized. In Eqn. (35), T_c is the target label ([0,1]) and v_c is the vector of each capsule in the last layer; m+ and m− represent the margins for the positive and negative predictions. Yang et al. [133] use this margin loss in their work. In Eqn. (36), x is a degradation representation, and x+ and x− are the corresponding positive and negative counterparts, respectively; τ is a temperature hyper-parameter and N denotes the number of negative samples. Li et al. [57] use this contrastive loss in their work. In Eqn. (37), 𝒯(x) and 𝒮(x) denote the outputs of the teacher model 𝒯 and the student model 𝒮, respectively, and d(⋅) is a similarity criterion loss (e.g., the L1 loss). Zou et al. [164] use this knowledge distillation loss to force 𝒮 to approximate the original statistical modeling of 𝒯. In Eqn. (38), the two terms are the max pixel values of the synthetic streak feature and the real-world streak feature, respectively. Cui et al. [14] use this dark channel loss on the deraining images of the student model.
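As one concrete instance, the total-variation regularizer of Eqn. (33) can be written as an anisotropic sum of neighbor differences (a standard discretization; the surveyed papers may use a different variant):

```python
import numpy as np

def total_variation(img):
    """Anisotropic total variation: sum of absolute vertical and
    horizontal neighbor differences; zero for a constant image,
    so minimizing it pushes toward piece-wise constant patches."""
    return float(np.abs(np.diff(img, axis=0)).sum()
                 + np.abs(np.diff(img, axis=1)).sum())
```

Rain streaks add many small high-frequency transitions, so penalizing TV on the recovered background discourages residual streak texture while leaving large flat regions untouched.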
Remarks. We count the objective functions of the methods in Table 5 and 6, and list the statistical results in Table 12, from which we can conclude that: 1) Fidelity-driven metrics are widely used in data-driven SID methods. Specifically, MSE is the most popular loss function, as 58 of 97 methods chose it, while MAE is the second most popular, selected by 49 SID methods; 2) A single loss is used as the objective function in 36 SID methods. Specifically, MSE, MAE, and SSIM are used as a single loss function in 14, 13, and 6 current SID methods, respectively. This phenomenon implies that a single loss function can also make the network converge well; 3) The cross-entropy has been chosen by 7 SID methods. For example, it is used as a loss function to constrain the rain mask prediction [136] or the rain density prediction [153]; 4) Adversarial loss is widely selected by 23 GAN-based SID networks, such as [53,84,120,123,124,158]; 5) SSIM is used in 27 SID methods, such as [21,24,72,88,113,141], and it was used as often as 9 times in 2022; 6) The probability&statistics models are used in 42 methods. Due to the complexity of rain distribution, probability&statistics models are usually hard to guarantee and are typically used as auxiliary constraints. This form of unconditional constraint may be a future research direction.

Data Ranking Experiment
In previous related works [61,107,109,137], it is rather common to perform experiments on synthetic datasets and compare the results of different methods. Generally speaking, researchers tend to perform a horizontal comparison, i.e., evaluating different SID methods on the same dataset, but rarely make a vertical comparison and analysis of the performance of each method on different synthetic datasets, mainly because the horizontal comparison can highlight the superiority and fairness of each method. Note that the authors of related papers on SID usually select around three synthetic datasets for experimental evaluation. The main reasons are twofold. First, due to the many available synthetic datasets, it is time-consuming and laborious to evaluate the methods on all of them. Second, researchers may prefer to choose favorable datasets for the experiments, so that the proposed method can perform better than other competitors. In this paper, we mainly vertically evaluate the contribution of the data generated by rain models to the task. Although the vertical study cannot highlight the characteristics of each method itself, it can reflect the difficulty of each dataset and its matching degree with the rain model. Since data play a core role among the three factors, we mainly focus on analyzing data and design a group of experiments to study the impact of data and evaluate the effectiveness of SID data.

Experiment Design
The data ranking experiments are mainly designed to figure out which paired dataset is most effective for the SID task, measured by a score denoted EOD (Effectiveness of Data). A higher EOD score can reflect at least two things: 1) the dataset is more challenging and can potentially stimulate the network to keep fitting, without causing rapid convergence; 2) a method pre-trained on this dataset will potentially have better generalization ability in real scenarios. To obtain fairer and more reliable evaluation results, we conduct four tests from different dimensions:
• Test-1: Objective evaluation on paired datasets by full-reference Image Quality Assessment, i.e., PSNR and SSIM;
• Test-2: Objective evaluation on real image datasets by no-reference Image Quality Assessment metrics, i.e., SSEQ, NIQE, BRISQUE, and BLIINDS-II;
• Test-3: Subjective evaluation on real image datasets by human subjective assessment, i.e., the Bradley-Terry model;
• Test-4: High-level task evaluation on real image datasets by object detection evaluation metrics, i.e., mean Average Precision (mAP).
(Figure caption) From (a) to (i) denote the results of methods pre-trained on the 9 public paired datasets, i.e., Rain100H [136], Rain100L [136], Rain800 [154], Rain1400 [23], Rain1200 [153], SPA-Data [113], RainCityscapes [42], Outdoor-Rain [60], and MPID [61], respectively.
For better understanding, we list the implementation details and emphases of the four types of tests in Table 13, from which we see that different tests have different emphases, and a fair final score can be obtained by combining subjective and objective evaluations as well as low-level and high-level tasks. Specifically, Test-1 performs the basic simulation experiments, i.e., training and testing on synthetic datasets, and is mainly used to verify the difficulty of the data itself. Test-2 and Test-3 evaluate the generalization ability conferred by the data, using objective and subjective evaluation metrics respectively, which can indicate how large the gap between synthetic and natural scenarios is. Finally, Test-4 is conducted on a high-level task, i.e., object detection, which can reflect the impact of data on practical vision tasks.
Evaluation Metrics. To evaluate the deraining results for images with ground truth in Test-1, we use PSNR [47] and SSIM [7] as evaluation metrics. To evaluate the authentic images without ground truth in Test-2, we use the no-reference IQA metrics SSEQ [71], NIQE [77], BRISQUE [76], and BLIINDS-II [91]. To evaluate the authentic images by subjective assessment in Test-3, we follow the standard setting of the Bradley-Terry model [6] to estimate the subjective score of each method so that the methods can be ranked, with the same routine as [58]. For the high-level task evaluation in Test-4, we use the mean Average Precision (mAP) score to measure detection performance.
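As a concrete illustration of the full-reference side of Test-1, PSNR can be computed directly from the mean squared error between a ground-truth image and a derained result. The sketch below is a generic illustration, not the exact implementation used in the experiments:

```python
import numpy as np

def psnr(reference: np.ndarray, restored: np.ndarray, max_val: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio between a ground-truth image and a restored one."""
    mse = np.mean((reference.astype(np.float64) - restored.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# toy example (made-up data): an 8-bit "clean" patch vs. a copy with one corrupted pixel
clean = np.full((16, 16), 128, dtype=np.uint8)
noisy = clean.copy()
noisy[0, 0] = 138
print(round(psnr(clean, noisy), 2))
```

A higher PSNR means the derained image is numerically closer to the clean ground truth, which is why the improvement over the rainy input is used as the Test-1 signal.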

Experiment Results
We will analyze the deraining results of each test and finally calculate the EOD score in this subsection.

Test-1: Objective evaluation on paired datasets
We evaluate the deraining results on the 9 paired datasets and then calculate the improvement over the original (rainy) performance. For PSNR, the more significant the improvement, the less challenging the dataset. For SSIM, the closer the derained value is to 1, the less challenging the dataset. Taking DetailNet [23] as an example, the original PSNR/SSIM on the Rain100H [136] and Rain100L [136] testing sets are 13.56/0.379 and 26.90/0.838; the derained PSNR/SSIM are 26.78/0.810 and 34.54/0.956. If we consider PSNR, the improvement ratios of Rain100H/Rain100L are 97%/28%, so we would infer that Rain100L is more challenging than Rain100H. If we consider SSIM, the values after deraining are 0.810 and 0.956, i.e., Rain100H still has room to improve, since the maximum value for a clean image is 1; in this case, we would infer that Rain100H is more challenging than Rain100L. As such, we consider both PSNR and SSIM with equal weight. We tested three methods to reduce the error that could be caused by relying on a single one, and the results are shown in Table 14. By calculating and normalizing the results, we obtain the Test-1 score for each dataset, as shown in Fig. 12. We see that RainCityscapes [42] obtains the highest score, while Rain100L obtains the lowest. This is because RainCityscapes is synthesized by Eqn. (3), which involves more complex rain streaks and atmospheric light. We also illustrate some deraining results of DetailNet in the first row of Fig. 11, from which we see that DetailNet [23] can remove most rain streak information in Rain100L and MPID, but can hardly remove the rain streaks in datasets like Rain800, Rain1400, and RainCityscapes. Note that these visual results are consistent with the test scores.
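The improvement ratios quoted above can be reproduced with simple arithmetic; a quick sketch using the DetailNet PSNR numbers from the text:

```python
# Relative PSNR improvement used in Test-1: (derained - original) / original.
def improvement_ratio(before: float, after: float) -> float:
    return (after - before) / before

# DetailNet numbers quoted in the text (original vs. derained PSNR)
rain100h = improvement_ratio(13.56, 26.78)  # Rain100H
rain100l = improvement_ratio(26.90, 34.54)  # Rain100L
print(f"Rain100H: {rain100h:.0%}, Rain100L: {rain100l:.0%}")
```

This reproduces the 97%/28% ratios reported above: the much larger relative gain on Rain100H simply reflects its much lower rainy-input PSNR.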

Test-2: Objective evaluation on real data
In this experiment, we evaluate the pre-trained models on a real-scenario dataset (i.e., Real-Internet) and then evaluate the improvement compared with the original result. Since these authentic rain images have no corresponding clean ground-truth images, we choose four no-reference IQA metrics as the evaluation criteria. The results are shown in Table 14, from which we see that most of the numerical indices decline after rain removal (the lower the value, the better the effect), which is contrary to what we expected. We also illustrate some deraining results of DetailNet [23] in the second row of Fig. 11, from which we see that the model pre-trained on Rain100H can remove most rain streaks, while the model pre-trained on MPID can hardly remove the rain. According to the improvement of the four no-reference IQA metrics, we calculate each dataset's Test-2 score, as shown in Fig. 12. We see that Rain100H gets 0.06 while MPID gets 0.17, the complete opposite of the visual results. There are two reasons for this phenomenon: 1) current mainstream no-reference IQA criteria are not suitable for the SID task [61,137]; 2) the difference between synthetic and real data results in weak generalization power. Considering this fact, we also use the subjective Test-3 to balance out the disadvantage of neglecting visual perception in Test-2.

Test-3: Subjective evaluation on real data
Since Test-2 neglects the visual perception of real images, Test-3 explores a subjective evaluation based on the same results. Like the testing process of [58], we decompose the perceptual quality of deraining into two dimensions: clearness and authenticity. Specifically, the clearness metric indicates how thoroughly the rain has been removed, while the authenticity metric indicates how realistic the derained image looks. We first manually selected 180 representative images from the derained results in Test-2 as a random library, from which 50 images are randomly selected for each evaluator. Since we perform pairwise comparisons rather than individual ratings, each selected image yields (9 × 8 / 2) × 3 = 108 comparisons (9 datasets, 3 methods), so 50 images yield 5,400 comparisons. In this study, 50 volunteers aged from 20 to 35 took part in the test, half of whom are professionals with an academic background. After all the pairwise comparisons are collected, we fit the Bradley-Terry model [6] to estimate the raw subjective score for each dataset, as shown in Table 14. The final scores of Test-3 after regularization are shown in Fig. 12. We can see that Rain100H gets 0.19 while MPID gets 0.10, which is consistent with the second row in Fig. 11.
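For reference, Bradley-Terry strengths can be fit from a pairwise win-count matrix with the classic minorization-maximization (Zermelo) update. The sketch below is a generic illustration of the technique, not the exact fitting code used with [6,58], and the toy win matrix is invented:

```python
import numpy as np

def bradley_terry(wins: np.ndarray, iters: int = 200) -> np.ndarray:
    """Fit Bradley-Terry strengths from a pairwise win-count matrix.

    wins[i, j] = number of times item i was preferred over item j.
    Uses the classic MM update; strengths are normalized to sum to 1
    so they can be read as relative subjective scores.
    """
    n = wins.shape[0]
    p = np.ones(n)
    for _ in range(iters):
        for i in range(n):
            num = wins[i].sum()  # total wins of item i
            den = sum((wins[i, j] + wins[j, i]) / (p[i] + p[j])
                      for j in range(n) if j != i)
            p[i] = num / den
        p /= p.sum()
    return p

# toy example (invented counts): A usually beats B, B usually beats C
wins = np.array([[0, 8, 9],
                 [2, 0, 7],
                 [1, 3, 0]], dtype=float)
scores = bradley_terry(wins)
print(np.round(scores, 3))  # strengths ranked A > B > C
```

Ranking the fitted strengths then gives the per-dataset subjective ordering reported in Table 14.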

Test-4: High-level task evaluation on real data
Finally, we investigate the effect of the deraining results on subsequent high-level tasks. In this study, an object detection task is performed as an example. Faster R-CNN [89] pre-trained on PASCAL_VOC_2007 [20] and YOLO-V4 [5] pre-trained on MS COCO [8] are used to evaluate the performance, with the mAP score as the quantitative evaluation metric. The object detection results are shown in Table 14, from which we see that only a few detection results are higher than the original ones. This is because SID methods are not optimized for the task of object detection, and the deraining process might lose some discriminative and meaningful semantic information [82]. We also calculate the scores of the datasets according to the improvement proportion of the detection results after rain removal, as illustrated in Fig. 12. We can see that the scores of the datasets are fairly even, except for RainCityscapes and Outdoor-Rain, due to remnants of rain streaks. We also show some visualization results of the object detection task in the third row of Fig. 11, from which we see that in most cases the detection performance decreases after rain removal, in terms of both detection category and detection accuracy.
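The mAP metric matches each detection to a ground-truth box by intersection-over-union (IoU); under the PASCAL VOC protocol a detection counts as correct when IoU ≥ 0.5. A minimal sketch of the IoU computation (the example boxes are hypothetical):

```python
def iou(box_a, box_b):
    """Intersection-over-Union between two boxes given as (x1, y1, x2, y2)."""
    # corners of the intersection rectangle
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # two half-overlapping boxes
```

Per-class precision-recall curves built from these matches are then averaged into the mAP scores reported in Table 14.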

Calculation of EOD
To ensure the diversity of the data ranking experiments and the reliability of EOD, we decided to retain the result of Test-2. After obtaining the results of the four tests, we calculate the EOD score as follows:

EOD = Σ_{i=1}^{4} ω_i P_i,   (39)

where ω_1, ω_2, ω_3, and ω_4 are the weights of each test, and P_i is the deraining performance of Test-i (i = 1, 2, 3, 4). Note that we conducted extensive experiments to determine appropriate weights and finally chose the weights of the four tests as 0.3, 0.1, 0.3, and 0.3, respectively. We set the weight ω_2 of Test-2 to 0.1 to reduce the unreliability of no-reference IQA. The calculated EOD scores on each dataset are shown in Fig. 13. We see that Rain800 obtains the highest score while Outdoor-Rain obtains the lowest. The EOD values shown in Fig. 13 are comprehensively evaluated scores based on synthetic and real data, objective and subjective metrics, and low-level and high-level tasks, so they can help interested researchers choose appropriate data for their tasks. The results of each test in Fig. 12 can also be used as a reference. From the results, we can draw some conclusions: 1) to obtain state-of-the-art deraining performance, datasets like RainCityscapes, Rain800, and Outdoor-Rain will be more challenging; 2) to obtain better deraining performance in real scenarios, datasets like Rain100H, Rain800, and Rain1400 will be better choices; 3) to obtain better performance on the object detection task, datasets like Rain100L, SPA-Data, and MPID can be more effective.
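The weighted aggregation of Eqn. (39) with the chosen weights can be sketched as follows; the per-test scores in the example are placeholders, not values from Table 14:

```python
# Weights omega_1..omega_4 of Eqn. (39), as chosen in the paper;
# Test-2 is down-weighted because no-reference IQA is unreliable.
WEIGHTS = (0.3, 0.1, 0.3, 0.3)

def eod(test_scores):
    """EOD = sum_i omega_i * P_i over the four test scores P_1..P_4."""
    assert len(test_scores) == 4
    return sum(w * p for w, p in zip(WEIGHTS, test_scores))

# hypothetical Test-1..Test-4 scores for one dataset (placeholders)
print(round(eod((0.12, 0.06, 0.19, 0.10)), 3))
```

Applying this aggregation to each dataset's normalized per-test scores yields the EOD ranking of Fig. 13.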
Remarks: The values of the weights in Eqn. (39) are critical and influence the rating of each dataset. In our paper, the ratio is determined experimentally and empirically. We re-display the information of the four tests in Table 15 and discuss the reasons for the selections from three aspects: (1) As for Metric, we mainly focus on the objective metrics, since they are useful and cheap for evaluation (such as PSNR/SSIM/mAP), and the ratios of objective vs. subjective are set to 0.7 and 0.3, respectively. Test-2 and Test-3 evaluate SID performance in real-world scenarios in objective and subjective ways, respectively. However, for Test-2, the no-reference IQA metrics (SSEQ/NIQE/BRISQUE/BLIINDS-II) may not be reliable [61], so we reduce the ratio of Test-2 to 0.1 and set the ratio of Test-3 to 0.3; (2) As for Data, we mainly consider the real-world scenario, since image restoration methods should serve practical applications well. Hence, the ratio of real-world vs. synthetic is set to 0.7 vs. 0.3; (3) As for Task, we focus more on the low-level vision task than the high-level one, since SID mainly solves an image restoration problem. As such, the ratios are set to 0.7 vs. 0.3 for low-level vs. high-level tasks.
Considering these different aspects, we finally determined the weight settings of the four tests to obtain the ranking of every dataset.

Conclusions and Future Work
We have rethought the data-driven SID problem by providing a comprehensive analysis and new division criteria from new perspectives. Specifically, we have re-examined the three factors (i.e., data, rain model and network architecture) and divided them under new and more reasonable criteria (i.e., general vs. specific, synthetical vs. mathematical, black-box vs. white-box). We also analyzed the relationships of the three factors from the perspective of data effectiveness and revealed two different solving paradigms (explicit vs. implicit) for the SID task. We further discussed the current mainstream data-driven SID methods from five aspects: training strategy, network pipeline, domain knowledge, data preprocessing and objective function. In addition, we conducted extensive data ranking experiments to evaluate and calculate the effectiveness of current SID datasets. Based on this in-depth analysis and testing, we can draw the following remarks and hypotheses: (1) Based on the relations of the three factors and the proposed new criteria (see Sections 3 and 4), we can infer:
• Most existing data-driven SID methods did not strictly follow Eqn. (21), which makes the solving paradigm of current SID methods less solid. From this viewpoint, Li et al. [60] is a good example, in which the three factors are combined to make the solution process more explainable. More analyses are given in Subsection 4.3.
(2) Based on the analysis of the five aspects of current data-driven SID methods (see Section 5), we can infer that:
• Analyzed from different aspects, current data-driven SID methods show some common preferences, such as the fully-supervised mode in training strategy, hybrid structures in network pipeline, random sampling in data preprocessing, and MSE/MAE in the objective function.
• Basic network structures, e.g., GAN, U-Net, and encoder-decoder, can handle the SID task well. Besides, SID methods are very fond of using specific attention modules to remove rain and restore the background effectively. As such, we suggest that new researchers start from a basic network structure with specific attention modules. For more specific conclusions, please refer to the remarks in each subsection.
(3) Based on the vertical data ranking experiments over nine paired datasets (see Section 6), we can infer that:
• The current mainstream SID datasets cannot enhance the high-level task, owing to the large gap between synthetic and real rain. As such, it is recommended to collect or build datasets whose distribution is similar to real rain. Note that SPA-Data [113] is a good example, with a satisfactory score in our data ranking experiment.
• The existing no-reference IQA metrics are not suitable for evaluating SID tasks. However, subjective evaluation consumes a lot of time and labor. Therefore, it is urgent to develop a specific and effective no-reference IQA metric for evaluating SID in real scenarios.
• Data play an important role in all data-driven deraining tasks. As such, how to produce/select/generate more effective data is an important topic for the future. It is recommended to design more effective rain models that generate rain images close to real ones, and we also suggest exploring new and effective automatic rain generation modes, such as DerainCycleGAN [123] or VRGNet [111].
Despite the results and analysis presented in this investigation, there is still much space to explore for the SID task. We believe that improving the interpretability of the solving process will benefit both the deraining performance and the applications of SID methods, which will be important directions for future research. We have verified the effectiveness of the data through extensive experiments, and we will design more experiments to study the effectiveness of the rain model and the network architecture to explore more interpretability in the near future.

Figure 1 :
Figure 1: The first row denotes different types of visibility degradation caused by rain: (a) rain streak, (b) raindrop, and (c) rain accumulation. The second row shows the visual impact of rain on traffic and surveillance, namely, from the vehicle perspective (d) and the monitoring perspective (e).

Figure 2 :
Figure 2: Illustration of representative general datasets for SID, where we also show some samples of image pairs.

Figure 3 :
Figure 3: Illustration of representative specific datasets for SID, where we also show some samples of image pairs.

Figure 4 :
Figure 4: Examples of paired data include the rain image and the corresponding clear image (the first and second in top row), and some types of side information (see others).

Figure 6 :
Figure 6: Examples of two different network architectures that learn to predict a clear image in different ways. Since JORDER learns the intermediate components in a black-box way, they do not appear explicitly in its learning process, i.e., (b).

Figure 7 :
Figure 7: The relationship among the three factors (i.e., data, rain model and network architecture) and optimizer (i.e., loss function) in the SID task.

Figure 9 :
Figure 9: Comparison of the three network pipelines used in the current data-driven SID methods.The pictures are directly adopted from the original papers.

Figure 10 :
Figure 10: The statistics of different categories of current mainstream data-driven SID methods.

Figure 12 :
Figure 12: The numerical evaluation results of the four tests (i.e., Test-1 to Test-4) in the data ranking experiment.

Figure 13 :
Figure 13: The values of EOD based on the four tests.

Table 1
A brief introduction to the four closely-related survey papers and our survey paper on SID.

Table 2
Mainstream paired datasets for the SID task were proposed from 2017 to 2022.* denotes the name of the dataset defined by us.• denotes that the dataset contains real-world paired images.
Images are selected from the BSD [1] dataset, with the training set and test set containing 1,800 and 200 images, respectively.

Table 3
Examples of current data driven SID methods that adopt the explicit solving paradigm, where Gen. and Spe. in the data column denote general and specific data, respectively; Syn. and Math. in the model column denote the synthetical and mathematical rain model, respectively; W and B in the network column denote the white-box and black-box network architecture, respectively.

Table 5
The representative single image deraining methods proposed from 2020 to 2022.

Table 6
The representative single image deraining methods proposed from 2017 to 2019.

Table 7
Statistics of the training strategies in current SID methods.

Table 8
Statistics of the network pipeline used in current methods.

Table 9
Statistics of the domain knowledge used in SID methods.

Table 11
Statistics of the data preprocessing methods used in current data-driven SID methods.

Table 12
The statistics of the objective functions used in current SID methods. From the first row to the last, the rows denote MSE, MAE, SSIM, adversarial loss, K-L divergence, cross-entropy loss, Gaussian mixture model, Gaussian process, quasi-sparsity loss, autocorrelation loss, total variation, triplet loss, contrastive loss, Charbonnier loss, dark channel loss, knowledge distillation loss, and margin loss, respectively.

Table 13
The implementation details and emphases of the four types of tests. S/R, O/A, and L/H denote that the three methods (i.e., DetailNet, RESCAN, and LPNet) are tested on synthetic or real scenario data, evaluated by objective or subjective metrics, and evaluated on low-level or high-level tasks, respectively. The weight represents the percentage in the final EOD calculation of Eqn. (39).

Table 14
The numerical evaluation results of the data ranking experiment (from Test-1 to Test-4).

Table 15
Information of the four tests in the data ranking experiment.
Metric: Test-1 objective; Test-2 objective; Test-3 subjective; Test-4 objective.
Data: Test-1 synthetic; Test-2 real-world; Test-3 real-world; Test-4 real-world.
Task: Test-1 low-level; Test-2 low-level; Test-3 low-level; Test-4 high-level.