HDEC-TFA: An Unsupervised Learning Approach for Discovering Physical Scattering Properties of Single-Polarized SAR Image

Understanding the physical properties and scattering mechanisms contributes to synthetic aperture radar (SAR) image interpretation. For single-polarized SAR data, however, it is difficult to extract the physical scattering mechanisms due to lack of polarimetric information. Time–frequency analysis (TFA) on complex-valued SAR image provides extra information in frequency perspective beyond the “image” domain. Based on TFA theory, we propose to generate the subband scattering pattern for every object in complex-valued SAR image as the physical property representation, which reveals backscattering variations along slant-range and azimuth directions. In order to discover the inherent patterns and generate a scattering classification map from single-polarized SAR image, an unsupervised hierarchical deep embedding clustering (HDEC) algorithm based on TFA (HDEC-TFA) is proposed to learn the embedded features and cluster centers simultaneously and hierarchically. The polarimetric analysis result for quad-pol SAR images is applied as reference data of physical scattering mechanisms. In order to compare the scattering classification map obtained from single-polarized SAR data with the physical scattering mechanism result from full-polarized SAR, and to explore the relationship and similarity between them in a quantitative way, an information theory based evaluation method is proposed. We take Gaofen-3 quad-polarized SAR data for experiments, and the results and discussions demonstrate that the proposed method is able to learn valuable scattering properties from single-polarization complex-valued SAR data, and to extract some specific targets as well as polarimetric analysis. At last, we give a promising prospect to future applications.

HDEC-TFA: An Unsupervised Learning Approach for Discovering Physical Scattering Properties of Single-Polarized SAR Image method is able to learn valuable scattering properties from single-polarization complex-valued SAR data, and to extract some specific targets as well as polarimetric analysis. At last, we give a promising prospect to future applications.

I. INTRODUCTION
S YNTHETIC aperture radar (SAR) image understanding is important but challenging. Different from the optical remote sensing sensors, SAR has special imaging mechanisms with an active microwave system. The large slant-range geometric distortions such as fore-shortening, layover, and shadowing effect in SAR images which are against human visual habits bring many difficulties in interpretation [1]. The pixel intensity in SAR images reflects the radar backscattering of objects on the ground, which is influenced by several factors, like polarization, resolution, look angle, and material and shape of objects. How these factors influence the radar backscatter requires a complicated physical model to explain, which is hard to derive [2].
Understanding physical properties of objects on the ground benefits SAR image interpretation. Polarimetric SAR (Pol-SAR) images are usually used for terrain and land-use classification due to the inherent characteristics of physical scattering mechanisms. Such polarimetric decomposition and classification approaches [3]- [6] were proposed to discover the physical scattering characteristics and contribute to distinguishing man-made targets from vegetated areas [7], built-up area extraction [8], vegetation discrimination [9], and so on. Meanwhile, the polarimetric information has also been combined with the spatial information of SAR images for interpretation [10], [11].
Nowadays, some very-high-resolution (VHR) SAR satellites have been launched (such as TerraSAR-X [12] and Gaofen-3 [13]), providing VHR SAR images with more spatial textual details but generally single-or dual-polarized. However, many SAR systems have to make a tradeoff between very high resolution and full polarization, which may result in a lack of physics concept for VHR SAR images. For single-polarized VHR SAR images, some studies have applied time-frequency analysis (TFA) approaches like subaperture decomposition 0196-2892 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information. [14], [15] to describe the variations of targets in different subband which reflects physical properties, but only some canonical targets were discussed. Most research studies focused on intensity information or the processed multilooked despeckled data which is more visually understandable to analyze the texture features [16], [17]. In this way, single-polarized VHR SAR images are regarded as "images" similar to optical ones, with electromagnetism characteristics being neglected. Deep learning has been widely developed in SAR remote sensing image applications recently, including SAR target detection and recognition [16]- [19], land cover and land use classification [20], and target simulation [21]. The deep convolutional neural network (CNN) (DCNN) performs well on learning hierarchical texture features from VHR SAR images automatically. Other than learning from image domain, can deep learning explore deeper in nonvisual information of SAR image for more knowledge?
Discovering the inherent physical properties from single-polarized VHR SAR images automatically with deep learning is starting to receive attentions. A few recent studies have addressed this issue to learn the physical signature or to reconstruct the polarimetric features from nonpolarized SAR data, with supervision of full-polarized images [22], [23]. In this article, however, we propose an unsupervised deep learning method of hierarchical deep embedding clustering (DEC) (HDEC) based on TFA (HDEC-TFA) to study the physical scattering properties for high-resolution single-polarized complex-valued SAR data from time-frequency perspective. The main idea is inspired by studies about TFA on SAR data [15], revealing that different targets would present various kinds of backscattering behaviors when being observed in subbands.
The subband scattering pattern is extracted for each pixel position with 2-D continuous subband decomposition, which represents the backscattering variations in different azimuth angles and range frequency bandwidths. The processing provides a large amount of data for deep learning. In order to distinguish targets with different physical properties automatically, we propose the HDEC algorithm for implementation in an unsupervised manner which can learn the embedded feature representations and cluster centers simultaneously. The difficulty of evaluating the result lies in the absence of ground truth. Although the polarimetric analysis provides different types of scattering mechanisms with our method, there is some correlation between them [15], [24]. Consequently, we take full-polarized SAR images for experiments and use the polarimetric result as reference data in order to help understand the scattering behavior classes in single-polarized data. For quantitative analysis, an information theory-based evaluation method is proposed. In order to illustrate the significance in extracting valuable scattering information from single-polarized SAR data of our proposed method, some particular targets are demonstrated and discussed.
The contributions of this article are as follows. 1) For every target in single-polarized SAR images with complex values, a subband scattering pattern is extracted with TFA approach which reveals the backscattering variations along range and azimuth directions. It is regarded as a representation of the target physical scattering property. 2) A novel unsupervised deep learning approach (HDEC-TFA) is proposed for single-polarized SAR images to discover the latent features of subband scattering patterns and generate a visualized scattering map to give an intuitive understanding of physical scattering properties. 3) A novel evaluation method based on information theory is proposed to demonstrate the correlation and differences between the proposed method and polarimetric analysis. 4) Some specific objects and land covers are analyzed and discussed to demonstrate the significance of learning valuable scattering properties with the proposed method, in comparison with polarimetric analysis. In addition, the generalization ability of trained deep model is discussed with different SAR data. In Section II, we will introduce TFA and polarimetric analysis methods. The proposed HDEC-TFA method is presented in Section III. Sections IV and V present the experimental results and discussions, respectively. Finally, the conclusion is given in Section VI.

II. TIME-FREQUENCY ANALYSIS AND POLARIMETRIC ANALYSIS
In this section, the related work of TFA for single-polarized SAR images which inspire us is introduced, and then the polarimetric analysis method applied in this article to generate the reference data is given. Also, the correlation and combination between the two types of analysis methods are demonstrated.

A. Time-Frequency Analysis for Single-Polarized SAR Images
In previous studies, TFA in the azimuth direction has been applied for subaperture decomposition to process a set of subaperture images containing different parts of the SAR Doppler spectrum, with reduced resolution though. For some man-made objects with particular backscattering phenomena, the azimuth subband decomposition provides useful information for SAR image understanding due to the differences among different looks of a synthesized antenna. The subaperture analysis has been used for moving target detection [14], [25], urban area analysis [14], [26], and characterizing the scattering behavior of targets [27]. For high-resolution SAR, the antenna transmits linearly modulated chirp signals with a large bandwidth spectrum in range direction. The subbands decomposition in range frequency domain allows to obtain a series of scene reflectivity behaviors at different observation frequencies which have been used to detect and characterize objects with frequency sensitive responses [28]- [30]. Also, TFA in two-dimension has been proposed to process the extended 2-D SAR image spectrum with chirp bandwidth in range and Doppler bandwidth in azimuth, aiming at extracting backscattering variations from the 2-D frequency spectrum and characterizing the target properties [15], [28], [31], [32].
Several TFA methods have been applied in SAR target analysis, such as 2-D Gabor transform [33], Matching-Pursuit algorithm [34], Multidimensional Continuous Wavelet Transform (MCWT) [35] and short-time Fourier transform (STFT) [15]. TFA extends the SAR image in spatial domain to time-frequency domain, making it possible to reveal the backscattering diversity versus range and azimuth frequencies unseen in SAR images. The extension also makes data volume multiplied which inspires us to use deep learning algorithms for discovering the inherent patterns of scattering behaviors.

B. Polarimetric Analysis and GD-Wishart Classification
The polarimetric analysis is based on the scattering and physical modeling in polarimetric systems, and not data specific in principle. There are many researches concerning the coherent or incoherent decomposition from the scattering matrix of PolSAR data in order to analyze the physical scattering mechanisms and understand the land surface. H /A/α decomposition approach [3] proposed by Cloude and Pottier creates an H /α plane for classification. Freeman and Durden [36] proposed the Freeman-Durden scattering power decomposition approach and Yamaguchi et al. [5] added the helix scattering term to Freeman's method. Based on the polarimetric decomposition, Lee et al. [4], [37] proposed a well-known unsupervised classification approach using complex Wishart classifier.
In this article, to obtain the reference data of polarimetric characterized scattering mechanism map, we apply a newer polarimetric analysis method based on Kennaugh matrix (K ) and geodesic distance Wishart classification (GD-Wishart) proposed by Ratha et al. [38].
For a full-polarized SAR image, the 2×2 complex scattering matrix S containing complete polarimetric information about backscattering from a target can be described as The Kennaugh matrix K [39] which preserves the scattering information can be obtained from S by and ⊗ denotes the Kronecker product, j = √ −1. The characteristic scattering mechanisms odd-bouncing (O), double-bouncing (D), and volume bouncing (V) scattering are modeled with a trihedral corner reflector, a dihedral corner reflector, and a cloud of uniformly distributed randomly oriented dipole scatterers, respectively. Defined by K matrices, the corresponding K O , K D , and K V are given by Then a normalized scattering similarity measure w based on geodesic distance [40] between the elementary target (K O , K D , and K V ) and the observed K matrix is computed and modulated by SPAN, the total received power from the four polarimetric channels that The geodesic distance between two K matrices is given by As a result, the normalized scattering similarity w i for each i ∈ {O, D, V } is defined as An initial category i ∈ {O, D, V } is defined according to the dominant w i for each pixel. By dividing and merging clusters for each category, Wishart classification [4] is then applied iteratively with w i as the input for each pixel and a final scattering category belonging to odd-bounce, double-bounce, and volume-bounce is given. In [38], pixels with nearly equal w i can be flexibly reclassified to any category of {O, D, V } by Wishart classification. In our work, however, those flexible pixels are fixed as a new scattering class, mixed bounce M, and are not allowed to change to other three categories during Wishart classification.

C. Correlation and Combination Between Time-Frequency Analysis and Polarimetric Analysis
Although TFA and polarimetric analysis are based on different theories and reveal different physical characteristics of SAR, some literature indicated the correlations between them.
In [24], the authors discussed the point target behaviors in high resolution SAR images with both TFA and polarimetric analysis methods. Spigai et al. [15] defined four elementary possible backscattering behaviors: 1) frequency-invariant that the backscattering is stable in azimuth and range direction; 2) range-variant that the backscattering varies along the range direction while stable in azimuth direction; 3) azimuth-variant that the backscattering varies along the azimuth-variant while stable in range direction; and 4) 2-D-variant that there is no specific patterns of backscattering in both range and azimuth directions. According to this work, they demonstrated the possible relation between odd-double bouncing detected by polarimetric analysis and frequency-invariant/azimuth-variant behavior detected by TFA, since the frequency-invariant target is either a trihedral or dihedral on the rough ground.
TFA brings more extra information for man-made targets, describing the backscattering behaviors for different azimuth angles of observation and frequencies of illumination. As a result, some research studies combine the TFA and polarimetric information to analyze urban cities [33]- [35].
To our knowledge, the polarimetric analysis has developed for years with abundant studies toward the scattering mechanisms. In comparison, the backscattering behavior patterns of objects derived from TFA are less explored and only some particular targets with elementary backscattering pattern are discussed. Since the latent correlation between TFA and polarimetric analysis, we generate the polarimetric characterized scattering mechanism result as the reference data for evaluation in this article.

III. HIERARCHICAL DEEP EMBEDDING CLUSTERING WITH TIME-FREQUENCY ANALYSIS
A novel unsupervised deep learning method based on TFA (HDEC-TFA) for single-polarimetric SAR images in order to generate the backscattering behavior map is proposed in this section.

A. Workflow Overview
The main target is to discover the latent characteristics of subband scattering pattern and assign specific labels to pixels, with a visualized backscattering behavior map generated finally to provide an intuitive result. We summarize the overall workflow of the proposed method in this section and the flowchart is shown in Fig. 1. Details will be explained in Sections III-B-III-E.
1) For quad-polarized SAR images, apply polarimetric analysis method (namely GD-Wishart for short) [38] to obtain the polarimetric characterized scattering mechanism map. Odd-bouncing (O), double-bouncing (D), volume-bouncing (V), and mixed scattering (M) are predicted for every pixel. 2) Take the HH channel image and generate the subband scattering pattern for each pixel with 2-D continuous subband decomposition based on TFA theory.

3) Design and train a deep CNN (CNN) and cluster
centroids for subband scattering patterns with DEC algorithm in an unsupervised manner. Thus, an initial backscattering behavior output map is obtained. 4) Evaluate the backscattering behavior output map and compare it with the polarimetric characterized scattering mechanism result. Select some clusters for a further study. 5) For the selected clusters, perform 3) in a multitask learning manner to obtain a series of subclusters.

B. 2-D Continuous Subband Decomposition for Single Polarized SAR Data
We demonstrate the TFA approach to generate the subband scattering pattern in Fig. 2. The basic idea is based on short-time Fourier transform (STFT) and bandpass filtering.
where FFT denotes the Fourier transform.
In frequency domain, a series of bandpass filters w( f i r , f j a ) centered on frequency pairs {( f i r , f j a )} both in range and azimuth directions are applied on S, and then those filtered signals are transformed back to spatial domain to obtain a series of subband images that where · and FFT −1 denote the dot product and inverse Fourier transform, respectively. Here, we call them subband images which are decomposed in both range and azimuth directions to distinguish from subaperture (or azimuth subband) decomposition in the literature [14], [25]- [27]. These subband images are with lower resolutions in spatial domain but more evidently reflect the backscattering variations of objects on the ground, especially for man-made targets which could be visible in some subbands but invisible in other subbands [14]. Hence, we pick up the center position (x 0 , y 0 ) of eachs to generate the radar spectrogram for the pixel (x 0 , y 0 ) in the SAR image that For better visualization and further processing, we denote the amplitude of r (x 0 , y 0 ) as |r | and define it as subband scattering pattern in the following, shown in Fig. 2. Recording the backscattering intensity information in every subband of range and azimuth, the subband scattering pattern enables to reveal the instability of backscattering with various look angles and frequency responses [15]. We select several canonical targets in an SAR scene of Sentinel-1 and show the corresponding backscattering behaviors in Fig. 3. Four elementary backscattering behaviors defined in [15], frequency-invariant, 2-D-variant, azimuth-variant, and range-variant, are presented together with the corresponding objects A (street lamp), B (bare-ground), C (ship), and D (industrial building roof). Besides the four representative subband scattering behaviors, however, there could be more complicated backscattering behaviors in real SAR scene images, especially in urban areas such as objects E and F (factory facilities). As a result, aiming at discovering the characteristics of various backscattering behaviors, we propose the hierarchical deep embedding learning method in Section III-C.

C. Unsupervised Hierarchical Deep Embedding Clustering
Based on the analysis in Section III-B, subband scattering patterns are generated in pixel scale which provide a big data volume for a complex-valued SAR image, inspiring us to apply the data-driven approach to discover the latent features for each pixel and assign the label it belongs to. Similar to the polarimetric analysis, we also want to obtain a backscattering behavior map to demonstrate the physical properties of objects on the ground. Without any prior knowledge and information in this field, a completely unsupervised deep learning workflow is proposed based on DEC.
DEC [41] was proposed to simultaneously learn the feature representation and cluster assignments with deep neural networks. The main idea is to map data to a lower-dimension space and iteratively optimize the cluster centers with an auxiliary target distribution derived from the initial soft cluster assignment. Here, we will demonstrate our HDEC workflow specific to subband scattering patterns.
In the first step, a denoising stacked convolutional auto-encoder (DSCAE) is designed for learning the latent features of subband scattering patterns in an unsupervised manner. The encoder part contains four convolutional block, each with a convolutional layer, a batch normalization layer, an activation layer, and with or without a max-pooling layer, as shown in Fig. 4. Since the scattering patterns are less complicated in texture than normal natural images, it is not necessary to train a very deep neural network with millions of parameters. We denote G F (θ ) as the encoder part of DSCAE and G F (θ) as the decoder part. Given a scattering pattern |r (x 0 , y 0 )| of the pixel (x 0 , y 0 ), G F projects it to a latent feature space φ = G F (|r |) ∈ R n . The optimization is conducted by minimizing the mean square error between the output of the decoder |r| and the input of the encoder |r | that The DEC training diagram is demonstrated in Fig. 5. In the beginning, k cluster centers {μ j } k j =1 are initialized by k-means clustering [42] performed on the latent feature space. For an input subband scattering pattern |r i |, to measure the similarity between latent feature φ i and centroid μ j , a soft assignment is given based on the Student's t-distribution with the degrees of freedom setting to 1 that The soft assignment q i j can be considered as the probability of assigning data |r i | to cluster j . In order to refine the cluster centroids iteratively, a Kullback-Leibler (KL) divergence distance between the soft assignment Q and the target distribution P is given for optimization where p i j is defined as During the training process, cluster centroid μ and parameters θ in G F are simultaneously updated with the gradients of Loss kl with respect to μ and φ, respectively. In testing step, the cluster label j of each subband scattering pattern |r i | is projected to the corresponding position (x i , y i ) to obtain the result map. In this step, the input subband scattering pattern |r | is averaged by a small window with a size of 3×3 to make the result map more smooth.
For a hierarchical training scheme, some selected clusters obtained from level-1 training are taken for level-2 training. The selection strategy will be introduced in Section III-D based on the proposed evaluation approach. In level-2 training, a multitask learning strategy is applied to share the parameters of first two layers in G F but to calculate and update the cluster centroids μ independently when training different subclusters. Thus, these subtraining tasks can be conducted in parallel and prevent over-fitting in feature learning.
The reason of conducting a hierarchical training scheme is that we found the subband scattering patterns for some natural surfaces are with no specific characteristics, which make their features much more various and scattered than other targets. As cluster number increases, the increased clusters are with little significance for us to understand. These clusters are defined as "non-stationary classes" in this article. We also found the "non-stationary classes" are less related to any scattering mechanism obtained by polarimetric analysis. Consequently, we select the "stationary classes" for level-2 training to discover more information compared with polarimetric result.
In the next section, we will introduce the evaluation approach and the principles for selecting clusters for level-2 training.

D. Information Theory Based Evaluation Approach
In this article, the method of learning the physical scattering properties from single polarized SAR images is first put forward in an unsupervised manner. It is difficult to generate the ground truth of backscattering behaviors, since besides four characteristic backscattering behaviors defined in [15], there would still be various of possible backscattering behaviors in SAR images, which are complicated to figure out. Polarimetric analysis method (denoted as GD-Wishart) introduced in Section II-B provides a perspective of physical scattering mechanisms for polarimetric SAR images, including oddbounce, double-bounce, volume-bounce, and mixed scattering. With a full-polarimetric SAR image, it is possible to generate a polarimetric characterized scattering mechanism map by polarimetric analysis method. However, we would not consider it as the ground truth, since the scattering properties detected by polarimetric analysis and TFA are somehow different. Serving it as the reference data, we propose an information theory-based evaluation approach here to demonstrate the effectiveness of our proposed method in learning meaningful physical properties from single polarized SAR images and present a quantitative analysis to describe the relationship with polarimetric analysis result.
Denote Y 1 and Y 2 as variables of labels predicted by GD-Wishart and HDEC-TFA. Suppose there are y 1 labels in GD-Wishart and y 2 labels in HDEC-TFA, and basically y 1 = y 2 . Here, we consider the prediction from an SAR image to a scattering map as an information transmission system, transforming the information in each position (x k , y k ) to a scattering label. For GD-Wishart method, the information in position (x k , y k ) is the polarimetric scattering matrix S while for HDEC-TFA it represents the subband scattering pattern |r |. In [43], a framework for supervised classification performance evaluation with information-theoretic method was proposed, using information-theoretic measures to quantify the information transmitted from true to estimated labels. Here, we borrow the idea to measure the information shared by HDEC-TFA for single-polarized and GD-Wishart for full-polarized SAR data.
For a specific scattering mechanism class i in Y 1 and undefined backscattering behavior class j in Y 2 , the more information that i contains about j (or j contains about i ), the more similar physical significance j represents with i . As a result, we use point-wise mutual information (PMI) for quantification. The PMI between i and j is defined as where P Y 1 Y 2 (i, j ) denotes the joint distribution of label i and j , P Y 1 (i ) and P Y 2 ( j ) denote their marginal distributions, and P Y 1 |Y 2 (i, j ) denotes the conditional distribution. Intuitively speaking, if i and j are mutually exclusive, then increases. Therefore, i and j are mostly positive correlated when PMI Y 1 Y 2 (i, j ) reaches the maximum.
To obtain the PMI matrix for each (i, j ) pair, the empirical where δ (i, j ) y 1 k , y 2 k = 1, y 1 k = i and y 2 k = j 0, others.
Thus, the empirical estimate of joint distribution and its marginals can be defined asP . According to (17), we can obtain the empirical estimate of PMI matrix PMI Y 1 Y 2 (i, j ). Since we only concern about the positive correlation between i and j and also in order to avoid the negative infinity of PMI, the calculated PMI is cut off to 0 in case of negative values.
After the level-1 training, each cluster is visualized and the evaluation matrix PMI (1) is calculated. In order to discover more meaningful and refined classes, we select some "stationary" clusters with significant meanings for a further study, which we define with the help of a PMI matrix in a quantitative way. For class j in DEC clusters Y 2 , if there are more than one label i in polarimetric analysis classes Y 1 subject to PMI Y 1 Y 2 (i, j ) > ε where ε is a threshold, then the j cluster is continued to train the subclusters. In this article, we decide ε as 1. The evaluation PMI matrix for the final result is generated after level-2 DEC training, denoted as PMI (2) .
In experiments and discussions, the potential correlation could be found between the so-called stationary classes in level-1 DEC training and the GD-Wishart result. In principle, the selected clusters represent concentrated objects with significant semantic meaning, such as man-made targets, water bodies, and roads. On the other hand, the targets in non-stationary classes are usually distributed. Consequently, we could empirically select the stationary classes with visualized scattering map.

E. Colorization
To make the backscattering map more understandable, colors for each cluster can be set based on the evaluated PMI matrix with polarimetric labels. For polarimetric analysis visualization, we assign blue, red, green, and brown colors for oddbounce, double-bounce, volume-bounce, and mixed scattering classes, respectively. First, the evaluated PMI matrix in level-1 learning decides the main color tone of each level-1 cluster and the corresponding level-2 subclusters (if existed). For j (1) in level-1 DEC clusters, if PMI (1) (:, j (1) ) > ε and the largest value in PMI (1) corresponds to only one kind of scattering mechanism, for example, only odd-bounce scattering, then j (1) is set as blue color. It ensures the areas with strong correlation between HDEC-TFA and GD-Wishart look similar in the result map. On the other hand, if PMI (1) (:, j (1) ) < ε which denotes little correlation between j (1) and any scattering mechanism of polarimetric analysis, a darker color is assigned for j (1) . Similarly for level-2 subclusters, we decide the specific color for each subcluster j (2) based on the main color tone of its level-1 label j (1) and the largest values in PMI (2) (:, j (2) ).
The aforementioned colorization rules based on quantitative PMI evaluation method are set for a better comparison with polarimetric scattering map. Generally speaking, the most learned classes can be related to specific objects. Following the universal principles that green-like colors for natural lands, red-like colors for man-made targets, blue-like colors for water bodies, etc., we can make color coding empirically without full-polarized reference data. To emphasize, the generated visualization map is only for an intuitive understanding, and the learned scattering properties are the most important things for helping SAR image interpretation applications in the future.

A. Data Description
We propose this method aiming at learning backscattering behaviors of single polarized SAR images to help understand the physical properties of objects on the ground when  polarimetric information is unavailable, especially in very high resolution SAR images. However, in order to demonstrate the effectiveness of our method and compare the result with polarimetric analysis, we choose the full-polarimetric SAR data of GaoFen-3 (GF-3) in our experiments. GF-3 [13] is a multipolarization SAR mission working in C-band, which provides 12 observing modes and with single-, dual-, and quad-polarization. Here, we use the Quad-pol Stripmap products in San Francisco, USA, and Paris, France, which is available online [44]. With a resolution of 8 m, the Quad-pol SAR images provide HH/HV/VH/VV channels in complex values. The parameters of the two SAR scenes are specifically shown in Table I. In order to discover the physical scattering characteristics of objects mainly in urban areas, an area with the size of 2048 × 1800 pixels in San Francisco and the other area with the size of 2584×3120 pixel in Paris are cropped as the experimental data. The regions in Google Earth and their corresponding pseudo-colored Pauli images are shown in Fig. 6.

B. Experimental Setup
The TFA analysis window size of complex-valued SAR image is set as 32 × 32 and the continuous bandpass filters group is with a bandwidth of half full-aperture bandwidths both in range and azimuth. In polarimetric analysis GD-Wishart algorithm and the proposed HDEC-TFA method, the K matrix and subband scattering pattern |r | are both averaged by a window of 3 × 3.
The cluster number k in HDEC-TFA method should be considered in practice. According to the previous studies [14], [15], four elementary backscattering behaviors can be derived from TFA and practically there could be more complicated behaviors in SAR images. To this end, k should be greater than 4 in the level-1 DEC training. In order to set up a proper cluster number, we have tested k from 5 to 10 by cropping a relatively small area of 512 × 512 pixels for experiment. Fig. 7 shows the training results of each cluster for different k values. The blue and red borders in row mark the positive and negative additional clusters, respectively. We found that as k becomes much larger than 4, clusters about man-made targets (such as column 1 and 2) are beginning to stabilize but natural targets related clusters (such as column 5 and 7) keep splitting into more clusters. To our knowledge, the subband scattering pattern of SAR data provides abundant information of man-made objects due to the specific physical properties like resonant structure while little extra information of natural targets. As a result, we set up k of level-1 DEC training to 6 or 7 in our experiments to make sure the algorithm could separate particular backscattering behaviors as far as possible with less computation load. For a further study, we decide k as 3 in the level-2 DEC training.
The DSCAE and HDEC training is implemented by stochastic gradient descent (SGD) optimizer, with weight decay 0.0005 and momentum 0.9, and a fixed learning rate of 0.01 and 10 −4 , respectively. All experiments are conducted on a workstation of 64 bit Win7 operating system, with 64G RAM and NVIDIA Quadro M2000M graphics card of 4 GB GDDR5 VRAM clocked at 1250 MHz.

C. HDEC-TFA and GD-Wishart Results by Visualization
The GF-3 Quad-Pol SAR images in Paris and San Francisco are explored here.
1) Paris: Taking the full-polarization SAR image with HH/HV/VH/VV channels, the scattering mechanism output map of polarimetric analysis is obtained by GD-Wishart algorithm, as shown in Fig. 8(a). By dividing and merging clusters and then reclassifying, GD-Wishart algorithm divides  the pixels into 14 scattering mechanism classes in total, including two clusters of odd-bounce with blue colors, three clusters of double-bounce with yellow/brown colors, four clusters of volume-bounce with green colors, and five clusters of mixed-bounce with red colors. That is Y 1 = {O1, O2, D1, D2, D3, V1, V2, V3, V4, M1, M2, M3, M4, M5} and y 1 = 14. In Fig. 8(a), O1 represents very typical odd-bounce related to water bodies and soil ground. Volume-bounce is common in the situations of thick vegetation and extremely rough surfaces, like very high density area of high buildings. V 1 is mainly happened in vegetation region while V 3 and V 4 are more related to high density urban areas in different degrees. It seems that M2 is more likely in some natural areas such as vegetation and soil ground while M1, M4, M5 and Di correspond to some urban areas where double-bounce would probably happen. For the HH channel single-polarization SAR data, we set up seven clusters in the level-1 DEC training that Y 2 level−1 = {a, b, c, d, e, f, g} and y 2 level−1 = 7. After that, class a, c, e, and g are sent to level-2 training to obtain the subclusters so that Y 2 level−2 = {a1, a2, a3, b, c1, c2, c3, d, e1, e2, e3, f , g1, g2, g3} and y 2 level−2 = 15. The final backscattering behavior map is shown in Fig. 8(b). By comparing Fig. 8(a) with (b), we see that the results of polarimetric analysis and HDEC-TFA are visually similar.
2) San Francisco: The experiments on San Francisco and Paris are done independently considering various urban structures in two cities. As a result, different clusters are obtained in this SAR image.
The polarimetric analysis output map of San Francisco area is shown in Fig. 9(a). With five clusters in odd-bounce with blue colors, two clusters in double-bounce with yellow colors, two clusters in volume-bounce with green colors and three clusters in mixed scattering with red colors, the total number of scattering mechanism classes by GD-Wishart is 12 that Y 1 = {O1, O2, O3, O4, O5, D1, D2, V 1, V 2, M1, M2, M3} and y 1 = 12. In Fig. 9(a), O4 and O5 are obvious in mainly representing the water areas (river, lake, and sea), and some smooth ground surfaces. However, the other three clusters of odd-bounce O1, O2, and O3 mainly happen in urban areas that correspond to some man-made targets. For volume-bounce, it is easy to tell that V 1 represents the volume-bounce in high-density building areas while V 2 represents the volume-bounce mainly in vegetation areas. Di and Mi mostly happen in residential areas and some other man-made targets.
With a particular urban planning with regular street blocks and similar structures, the backscattering in San Francisco is less complicated than Paris, so we apply k = 6 in the first level DEC training that Y 2 level−1 = {a, b, c, d, e, f } and y 2 level−1 = 6. Then b, c, d, and f classes are applied in the second level training to obtain 14 clusters finally that Y 2 b1, b2, b3, c1, c2, c3, d1, d2, d3, e, f 1, f 2, f 3, g} and y 2 level−2 = 14. The final result is shown in Fig. 9(b).

A. HDEC-TFA Results Discussion With Quantitative Evaluation
In this section, the evaluated PMI matrixes shown in Figs. 10-13 are analyzed to have quantitative discussions of the HDEC-TFA results.
1) Paris: Taking Paris data as an example, we demonstrate the way to select classes in level-1 training result to the next training level based on quantitative analysis of PMI matrix. In level-1 training, with y 1 = 14 and y 2 level−1 = 7, the obtained level-1 PMI matrix with the size of 14 × 7 is shown in Fig. 10. PMI (1) (D2, e) has the highest value of 5.0367 which indicates that class e is most related to D2. The following PMI (1) (D1, e), PMI (1) (D3, e), and PMI (1) (V 4, e) with values of 2.9853, 2.9654, and 2.7004, can also verify that class e contains certain information in double-bouncing classes D1, D3, and volume-bouncing class V 4. Class a can be easily associated with O1 with only PMI (1) (O1, a) of 2.6693 is greater than ε and the others are 0 or relatively small. Class c and g are more complicated since several PMI (1) (:, c) and PMI (1) (:, g) have significant value. Class b, d, and f are little related to any scattering class of polarimetric analysis with small PMI value that PMI (1) < ε. Therefore, we only take class a, c, e, g to the next step with level-2 DEC learning, obtaining three subclusters in each class.
With y 2 level−2 = 15 in the final result, the PMI matrix with the size of 14 × 15 is shown in Fig. 11. Interestingly, some classes in level-1 training have become more distinct to be related with a specific scattering mechanism class of GD-Wishart result after a further subtraining. For example, PMI (2) (D2, e3) = 6.4512 has absolute advantage compared with the following largest values PMI (2) (V 4, e3) = 1.4669 and PMI (2) (O1, e3) = 0.1887. The hierarchical learning scheme of HDEC-TFA splits the cluster of interest (almost related to man-made targets) more characteristic and mostly increases the dominant PMI value with GD-Wishart classes. PMI (2) (O1, a3) = 3.4582 is 0.8159 higher than PMI (2) (O1, a) = 2.6693, while PMI (2) (M2, a3) = 0.4286 has been reduced by 0.404 compared with PMI (2) (M2, a) = 0.8326. With a quantitative analysis, we find that the hierarchical learning scheme of HDEC-TFA is more effective in extracting useful and understandable backscattering classes.
We select the subcluster a3, c3, e3, and g1 in each class a, c, e, and g of HDEC-TFA result with the highest PMI value to demonstrate the connection between the polarimetric scattering mechanism derived from full-polarized data and the backscattering behaviors derived from single-polarized data. Fig. 14 shows some subband scattering patterns of randomly chosen pixels in a3, c3, e3, and g1 class. As defined in Section II-A, four elementary scattering behaviors of 2-D-variant, range-variant, azimuth-variant, and frequencyinvariant, can be related to these four subclusters. a3 is most associated with O1 due to the maximum PMI value that PMI (2) (O1, a3) = 3.4582, as shown in Fig. 11, with 2-D-variant backscattering behaviors. Since the 2-D-variant spectrograms show variations along both range and azimuth  directions without specific characteristics and happen almost in natural targets, a3 in Fig. 8(b) is highly related to the river and soil ground. Correspondingly, these areas usually reflect single bouncing mechanism from the surfaces as O1 represents in Fig. 8(a). Second, the frequency-invariant targets have isotropic backscattering pattern with stable spectrograms with respect to both range and azimuth frequencies as e3 represents, related to trihedral or dihedral above rough soil. On the other hand, the double-bounce could be modeled by a dihedral corner reflector. In Fig. 11, the highest PMI value that PMI (2) (D2, e3) = 6.4512 demonstrates e3 can be strongly associated with one kind of double-bounce D2.
2) San Francisco: The PMI matrices of level-1 training and the final result of San Francisco data are shown in Figs. 12 and 13. In this case, class f in level-1 DEC training is highly associated with multiple scattering mechanisms of GD-Wishart, like odd bouncing O2 with PMI (2) (O2, f ) = 5.0944, double bouncing D1 with PMI (2) (D1, f ) = 4.8441, and mixed scattering M2 with PMI (2) (M2, f )   correspond to. In polarimetric analysis, odd-bouncing O2 and double-bouncing D1 are related to some particular man-made targets like roof and shipyard, as shown in Fig. 16. The high PMI values of (O2, f ) and (D1, f ) demonstrate that these particular targets can also be detected by unsupervised HDEC-TFA method in class f , as pink pixels represent in Fig. 16.
Although different principles HDEC-TFA and GD-Wishart follow, there are still some similarities and correlations between two methods. Our proposed HDEC-TFA approach on single-polarized SAR images has the ability in learning some meaningful physical properties from complex-valued SAR images.

B. Specific Targets Discussion
In this section, we select some specific targets, including man-made objects and natural objects for discussion.  As shown in Fig. 17 of San Francisco area, three building roofs marked as A, B, and C are with similar structure that an inclined plane forms an acute angle to the ground. For the GF-3 SAR sensor of an incident angle of about 19-22 • , it probably reflects the odd-bouncing mechanism. As a result, the polarimetric analysis result verifies our conjecture that target A and some parts of target B are detected as odd bouncing O2. However, the scattering mechanism of target C in polarimetric analysis result is not clear. On the other hand, we find that these similar structures can be easily detected by HDEC-TFA with only one HH channel SAR image. It indicates our proposed method could have certain advantage in revealing some particular structures which is helpful to understand the complicated urban areas in high-resolution SAR images where polarimetric information is not available.
Meanwhile, the Eiffel Tower and three other targets in Paris area which are railway station, city block, and high building,  with their HDEC-TFA results and GD-Wishart results, are shown in Figs. 18 and 19. It can be seen that for some particular man-made targets, the unsupervised HDEC-TFA method on single polarized SAR data can achieve a similar detection result compared with polarimetric analysis method on full-polarized data. However, there are also some differences between them. For example, in high building area, polarimetric analysis can detect the volume bouncing (light green pixels in Fig. 18 and Eiffel Tower in Fig. 19) which is different from other lower buildings or residential houses. While HDEC-TFA on single-polarized SAR data may consider them the same class of backscattering behaviors because of the similar structure and material.
Next, we select some vegetation and water body areas in Paris and San Francisco to compare the components of both scattering mechanism types and backscattering behavior classes to see if there is any connection even the PMI values are very small. A patch with a size of 100 × 100 is cropped  for analysis and the percentage of pixels in each class is calculated. The cropped areas A, B, C, and D from Paris and San Francisco are marked in Figs. 8(a) and 9(a), respectively. Although Fig. 10 shows that the PMI values of PMI (2) (:, b), PMI (2) (:, d), PMI (2) (:, f ) are very small, the percentage of class b and d is similar with volume bouncing V 1 and mixed bouncing M2, respectively, as shown in Fig. 20. For natural targets such as water, vegetation, and agriculture, TFA provides subband scattering patterns without specific pattern, compared with man-made targets. Unlike the polarimetric analysis which is usually used to classify vegetation types, the HDEC-TFA method on single-polarized SAR data could not bring much information on it. By visualizing the class d of San Francisco in level-1 training, we found that class d contains not only some vegetation areas, but also the roads and rivers. To distinguish better, class d is sent into the second level DEC training to obtain d1, d2, and d3. From PMI matrix in Fig. 13, we found that class d2 is more associated with odd bouncing O5 compared with d1 and d3, corresponding to the rivers and roads. While class d1 and d3 are more related to vegetation areas. From Fig. 21, it is a little difficult to correspond the result of GD-Wishart and HDEC-TFA by comparing the percentage of pixels in each class. We would rather say it seems like class a, e, d1, and d3 can be related to some vegetation areas where volume-bouncing V 2 represents.

C. Data-Driven and Theory-Driven Discussion
The above-mentioned polarimetric analysis method is based on the well-established radar polarimetry theories and can be generally applied to different SAR images. The absence of polarimetric information makes it difficult to use these theory-driven approaches to interpret the polarimetric features for SAR images. In order to reconstruct the polarimetric information from non-pol SAR images, Song et al. [22] proposed a data-driven deep learning method under the supervision of the polarimetric features extracted from full-pol SAR images. Recently, Zhao et al. [23] also proposed a data-driven method based on complex-valued CNN to extract the polarimetric features from single-pol SAR data, with ground-truth labels generated automatically according to H-alpha division plane.
The proposed HDEC-TFA algorithm in this article is different from the above-mentioned data-driven approaches, even though all consider taking the single-polarized SAR data for training. On the one hand, it is an unsupervised deep learning algorithm without ground-truth labels derived from full-pol SAR data so that the learned physical properties are not polarimetric based. The experiments have demonstrated the correlation between them only to clarify the physical significance of the learned classes. On the other hand, the proposed learning method is partly theory-driven where the 2-D continuous subband decomposition with TFA approach provides the scattering behavior of targets in different azimuth angles and range chirp bandwidths, and partly data-driven where the HDEC learns to discover the inherent patterns and assign labels automatically. The theory-driven TFA brings explainable physical perspective for objects which is intrinsic for complex-valued SAR images so that the proposed HDEC-TFA learning method can be applied to any other single-polarized SAR images. Meanwhile, the data-driven method generally makes the trained model data-specific. The following experiments will discuss if the trained model could be generalized to different cases.
We denote the level-1 trained DEC model in Paris area with the size of 3120 × 2480 [see Fig. 8(b)] as MODEL-P. Select an area in Paris city with the size of 512 × 512 as the training data and obtain the trained DEC model namely MODEL-A, as shown in Fig. 22. Table II presents the detailed parameters of various target images with different scenes, polarizations, and frequency bands. Fig. 22 records different experimental results of these target images, that are: 1) directly tested by MODEL-A; 2) initializing the DEC network by MODEL-A and fine-tuning; and 3) randomly initializing the model and retraining from scratch. When testing directly on a different scenario selected from the same SAR image, as depicted in the first row of Fig. 22, the visualized classification map shows a well enough result. The tested image could tell the vegetation/bareground area with green-like colors, and the scattered man-made objects with red-like colors. It is visually similar with the fine-tuned result except for some objects, such as those in river area, which are reclassified to more proper classes (blue-like colors) after fine-tuning MODEL-A. Since the volume of training data is proportional to the pixel numbers of the selected SAR image, the coverage of the trained image and the various of subband scattering patterns for pixels decide  the generality of the trained model. As long as sufficient patterns the training data would provide, the trained model could be solid to apply to other scenarios. Fig. 23 gives the visualized result to apply MODEL-A to the whole Paris area with the size of 6144 × 7680 which demonstrates its great potential.
Applying MODEL-A to SAR images from different sensors is also discussed in the second and third rows of Fig. 22. Directly testing MODEL-A on Sentinel-1 and TerraSAR-X SAR image may cause problems. Since the extracted subband scattering pattern for each pixel reveals the backscattering behaviors in different azimuth angles and frequency bandwidth, the C-band pretrained model cannot be applied to X-band SAR image even with fine-tuning. For instance, the distinct penetrating abilities for X-band and C-band sensors would result in different scattering behaviors in vegetation areas. Both the feature extracting network and the learned cluster centers will be ineffective. With the same frequency band (C-band) of Sentinel-1, however, the fine-tuning result shows significant improvement. To further demonstrate the validity, we test a larger Sentinel-1 SAR patch with the size of 1024 × 1024 with a more solid pretrained GF-3 Paris model obtained in Section IV-C, namely MODEL-P, as shown in Fig. 24. The result of testing MODEL-P on Sentinel-1 data is much better than the one tested with MODEL-A shown in Fig. 22. Additionally, Fig. 22 demonstrates the comparison for applying MODEL-A to HH and HV polarized data of Sentinel-1 images which indicates the fine-tuned model is applicable to cross-polarized SAR data with same frequency band.

VI. CONCLUSION
In this article, in order to extract the physical scattering properties from single-polarized SAR data, a HDEC-TFA is proposed. By discovering the latent backscattering patterns of targets along range and azimuth directions automatically and hierarchically, a backscattering behavior map is generated for single-polarized SAR images. It provides a new perspective for visualization to help understand the physical properties of objects on the ground despite the absence of polarimetric information in single-polarimetric data. Due to the partly data-driven approach, our proposed HDEC-TFA method may be lack of theory explanation and physical modeling support compared with polarimetric analysis. For a better understanding, a polarimetric analysis approach GD-Wishart algorithm is applied for generating the reference data of scattering mechanisms. We propose an information theory-based evaluation method to compare the HDEC-TFA result with polarimetric analysis in a quantitative way. The experiments are conducted with Gaofen-3 quad-polarimetric data. Some specific targets are analyzed and discussed. The results show that our proposed HDEC-TFA method for single-polarized SAR images can learn the inherent backscattering pattern for each pixel and group them into clusters. There is a strong correlation between HDEC-TFA and polarimetric analysis in some specific classes related to man-made targets. Specifically, the HDEC-TFA method is able to detect some targets with characteristic geometry from single-polarized data better than GD-Wishart method could do for quad-polarized data. This work gives an inspiration on how to integrate the deep learning algorithm and SAR image understanding, with physical scattering properties to be considered even though the polarimetric information is not available. Our future work will focus on the applications of combining the learned physical properties and some interpretation tasks with deep learning, such as target detection, recognition, land use and land cover classification and so on.