A Simultaneous Sparse Learning Algorithm of Structured Approximation with Transformation Analysis Embedded in Bayesian Framework

Sparse approximation is critical in signal and image processing, and transformation analysis helps estimate sparse signals more effectively. In this study, a simultaneous Bayesian framework is extended to sparse approximation with a structured shared support, and a simultaneous sparse learning algorithm of structured approximation (SSL-SA) is proposed in which transformation analysis leads more sensibly to feasible solutions. Improvements to sparse Bayesian learning and iterative reweighting are embedded in the framework to achieve fast convergence together with high efficiency and robustness. Furthermore, iterative optimization and transformation analysis are embedded in the overall learning process to obtain relative optima for the sparse approximation. Finally, compared with conventional l1 and l2 reweighting algorithms for simultaneous sparse models, simulation results demonstrate the advantage of the proposed approach in handling sparse structure and iterative redundancy when processing sparse signals. This indicates that the proposed method can sparsely approximate a variety of signals and images while accurately analysing the target in the optimal transformation. It is envisaged that the proposed model is suitable for a wide range of data in sparse separation and signal denoising.


INTRODUCTION
With the flourishing development of digital communication and streaming media, we are surrounded by an explosion of information, and there are increasingly urgent requirements for dimensionality reduction to extract the essentials of interest. Sparse approximation [1,2] is the principle behind linear inverse problems for signal estimation in multiple applications, such as compressed sensing, machine learning, and data processing. It enables the reconstruction of signals or images from measurements by a sparse model, which achieves signal denoising and compression in practice. More generally, the generated results may be utilized for data separation and mode classification by sparse approximation. Therefore, the quality of sparse approximation has a great impact on data representation, and efficient sparse approximation methods are imperative for effectiveness and robustness.
Various methods have been proposed, including sparse subspace clustering, simultaneous sparse learning, and empirical Bayesian models [3][4][5]. To deal effectively with the miscellaneous issues arising from collections of high-dimensional data in the real world, sparse subspace clustering assumes that the data lie in a union of low-dimensional subspaces. Among the infinitely many possible representations, a sparse representation corresponds to selecting a few points from the same subspace [6]. Sparse subspace clustering thus exploits the assumption of sparsity: each low-dimensional signal can be represented as a linear combination of other signals, i.e., divided into a sparse combination of basic elements. A probabilistic subspace clustering approach has been presented for rapid clustering of large signal collections [7]. Instead of the sparse normalizer used for sequential data in existing sparse subspace clustering methods, a quadratic normalizer is exploited for sparse representation in numerous application scenarios, and a block-diagonal prior is utilized for the spectral clustering affinity. The normalizer and the prior affinity are integrated into one mechanism to achieve a statistically significant improvement in accuracy [8]. In pursuit of significantly higher clustering accuracy and lower computational load, many novel algorithms derive the projection, sparsity and/or low-rank coefficients in a low-dimensional latent space for simultaneous dimensionality reduction and data clustering [9,10].
Simultaneous sparse learning has been verified to improve the performance of subspace clustering effectively [11], and a variety of approaches, such as empirical Bayesian and hierarchical Bayesian methods, have been proposed to estimate the common support and the coefficients of signals [12]. However, selecting the optimal transformation is limited by the time diversity of signals, and the assumption of a common support is too restrictive and unreliable. For instance, the support of time-varying signals changes slowly over a period of time [13]. Recent work [14,15] gives the accountability and motivation needed to develop non-convex algorithms for simultaneously structured models in simultaneous sparse approximation. While most centralized algorithms for simultaneous sparse approximation [16] are developed for the case where signals share the same support, appropriate methods are needed to improve adaptation in a decentralized manner.
In this paper, a simultaneous sparse learning algorithm of structured approximation (SSL-SA) is developed based on the optimization of transformation analysis. Within the simultaneous Bayesian framework extended by a structured shared support, we utilize sparse Bayesian learning, structured approximation and iterative l2 reweighting to update the iterative optimization and the transformation analysis. Simulations show that the proposed framework improves validity for both general and particular sparse signals more rapidly and accurately. The contributions of this paper are summarized as follows: (i) A sparse Bayesian learning method for transformation analysis is proposed, which not only approximates various signals and images sparsely, but also analyses the target in the optimal transform domain accurately.
(ii) The structured approximation and its iterative version for the transform domain are derived from the sparse distribution, where the closed-form solution is obtained by alternating optimization.
(iii) The optimal sparse approximation and transform domain are obtained by stochastic descent with a flexible attenuation rate for the cost function, along with convergence and computational complexity analysis.
The rest of the paper is structured as follows: Section 2 describes the background of transformation analysis, the empirical Bayesian framework and sparse approximation for signals. In Section 3, we decompose signals with structured approximation based on an analytic transformation and a sparse distribution. In Section 4, a centralized l2 reweighting algorithm is proposed for the estimation of SSL-SA. Numerical results and discussions are presented in Section 5, where the performance is evaluated on sparse interference signals with transformation optimization. Conclusions are provided in Section 6.

Transformation analysis
The problem of designing a suitable or general transformation for signals can be traced back to the Fourier and Cosine transforms as well as their local versions [17]. Plenty of pioneering work in the field refers to the Fourier transform [18], fractional orders [19,20], and wavelets [21], which are used for signal decomposition and analysis. Under an adequate analytic transformation, many transform coefficients are close to zero, and the representation therefore possesses obviously sparse characteristics [22]. Moreover, the orthogonal multi-scale transform dilates an elementary function. Further findings in the field of transformation design set the stage for more efficient compression of images or codes. Besides, various other transformation methods are used to analyse spectral features in some aspects, such as transform domains [23,24], sparse decomposition [25,26] and matching pursuit [27].
Currently, the methods of processing digital signals based on transformation analysis primarily include the Discrete Fourier Transform (DFT), Fractional Fourier Transform (FrFT), Discrete Wavelet Transform (DWT), Discrete Cosine Transform (DCT), and Short-Time Fourier Transform (STFT). In these cases, multiform signals can be converted into optimal domains for processing, which simplifies their forms and characteristics and improves the effect of representation and classification. It deserves to be noted that many practical signals can be compressed as collections of a few basis elements, which means that they admit proper sparse representations after transformation by choosing an appropriate basis [28]. This paves the way for further investigation of sparse representation and reconstruction. Therefore, the orthogonal basis of the determined optimal transformation should be addressed first in the sparse approximation of signals.
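As a toy illustration of this point, the following sketch (not from the paper; the basis size and coefficient values are arbitrary) builds an orthonormal DCT-II matrix with NumPy and shows that a signal synthesized from three basis vectors is dense in time yet exactly 3-sparse in the transform domain:

```python
import numpy as np

# Illustrative sketch (values arbitrary): a signal built from three DCT-II
# basis vectors is dense in time but exactly 3-sparse in the DCT domain,
# showing why an appropriate orthogonal basis yields sparse coefficients.
n = 64
j, t = np.arange(n)[:, None], np.arange(n)[None, :]
D = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * j / (2 * n))  # DCT-II rows
D[0] /= np.sqrt(2.0)                       # orthonormal scaling of the DC row

c_true = np.zeros(n)
c_true[[3, 10, 25]] = [1.0, -0.7, 0.3]     # 3-sparse coefficient vector
x = D.T @ c_true                           # time-domain signal (dense)

c = D @ x                                  # analysis transform recovers c_true
support = np.nonzero(np.abs(c) > 1e-8)[0]  # effective sparse support
```

Because D is orthonormal, analysis and synthesis are exact inverses, so the support of c is precisely the three indices used to build the signal.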

Empirical Bayesian
Currently, likelihood-based methods allow robust statistical testing for inferring evolutionary rates [29]; they include Bayesian methods with a prior distribution and maximum-likelihood methods without a prior. Among them, empirical Bayes [30] is a valid structure for estimating the prior parameters of a system, which supports the development of applications for data analysis and signal processing.
In the empirical Bayesian method, the posterior probability density is taken as the target distribution, and a Gamma prior is presumed in this case [31]. Within the full Bayesian framework, the finer details of the posterior probability can be derived from the likelihood and the prior probability, which gives

P(ω | Ω, γ) = P(Ω | ω) P(ω | γ) / P(Ω | γ),  (1)

where P(ω|Ω) is the given probability rate, and γ is a parameter controlling the prior. It deserves consideration that the empirical Bayesian method is distinct from other Bayesian approaches: to some extent, the prior distribution is largely determined by the diversity and the multiple factors of the given dataset. However, it is an important yet generally difficult task in signal processing to acquire the priors directly in practice [32]. Instead of Equation (1), the estimated rate is regarded as the expected value over the aforementioned parameter:

ω̂ = E[ω | Ω] = ∫ ω P(ω | Ω, γ) dω.  (2)

While this process could be caught in an endless chain of conditional probabilities, a practical solution is to replace γ by the value that maximizes the marginal distribution P(Ω|γ). Since the estimation might not stop at a specified step but only come as close as possible, it leads more sensibly to feasible solutions. Consequently, the estimation can be achieved by breaking the infinite chain of conditional probabilities [33] in the derivations of Equation (2).
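As a minimal numerical sketch of this idea (a hypothetical scalar Gaussian model, not the paper's setting), type-II maximum likelihood estimates the prior variance γ by maximizing the marginal likelihood of the data and then plugs the learned prior back into the posterior:

```python
import numpy as np

# Minimal empirical-Bayes sketch (hypothetical model, not the paper's):
# y_i = w_i + e_i with w_i ~ N(0, gamma) and e_i ~ N(0, sigma2).
# Marginally y_i ~ N(0, gamma + sigma2), so maximizing p(y | gamma)
# gives the closed form gamma_hat = max(0, mean(y^2) - sigma2).
rng = np.random.default_rng(1)
sigma2 = 0.5
gamma_true = 2.0
m = 10_000
y = rng.normal(0, np.sqrt(gamma_true), m) + rng.normal(0, np.sqrt(sigma2), m)

gamma_hat = max(0.0, np.mean(y**2) - sigma2)          # type-II ML estimate
# Posterior mean of each w_i under the learned prior (shrinkage estimate):
w_post = gamma_hat / (gamma_hat + sigma2) * y
```

This "estimate the hyperparameter from the evidence, then do Bayesian inference" loop is exactly the pattern that breaks the chain of conditional probabilities mentioned above.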

Sparse approximation
Recently, researchers have focused continuously on jointly sparse representations of multiple signals, which extend the applications of sparse approximation approaches to signal processing, imaging, and distributed compressed sensing [34][35][36]. The problem can be stated as follows.
Suppose there are various signals describing the same phenomenon, and the impact of noise is considered. We want to find the optimal sparse approximation of the signals based on the same, or partly the same, set of specific elementary functions. That is, the problem aims to find the optimal approximation of the sparse signals under the condition that the number of functions is limited [37]. More extensive attention is directed to the strict formalization of the sparsity model:

y = Φx + ε,  (3)

where the known matrix Φ is n × N, the k-sparse N-vector x is unknown, and the noise vector ε reflects the level of potential uncertainty. In these circumstances, we seek an approximation ŷ to y while relaxing the condition of strict sparsity on x for robustness, especially in realistic applications [38].
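The noisy sparse model y = Φx + ε above can be instantiated in a few lines; the sketch below (dimensions and noise level are illustrative) generates such a problem and computes an oracle least-squares approximation restricted to the true support as a sanity check:

```python
import numpy as np

# Hedged sketch of the noisy sparse model y = Phi x + eps, with an n x N
# sensing matrix Phi and a k-sparse coefficient vector x (illustrative sizes).
rng = np.random.default_rng(2)
n, N, k = 40, 100, 5
Phi = rng.normal(size=(n, N)) / np.sqrt(n)   # columns roughly unit-norm

x = np.zeros(N)
support = rng.choice(N, size=k, replace=False)
x[support] = rng.normal(size=k)              # k-sparse ground truth

eps = 0.01 * rng.normal(size=n)              # noise vector: model uncertainty
y = Phi @ x + eps

# Oracle approximation: least squares restricted to the true support.
x_hat = np.zeros(N)
x_hat[support], *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
rel_err = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
```

The oracle solution is the best any support-recovery algorithm can hope for, so its small residual error shows that the difficulty of the problem lies in identifying the support, not in the final least-squares fit.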

PROBLEM STATEMENT
It is exactly based on this consideration that the goal of the simultaneous sparse approximation problem is converted into recovering the coefficient matrix from the given measurement matrix S and dictionary Φ, under the hypothesis that each signal shares the same sparsity profile. The overall measurement process can be stated as

S = ΦC + E,  (4)

where C is a coefficient matrix and E denotes the noise matrix for the signals. The existing row-support function is convenient for evaluating the number of non-zero rows in the matrix C [39], and it is defined as

rowsupp(C) = {i : c_ij ≠ 0 for some j}.  (5)

According to the row-support function, the problem of simultaneous sparse approximation in Equation (4) can be depicted as

min_C ||C||_row-0  s.t.  ||S − ΦC||_F ≤ ε,  (6)

where ||C||_row-0 = |rowsupp(C)| counts the non-zero rows, and its equivalent form is derived as

min_C ||S − ΦC||_F  s.t.  ||C||_row-0 ≤ k.  (7)

According to one of the versions proposed in [40], a relaxed form of the function ||C||_row-0 is expressed as

||C||_p,q = Σ_i ||c_i·||_q^p,  (8)

where the parameters commonly meet the criteria p ≤ 1 and q ≥ 1. Hence, the formulation in Equation (6) can be rewritten with the relaxation function as

min_C ||S − ΦC||_F^2 + λ ||C||_p,q,  (9)

where λ is the balancing parameter between the approximation error and the sparsity condition. Instead of representing objects only as superpositions in the traditional Fourier representation, a wide range of additional transformations is now available for alternative signal representation schemes in various applications, including Wavelets, Wavelet Packets, Cosine Packets, Gabor atoms, Chirplets [41], and others. Many of these approaches use multiple response vectors s_k ∈ R^N to jointly observe the unknown sparse matrix under different conditions or transformation domains (e.g., spatial, temporal, or other analytic transformations T), where the sensing matrix Φ is applied to the coefficient matrix C and E denotes the noise matrix for the signals.
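The row-support function and its relaxed l_{p,q} penalty can be written directly; the following sketch (values illustrative) computes both for a small coefficient matrix whose rows 1 and 4 are active:

```python
import numpy as np

# Sketch of the row-support function and the relaxed l_{p,q} penalty used to
# measure simultaneous sparsity of a coefficient matrix C (rows = dictionary
# atoms, columns = signals); the matrix values here are illustrative.
def row_support(C, tol=1e-12):
    """Indices of rows of C that contain at least one non-zero entry."""
    return np.nonzero(np.linalg.norm(C, axis=1) > tol)[0]

def lpq_penalty(C, p=1, q=2):
    """Relaxation of the row-l0 count: sum over rows of the q-norm raised to p."""
    return np.sum(np.linalg.norm(C, ord=q, axis=1) ** p)

C = np.zeros((6, 3))
C[1] = [1.0, -2.0, 0.5]
C[4] = [0.0, 3.0, 0.0]

active = row_support(C)          # rows 1 and 4 are active
pen = lpq_penalty(C, p=1, q=2)   # l_{1,2} norm: sum of row l2 norms
```

With p = 1 and q = 2 this is exactly the l_{1,2} mixed norm, which keeps the penalty convex while still favouring solutions with few active rows shared by all signals.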
There are certain structured subcollections of elements investigated in previous work, such as time-frequency dictionaries, wavelet packets and cosine packets, and most of these subcollections correspond to certain orthogonal bases. We can derive the basis representation

S = T(C),  (10)

where the analytic transformation T maps the coefficient matrix C onto the signal matrix S under the chosen orthogonal basis. An adaptive method for picking the optimal basis among these bases has been proposed in [42], which delivers near-optimal sparse representations in time of order nlog(n). A valid selection criterion is formulated as

T* = arg min_T Entropy(T^-1(S)),  (11)

where the "entropy" is a scalar function of the sum of the elements.
We utilize the sparse distribution to achieve the decomposition of signals and to overcome strict restrictions on their sparsity through structured approximation, which is more suitable for approximating various signals in transformation analysis. In this case, the coefficients of the analytic transformation C = T^-1(S) can be divided into two parts, a row-sparse matrix A and an element-sparse matrix B, which remain mutually embedded:

C = A + B.  (12)

Thus, diversified modes of the matrix can be constructed under relaxed constraints on the target signals, which are then not forced to share the same support. The aforementioned optimization problem is converted into a multioptimization problem:

min_{A,B} ||S − Φ(A + B)||_F^2 + α||A||_row-0 + β||B||_0,  (13)

where the weighting parameters α>0 and β>0 adjust the row-sparse matrix and the element-sparse matrix to a certain equilibrium. Since the nonconvexity of this target function is hard to deal with, the convex l1 and l1,2 norms can be applied to approach the multioptimization problem as in [3], which is illustrated as follows:

min_{A,B} ||S − Φ(A + B)||_F^2 + α||A||_1,2 + β||B||_1.  (14)

Decomposing Equation (14) into sub-problems, iterative optimization can be utilized to approach the solutions through a loop:

A^{t+1} = arg min_A f(A, B^t, Φ^t),  B^{t+1} = arg min_B f(A^{t+1}, B, Φ^t).  (15)

To keep the generated iterations within the bound of a feasible solution Φ*, the adjacent iterate is given by gradient descent between the t-th and (t+1)-th iterations of Equation (15):

Φ^{t+1} = Φ^t − λ ∇_Φ f(A^{t+1}, B^{t+1}, Φ^t),  (16)

where the nonnegativity of the attenuation parameter λ is satisfied at all times. As for the decomposition estimate of the cost function, we can obtain the gradient under the quadratic loss:

∇_Φ f = −2 (S − Φ(A + B)) (A + B)^T.  (17)

One of the essential remaining problems is learning the optimal approximation C* = T^-1*(S). Therefore, we resolve this task by the simultaneous sparse Bayesian learning method for structured approximation.
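A minimal sketch of this alternating scheme for the decomposition C = A + B around Equation (14), assuming a proximal-gradient treatment of the two convex penalties (the step size and thresholds are illustrative, not the paper's tuned values):

```python
import numpy as np

# Proximal-gradient sketch (assumed treatment, not the paper's exact updates):
# split C = A + B into a row-sparse part A (l_{1,2} penalty) and an
# element-sparse part B (l_1 penalty), and step on the data-fit term.
def soft(x, t):
    """Entrywise soft threshold: prox of the l1 penalty."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def row_soft(A, t):
    """Rowwise shrinkage: prox of the l_{1,2} penalty."""
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    scale = np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0)
    return A * scale

def alternating_step(S, Phi, A, B, lam=0.1, alpha=0.05, beta=0.05):
    R = S - Phi @ (A + B)                     # residual of the data-fit term
    G = -Phi.T @ R                            # gradient w.r.t. both A and B
    A = row_soft(A - lam * G, lam * alpha)    # row-sparse update
    B = soft(B - lam * G, lam * beta)         # element-sparse update
    return A, B
```

Repeating this step drives the residual down while the two proximal operators keep A row-sparse (shared support) and B element-sparse (signal-specific deviations), which is the relaxation of the shared-support assumption described above.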

SIMULTANEOUS SPARSE LEARNING WITH STRUCTURED APPROXIMATION (SSL-SA)
In this section, we provide a simultaneous sparse learning approach with structured sparse approximation, and then propose an l2 reweighting algorithm with Bayesian estimation to solve this task.
We consider the Gaussian model with noise variance σ^2, and its priors are expressed by

p(c_k; γ_k) = N(0, Γ_k),  (18)

where the diagonal matrix Γ_k satisfies diag(Γ_k) = γ_k, and the unknown parameter vector γ_k governs the k-th signal. Combining the likelihood and the priors, the j-th iteration of the posterior density of C = T^-1(S) is Gaussian with

Σ = (Γ^-1 + σ^-2 Φ^T Φ)^-1,  M = σ^-2 Σ Φ^T S,  (19)

where M and Σ are the mean and covariance of the density, respectively.
From the Bayesian perspective, we apply the maximum a posteriori (MAP) criterion to estimate C = T^-1(S):

C_MAP = arg max_C p(C | S).  (20)

It is noteworthy that the row-sparse part and the element-sparse part act as the components that are common and unique to the signals, respectively, which for each signal gives

c_k = a_k + b_k.  (21)

Therefore, the convex optimization problem can be cast as

min_C ||S − ΦC||_F^2 + λ Σ_i ||c_i·||_2.  (22)

With the relationship in [4], there exists the variational bound

||c_i·||_2 = min_{γ_i>0} (1/2)(||c_i·||_2^2 / γ_i + γ_i).  (23)

Then the cost function can be derived as

L(C, γ) = ||S − ΦC||_F^2 + (λ/2) Σ_i (||c_i·||_2^2 / γ_i + γ_i),  (24)

which for fixed weights γ reduces to the reweighted l2 subproblem

C^{(j+1)} = arg min_C ||S − ΦC||_F^2 + (λ/2) Σ_i ||c_i·||_2^2 / γ_i^{(j)}.  (25)

Meanwhile, based on Equation (22) and Equation (24), we obtain the bound

||S − ΦC||_F^2 + λ Σ_i ||c_i·||_2 ≤ L(C, γ),  (26)

and the equality in Equation (26) holds if the relation is met:

γ_i = ||c_i·||_2.  (27)

Therefore, we can obtain the solution of Equation (25) in closed form:

C* = Γ Φ^T ((λ/2) I + Φ Γ Φ^T)^-1 S.  (28)

Above all, with the updated parameters, we obtain the closed-form solutions and exploit their effectiveness under the different transformation domains to update the optimal transformation for the corresponding signals, which further improves the validity of the approximation. The overall procedure is summarized as follows:

Algorithm: SSL-SA
Initialize Φ^0, γ^0 and the transformation T.
Do update the coefficients C^t by Equation (28) and the weights γ^t by Equation (27) until the inner loop converges.  (29)
End
Do update the gradient of the cost function by Equation (17).
Do update the sensing matrix Φ^t by Equation (16).
Update the transformation T by Equation (11).

RESULTS AND DISCUSSIONS
To verify the design of the proposed approach SSL-SA, the effectiveness of simultaneously structured approximation is evaluated on sparse signals with transformation optimization. Our proposed SSL-SA is compared with similar existing work for sparse approximation: the original Structured Sparse Models with l1 or l2 reweighting (SSM-1, SSM-2) [3] and Bayesian Compressive Sensing (BCS) [5]. The number of initial random measurements is N=40. For SSM-1 and SSM-2, we randomly selected K=20 nonzero rows of sparse signals for sparse representation, where all the nonzero entries are independent of each other; then some nonzero rows of each column were randomly forced to zero. Considering the diversity of sparse signals, we compared the performance on general sparse signals and particular sparse signals under the same conditions, respectively. The same sensing matrix was utilized for all signals, and the noise vectors were generated randomly as Gaussian noise at a signal-to-noise ratio (SNR) of 10 dB. For the iterative parameters, we set the initial attenuation rate α0=1 and the number of iterations tmax=20.
In detail, the general sparse signals are common signals with lower sparsity, whereas the particular ones mainly involve special functions, such as radar signals and electromagnetic interference signals with lower or higher sparsity. It is worth noting that sparse approximation is more suitable for data separation with higher sparsity, and sparse interference signals are tested here for approximation in practical applications.

Convergence performance
In the experiments on particular sparse signals, we used a library of common interference sources for Class A impulse interference, which is the most extensively simulated jamming type [43]. Environmental noise was added to the signals as white Gaussian noise. The basic convergence performance is evaluated during the iterations by a relative error, defined as ||Ĉt+1−Ct||F/||Ct||F. Fig. 1 shows the tendency of all mentioned algorithms to converge to a stable state in the inner iterative loop. The proposed SSL-SA converges more rapidly than the other mentioned methods, and this behaviour has been observed in different settings of analogous signals, although we only show one example here. The distinctive convergence of SSL-SA compared with SSM-2 results from the combination of iterative optimization and transformation analysis, which determines the optimal transformation for the variable signals.
Meanwhile, the distinction between SSL-SA and BCS is caused by the different model and iterative optimization of the structured elements. In addition, the convergence difference between the two SSMs is a result of their different component reweighting mechanisms.
More specifically, Table 1 gives the statistics of the iterative errors of the different algorithms: SSL-SA needs only 9 iterations to reach the differential error threshold of 0.5, whereas the SSMs need 10 and BCS needs 12. In terms of final convergence, SSM-1, BCS and SSL-SA are all close to 10%, an improvement of 18% over the level of SSM-2 on account of its complicated iteration forms. Furthermore, the median fairly reflects the effectiveness of iterative convergence, and the proposed SSL-SA outperforms all the compared algorithms thanks to its modified iteration and transformation optimization.
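For reference, the relative-error metric used in Fig. 1 and Table 1 can be written as a one-line helper (a straightforward transcription of the definition, not the authors' code):

```python
import numpy as np

# Relative change of the coefficient estimate between consecutive iterations,
# as used for the convergence curves: ||C_{t+1} - C_t||_F / ||C_t||_F.
def relative_error(C_next, C_prev):
    return (np.linalg.norm(C_next - C_prev, 'fro')
            / np.linalg.norm(C_prev, 'fro'))
```

For example, scaling every entry of an estimate by 1.1 between iterations yields a relative error of exactly 0.1.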

Time-effectiveness performance
To reflect the timeliness of the algorithms, the running time of the proposed method is measured in time-sensitive tests. For the ten groups of interference signals with varying frequency, 100 Monte-Carlo simulations were executed with signal-to-noise ratios in the range of -5 dB to 15 dB and a sampling frequency fs=1024 MHz. The channel is assumed to be a Gaussian white noise channel and the initial SNR is 5 dB. All experiments were performed in MATLAB R2013b on a PC with an Intel(R) Core(TM) i7 CPU (3.40 GHz) and 4 GB of memory. The mean time per iteration for the same group of interference datasets is compared in Fig. 2, and the statistics over all groups for the various interference signals are shown in Fig. 3. SSM-1, SSM-2 and our proposed SSL-SA all take 14.6% less running time than BCS thanks to the improvement of structured approximation. Furthermore, the two SSMs and SSL-SA consume approximately the same time, which indicates similar time complexities. In comparison with the two SSM methods, SSL-SA improves the stability of the iterative optimization by about 5% due to the parameters updated with Bayesian estimation.

Recovery performance
In this experimental verification, the number of sparse signals was fixed at K=40, and the number of random measurements ranged from 40 to 120. The sensing matrix was randomly corrupted by additive Gaussian noise at an initial SNR of 10 dB. The average reconstruction error is defined as ||Ĉ−C||F/||C||F, in which Ĉ is the estimate of C. Fig. 4 and Fig. 5 compare the recovery performance of the mentioned algorithms, which exhibits the superiority of the proposed solutions.
It is observed in Fig. 4 that the reconstruction errors of all the aforementioned algorithms decrease as the number of measurements increases. In detail, as convergence approaches an ideal state, the proposed method is notably faster than the other three algorithms and obtains an improvement of about 10%~20% in effective performance. Meanwhile, an error bar of the concrete performance is illustrated in Fig. 5: the fluctuations of our SSL-SA decrease gradually and finally converge to the optimal sparse approximation. When the number of measurements remains below 60, BCS performs poorly because it neglects the estimation of the structured signals. However, for a large number of measurements, above 80, the proposed SSL-SA and BCS perform better than the two SSMs due to the error propagation of the initial reconstruction process. Considering the fluctuations of the whole process, SSL-SA is steadier than BCS thanks to its more feasible iteration and optimization, which demonstrates the robustness of SSL-SA as the number of measurements changes.

Practical approximating performance
Signal approximation [20] is used to verify the performance of the proposed method, and in this case, the independence of each group of samples should be guaranteed as far as possible. To make the verification more general, an irregular dataset was generated by mixing different intensities of noise interference into the interference signals of Section 5.2, which results in a more complex and varied dataset. Moreover, the power of the signals was set in the range 1/10 Pmax~2 Pmax with an interval of 1/10 Pmax. 60 Monte-Carlo simulations were executed on the obtained 100 samples with varying interference intensity at an SNR of 5 dB and a sampling frequency fs=512 MHz. The channel was simplified to an additive Gaussian noise channel. The root mean square error (RMSE) and the average RMSE were used to evaluate the approximation performance, and the simulation results are demonstrated in Fig. 6 to Fig. 8. It is observed that the optimal approximation under diverse interference powers remains within an acceptable range, with RMSEs approximately below 2.5. Compared with the approximation of higher-power samples, samples with low power are more suitable for sparse approximation due to the rapid adjustment of the iterated parameters on smaller data. As a result, the proposed SSL-SA can be extended to deal effectively with unknown signals of low power. The detailed relation between RMSE and power is depicted in Fig. 9. As the signal intensity increases, the average RMSE increases gradually and approximately linearly. Furthermore, the effectiveness of the optimal approximation can be judged by the linear fit between RMSE and power, including the maximum residual modulus and the maximum deviation. In our simulations, the fitted equation is y=0.022x+1.434, and the maximum residual modulus is 0.126.
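The linear-fit check described above can be reproduced in outline; the data below are synthetic placeholders, not the paper's measurements, so only the procedure (least-squares fit plus maximum residual) matches the text:

```python
import numpy as np

# Sketch of the RMSE-versus-power linear-fit check. The paper reports the
# fitted line y = 0.022 x + 1.434 with maximum residual 0.126; the values
# below are synthetic placeholders built around that reported trend.
power = np.arange(1, 21, dtype=float)                  # power grid (placeholder)
rmse = 0.022 * power + 1.434 + 0.05 * np.sin(power)    # illustrative RMSE values

slope, intercept = np.polyfit(power, rmse, 1)          # least-squares line
residuals = rmse - (slope * power + intercept)
max_residual = np.max(np.abs(residuals))               # maximum residual modulus
```

Given the fitted slope and residuals, the significance of the linear trend can then be assessed with a t-test on the slope, as discussed next.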
Significance tests of the linear hypothesis can then be conducted by a t-test, where the rejection region is worth consideration. In the end, the fitted approximation can be referred to for the analysis of unknown signals more extensively.

CONCLUSIONS
With the combination of compressed sensing and machine learning, data separation and mode classification can be achieved from measurements by sparse approximation. While simultaneous sparse learning is successful for sparse approximation problems, its strict assumption becomes an essential restriction on the performance for various and unknown signals. The proposed method develops an appropriate simultaneous Bayesian framework in the learning optimization process, where structured shared supports and transformation analysis are utilized for the sparse approximation of general and particular signals. Great improvements in convergence and efficiency are obtained, which will broaden its applications in signal denoising and image compression. In further study, derivations of the proposed model will be applied to interference separation, parameter estimation and representation classification.