An Iterative Threshold Algorithm of Log-Sum Regularization for Sparse Problems

The log-sum function as a penalty has long attracted widespread attention in the field of sparse problems. However, it leads to a non-convex, non-smooth and non-Lipschitz optimization problem that is difficult to tackle. To overcome this difficulty, an iterative threshold algorithm for sparse optimization problems with the log-sum function is proposed in this paper. For brevity, the sparse optimization problem with the log-sum function is named log-sum regularization. Firstly, by introducing an intermediate function to construct a new surrogate function, a property theorem about the solution of log-sum regularization is established. Secondly, based on the above theorem, the optimal setting rules of the compromising parameters are elaborated, and an iterative log-sum threshold algorithm is proposed. Thirdly, under the condition that the compromising parameters of log-sum regularization are relatively small, it is proven that the proposed algorithm converges to a local minimizer of log-sum regularization. Finally, a series of simulations is implemented to examine the performance of the proposed algorithm, and the results show that it outperforms state-of-the-art algorithms.


I. INTRODUCTION
The sparse problems encountered in many domains of scientific research and engineering practice have attracted extensive attention in recent years, such as mmWave massive MIMO channel estimation [1]-[3], machine learning [4], [5], jammer detection [6], [7], and image processing [8]-[12]. The canonical form of this problem can be expressed as

$$Y = AX + E, \tag{1}$$

where Y ∈ R^M is the vector of acquired measurements, A ∈ R^{M×N} is the measurement matrix with M ≪ N, and E represents the observation noise. The objective is to recover the sparse vector X ∈ R^N from Y. The problem can also be modeled as the so-called L_0 regularization problem

$$\min_{X \in \mathbb{R}^N} \left\{ \|Y - AX\|_2^2 + \lambda \|X\|_0 \right\}, \tag{2}$$

where ∥X∥_0, formally called the L_0 norm, denotes the number of nonzero components of X, and λ is a positive regularization parameter. Solving the L_0 regularization, however, is an NP-hard problem [13]. In order to overcome such difficulty, L_1 regularization was proposed as an alternative [14]-[17]:

$$\min_{X \in \mathbb{R}^N} \left\{ \|Y - AX\|_2^2 + \lambda \|X\|_1 \right\}, \tag{3}$$

where ∥X∥_1 = Σ_{i=1}^N |x_i| denotes the L_1 norm. Because the L_1 regularization is a convex optimization problem that can be solved very efficiently, it has become popular and has been extensively employed for the solution of sparsity problems. Unfortunately, L_1 regularization cannot achieve further sparsity; particularly in compressed sensing [18]-[24], it often causes an over-penalized situation. Hence, some additional improvements are expected. To bridge the gap between the L_0 and L_1 regularization, academics have studied two representative strategies.
One alternative is the L_q (0 < q < 1) regularization [18], [19], [22]-[24]:

$$\min_{X \in \mathbb{R}^N} \left\{ \|Y - AX\|_2^2 + \lambda \|X\|_q^q \right\}, \tag{4}$$

where ∥X∥_q, defined by ∥X∥_q = (Σ_{i=1}^N |x_i|^q)^{1/q}, denotes the L_q quasi-norm. The L_q regularization is a non-convex, non-smooth, and non-Lipschitz optimization problem. Which q should be selected to obtain the best result in various applications? Previous studies in [24]-[27] have partially answered this question. In particular, in [18], [19], Xu and Zeng proposed an iterative thresholding algorithm for L_{1/2} regularization, and further demonstrated the high efficiency and convergence of the algorithm by formula derivation.
The other is log-sum regularization [4], [11], [12], [28]-[33]:

$$\min_{X \in \mathbb{R}^N} \left\{ \|Y - AX\|_2^2 + \lambda \sum_{i=1}^{N} \log(|x_i| + \varepsilon) \right\}, \tag{5}$$

where x_i denotes the ith entry of X, and ε is a positive parameter that ensures the function is well-defined. In particular, it was shown in [31] that when ε = 0, the log-sum function behaves essentially like the L_0 norm. In [28] and [29], for the noiseless case, where ∥Y − AX∥₂² = 0, Shen and Fang respectively demonstrated the existence of the global minimizer of the log-sum function with the term |x|^j, for j equal to 1 and 2. In [33], the log-sum function was combined with the majorization-minimization algorithm to tackle the rank minimization problem. It is shown in [4] that the log-sum-exp neural network is a smooth universal approximator of continuous functions over convex and compact sets. In [11], by fully exploring the intrinsic structure of a natural hyperspectral image, the log-sum function was applied to blind hyperspectral unmixing. In [12], similar to the classical gradient algorithm, a log-sum generalized iterated shrinkage threshold algorithm was proposed for the magnetic resonance image recovery problem. These recent studies make log-sum regularization prominent. Such a further study is carried out for the high-dimensional compressed sensing problem in this paper; our goal is to present a new iterative threshold-type algorithm for log-sum regularization, matching the widely known iterative hard threshold algorithm (the hard algorithm in brief) [34]-[37] for L_0 regularization, the iterative soft threshold algorithm (the soft algorithm in brief) [38], [39] for L_1 regularization, and the iterative half threshold algorithm (the half algorithm in brief) [18], [19] for L_{1/2} regularization. This is encouraged not only by the fact that the setting of the parameters is simple and convenient, but also by the performance improvement in terms of iterations and recovery accuracy. It is promising that such a fast iterative threshold algorithm will promote the application of log-sum regularization to sparsity problems.
The main contributions of the present study are:

• Through calculating the analytic expression of the derivative of the log-sum function, a property theorem on the solution of log-sum regularization and its threshold expression are derived.
• Based on the threshold expression, the optimal setting rules of the compromising parameters λ and ε are expounded, and an iterative log-sum threshold algorithm for the fast solution of log-sum regularization is obtained.
• In the case that the compromising parameters are relatively small, it is verified that the iterative log-sum threshold algorithm (the log-sum algorithm in brief) converges to a local minimizer of log-sum regularization.
• A series of experiments are conducted to assess the performance of the log-sum algorithm. The results show that the proposed algorithm is superior to the hard algorithm, the soft algorithm, and the half algorithm.

The rest of this paper is organized as follows. In Section II, a property theorem about the solution of log-sum regularization and its threshold expression are derived. Section III presents the optimal setting rules of the compromising parameters and the log-sum algorithm. In Section IV, the convergence of the log-sum algorithm is verified. In Section V, simulations show the convergence, robustness and effectiveness of the proposed algorithm. We conclude this paper in Section VI.

II. PROPERTY THEOREM OF LOG-SUM REGULARIZATION

A. Notion and Notation
As shown in [18], a threshold function h(x) can be expressed as

$$h(x) = \begin{cases} f(x), & |x| > x_h, \\ 0, & |x| \le x_h, \end{cases} \tag{6}$$

where x_h > 0 is a positive threshold value and f(x) is a defining function. When the scalar x and h(x) are respectively replaced with the vector X and H(X), we get

$$H(X) = \left(h(x_1), h(x_2), \ldots, h(x_N)\right)^T, \tag{7}$$

where X is defined by X = (x_1, x_2, …, x_N)^T and H is an affine threshold operator from R^N to R^N.
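To make the componentwise action of (6) and (7) concrete, here is a minimal NumPy sketch; the function name and the hard-thresholding example are ours, purely for illustration.

```python
import numpy as np

def apply_threshold(f, x_h, theta):
    """Componentwise threshold operator in the sense of (6)-(7):
    components with |theta_i| > x_h pass through the defining
    function f; all others are set exactly to zero."""
    theta = np.asarray(theta, dtype=float)
    out = np.zeros_like(theta)
    keep = np.abs(theta) > x_h
    out[keep] = f(theta[keep])
    return out

# With the identity as defining function this is hard thresholding:
print(apply_threshold(lambda t: t, 1.0, np.array([0.5, -2.0, 3.0])))
# -> [ 0. -2.  3.]
```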

B. Affine Operator
The affine operator is a critical component of our algorithm for (5), so it is necessary to describe its analytic expression.
With any specified scalar ε and vector θ = (θ_1, θ_2, …, θ_N)^T ∈ R^N, we define the function

$$L(x_i) = (x_i - \theta_i)^2 + \lambda \log(|x_i| + \varepsilon), \quad i = 1, 2, \ldots, N; \tag{8}$$

similar to (7), the affine operator is

$$H_{\lambda,\varepsilon}(\theta) = \left(f_{\lambda,\varepsilon}(\theta_1), f_{\lambda,\varepsilon}(\theta_2), \ldots, f_{\lambda,\varepsilon}(\theta_N)\right)^T. \tag{9}$$

Theorem 1: With any λ > 0 and 0 < ε < √(λ/2), the local minimizer of (8) can be represented as

$$\tilde{f}_{\lambda,\varepsilon}(\theta_i) = \operatorname{sign}(\theta_i)\,\frac{(|\theta_i| - \varepsilon) + \sqrt{(|\theta_i| + \varepsilon)^2 - 2\lambda}}{2}, \quad |\theta_i| \ge \sqrt{2\lambda} - \varepsilon. \tag{10}$$

Proof: By taking the derivative of (8) with respect to x_i (for x_i ≠ 0) and setting it to zero, we get

$$2(x_i - \theta_i) + \frac{\lambda \operatorname{sign}(x_i)}{|x_i| + \varepsilon} = 0; \tag{11}$$

for any solution x̂_i of (11), x̂_i θ_i > 0 is satisfied. Moreover, with 0 < ε < √(λ/2), if x̂_i > 0, (11) reads θ_i = x̂_i + λ/(2(x̂_i + ε)), and the minimum value of the right-hand side over x̂_i ∈ (0, +∞) is equal to √(2λ) − ε; if x̂_i < 0, the maximum value of the corresponding expression is −(√(2λ) − ε). Hence (11) admits solutions only when |θ_i| ≥ √(2λ) − ε. In the following, we consider the situation θ_i > 0 (Scenario 1); since the left-hand side of (11) is odd under the simultaneous sign change of x_i and θ_i, the case θ_i < 0 (Scenario 2) follows symmetrically. In Scenario 1, (11) is equivalent to the quadratic equation 2x_i² + 2(ε − θ_i)x_i + λ − 2εθ_i = 0, so the two roots of (11) can be represented by

$$x_i^{\pm} = \frac{(\theta_i - \varepsilon) \pm \sqrt{(\theta_i + \varepsilon)^2 - 2\lambda}}{2}.$$

Further, we can check that the left-hand side of (11) is respectively greater, less, and greater than zero, with x_i ∈ (0, x_i^−), x_i ∈ (x_i^−, x_i^+), and x_i ∈ (x_i^+, +∞); hence x_i^+ is a local minimum point of (8). In Scenario 2, by a similar analysis as in Scenario 1, the negated root computed from |θ_i| is a local minimum point of (8). In summary, the local minimum point of (8) can be indicated as

$$\hat{x}_i = \tilde{f}_{\lambda,\varepsilon}(\theta_i), \quad |\theta_i| \ge \sqrt{2\lambda} - \varepsilon. \tag{12}$$

Note that, to obtain the defining function f_{λ,ε}(θ_i), we must consider not only the unique local minimizer x_i = f̃_{λ,ε}(θ_i), but also the unique non-differentiable point x_i = 0; thus f_{λ,ε}(θ_i) = 0 or f_{λ,ε}(θ_i) = f̃_{λ,ε}(θ_i). The proof of Theorem 1 is completed; the explicit expression of f_{λ,ε}(θ_i) will be given in Lemma 1. Based on Theorem 1, a novel property theorem will be given; before that, the expression (10) can be sketched numerically as follows.
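This is a minimal sketch under our reconstruction of (10) and (11), with illustrative names (NumPy assumed).

```python
import numpy as np

def logsum_local_minimizer(theta, lam, eps):
    """Local minimizer of (8) per (10): the larger root of the quadratic
    obtained from the first-order condition (11). Valid componentwise
    where |theta| >= sqrt(2*lam) - eps; NaN marks 'no stationary point'."""
    theta = np.asarray(theta, dtype=float)
    a = np.abs(theta)
    disc = (a + eps) ** 2 - 2.0 * lam
    out = np.full_like(theta, np.nan)
    ok = disc >= 0.0
    out[ok] = np.sign(theta[ok]) * ((a[ok] - eps) + np.sqrt(disc[ok])) / 2.0
    return out

# Sanity check of the stationarity condition (11): for theta = 3, lam = 1,
# eps = 0.1, the map x + lam / (2 * (x + eps)) should return theta.
x = logsum_local_minimizer(np.array([3.0]), 1.0, 0.1)[0]
assert abs(x + 1.0 / (2.0 * (x + 0.1)) - 3.0) < 1e-9
```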

C. Property Theorem
The log-sum regularization (5) is also a non-convex and non-smooth optimization problem. Let us define

$$F_{\lambda,\varepsilon}(X) = \|Y - AX\|_2^2 + \lambda \sum_{i=1}^{N} \log(|x_i| + \varepsilon); \tag{13}$$

it is difficult to directly obtain the minimizer of (13). Inspired by the majorization-minimization algorithm [40], a novel function C_{λ/α,ε,Z}(X), which coincides with F_{λ,ε}(X) at X = Z but otherwise is greater than F_{λ,ε}(X), can be constructed as

$$C_{\lambda/\alpha,\varepsilon,Z}(X) = F_{\lambda,\varepsilon}(X) + \alpha\|X - Z\|_2^2 - \|AX - AZ\|_2^2. \tag{14}$$

For meeting the above-mentioned conditions, we only need α∥X − Z∥₂² − ∥AX − AZ∥₂² ≥ 0; that is, the parameter α must be equal to or greater than the maximum eigenvalue of A^T A. Here, let α ≥ max(eig(A^T A)) + 1. By introducing the intermediate function

$$B_\alpha(Z) = Z + \frac{1}{\alpha} A^T (Y - AZ), \tag{15}$$

the minimizer of (14) can be calculated. Firstly, we prove the following lemma.
Lemma 1: With any specified λ > 0, α > 0, β ∈ (0, 1) and vector Z, the global minimizer of C_{λ/α,ε,Z}(X) is given by

$$X = H_{\lambda/\alpha,\varepsilon}\left(B_\alpha(Z)\right), \tag{16}$$

where B_α(Z) is the intermediate function (15) and the threshold value x#_h is determined by the unique positive solution τ̄ of

$$F_{\lambda/\alpha}(\tau, \beta) = 0. \tag{17}$$

Proof: The constructed function C_{λ/α,ε,Z}(X) can be transformed into

$$C_{\lambda/\alpha,\varepsilon,Z}(X) = \alpha\left\|X - B_\alpha(Z)\right\|_2^2 + \lambda\sum_{i=1}^{N}\log(|x_i| + \varepsilon) + T(Y, Z), \tag{18}$$

where the terms collected in T(Y, Z), such as α∥Z∥₂², are independent of X. Minimizing (18) therefore decouples into the N scalar problems

$$L_Z(x_i) = \alpha\left(x_i - \left[B_\alpha(Z)\right]_i\right)^2 + \lambda\log(|x_i| + \varepsilon), \tag{19}$$

where x_i is independent for any i ∈ [1, N]; further, the solution of (19) is the same as the solution of

$$\min_{x_i}\left\{\left(x_i - \left[B_\alpha(Z)\right]_i\right)^2 + \frac{\lambda}{\alpha}\log(|x_i| + \varepsilon)\right\}. \tag{20}$$

Based on Theorem 1 and by respectively replacing λ and θ_i with λ/α and [B_α(Z)]_i in (10), the unique local minimizer of (20) can be represented as

$$x_i^l = \tilde{f}_{\lambda/\alpha,\varepsilon}\left(\left[B_\alpha(Z)\right]_i\right). \tag{21}$$

To obtain the global minimizer of L_Z(x_i), we consider both the local minimizer x_i = x_i^l and the non-differentiable point x_i = 0: the global minimizer of (20) is x_i^l when L_Z(x_i^l) ≤ L_Z(0), and 0 otherwise (24). Similar to (11), the boundary case L_Z(x_i^l) = L_Z(0) can be parameterized in hyperbolic form, yielding (25) and (26), where ch τ = (e^τ + e^{−τ})/2 is the hyperbolic cosine function. Substituting (21), (25) and (26) into (24) yields (27), and by further simplifying (27), the boundary case reduces to a scalar equation (28) in τ. Let us define its left-hand side as the function F_{λ/α}(τ, β) in (29). In Appendix A, we demonstrate that, with any specified λ > 0, α > 0 and β ∈ (0, 1), the function F_{λ/α}(τ, β) is monotonously increasing with respect to τ and lim_{τ→0} F_{λ/α}(τ, β) < 0, so that (17) has a unique positive solution τ̄. The proof is completed here.
For further disclosing the connection between (5) and (14), by utilizing Lemma 1, we continue to establish the next theorem.

Theorem 2: If X^F = (x_1^F, x_2^F, …, x_N^F)^T is a solution of (5) and α is greater than the maximum eigenvalue of A^T A, then

$$X^F = H_{\lambda/\alpha,\varepsilon}\left(B_\alpha(X^F)\right). \tag{30}$$

Proof: By replacing Z with X^F in (14), we have

$$C_{\lambda/\alpha,\varepsilon,X^F}(X) = F_{\lambda,\varepsilon}(X) + \alpha\left\|X - X^F\right\|_2^2 - \left\|AX - AX^F\right\|_2^2. \tag{31}$$

Owing to α > max(eig(A^T A)) and the fact that X^F is a solution of (5), for any X ∈ R^N we deduce

$$C_{\lambda/\alpha,\varepsilon,X^F}(X) \ge F_{\lambda,\varepsilon}(X) \ge F_{\lambda,\varepsilon}(X^F) = C_{\lambda/\alpha,\varepsilon,X^F}(X^F),$$

which indicates that X^F is the global minimizer of (31). In addition, Lemma 1 gives the explicit expression of the solution of (31); hence, we can infer (30). Lemma 1 and Theorem 2 manifest that X^F being a solution of (5) is a sufficient but not necessary condition for X^F to be the global minimizer of (31), and √(2λ/α) is the threshold value of the log-sum regularization.
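The fixed-point relation (30) is built around the intermediate function B_α; a minimal NumPy sketch (function names ours) is given below, together with the rule for setting α.

```python
import numpy as np

def B_alpha(Z, Y, A, alpha):
    """Intermediate function (15): a gradient step on the data-fit
    term with step size 1 / alpha."""
    return Z + A.T @ (Y - A @ Z) / alpha

def alpha_from(A):
    """Majorization constant: alpha >= max eig(A^T A) + 1, as required
    for (14) to upper-bound F."""
    return np.linalg.eigvalsh(A.T @ A).max() + 1.0
```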

III. LOG-SUM THRESHOLD ALGORITHM

A. Threshold Expression
According to Theorem 1 and Theorem 2, we define

$$f_{\lambda/\alpha,\varepsilon}(\theta_i) = \begin{cases} \tilde{f}_{\lambda/\alpha,\varepsilon}(\theta_i), & |\theta_i| > x^{\#}_h, \\ 0, & |\theta_i| \le x^{\#}_h, \end{cases} \tag{34}$$

as the threshold function of log-sum regularization. By means of the affine operator (9), the threshold expression of log-sum regularization is indicated as

$$X = H_{\lambda/\alpha,\varepsilon}\left(B_\alpha(X)\right). \tag{35}$$

B. Optimal Regularization Parameter
In the regularization problem, the setting of the regularization parameters directly determines the quality of its solutions. Nevertheless, how to select appropriate parameters is always challenging, although there are some useful heuristics [41], [42]. In most cases, the cross-validation method is applied. Fortunately, in the special case where the sparsity of a regularization problem is given, the regularization parameter can be set more reasonably.
Let us take (5) as an example to analyze the rule for selecting regularization parameters. Assume the solution of (5) is k-sparse, and X^F = (x_1^F, x_2^F, …, x_N^F)^T is a log-sum solution of (5); we mean supp(X^F) = k, where supp(X) denotes the number of nonzero components of the vector X, and, without loss of generality, we further suppose |x_1^F| ≥ |x_2^F| ≥ ⋯ ≥ |x_N^F|. Based on (34) and (35), we have

$$\left|\left[B_\alpha(X^F)\right]_i\right| > x^{\#}_h, \tag{36}$$

where i ∈ {1, 2, ⋯, k}, and

$$\left|\left[B_\alpha(X^F)\right]_i\right| \le x^{\#}_h, \tag{37}$$

where i ∈ {k + 1, k + 2, ⋯, N}, and β̄ = ε/√(λ/(2α)) and τ̄ is the unique positive solution of (17) with β = β̄.
Inequalities (36) and (37) confine the threshold value x#_h between |[B_α(X^F)]_{k+1}| and |[B_α(X^F)]_k|; since x#_h increases with λ, this yields the inequation (44), which indicates the value range of the optimal regularization parameter λ with the specified β̄. By taking the lower limit, we get the explicit choice (45) of λ in terms of |[B_α(X^F)]_{k+1}|. From (45), it can be found that the larger the λ, the larger the threshold value x#_h, and the sparser the solution of the threshold algorithm. By replacing X^F with its approximation X^n in (45), we can take λ_n accordingly, as in (46), which explicitly shows the parameter-setting strategy of an iterative algorithm; a sketch of this rule is given below.
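The following sketch illustrates the rule (45)-(46) under a simplifying assumption of ours: the threshold value is taken to scale as x#_h ≈ c·√(2λ/α), with the constant c (which in the paper depends on τ̄ and β̄) left as an input. All names are illustrative.

```python
import numpy as np

def choose_lambda(Bx, k, alpha, c=1.0):
    """Sparsity-driven choice of lambda in the spirit of (45)-(46):
    place the threshold at the (k+1)-th largest magnitude of
    B_alpha(X^n), then invert x_h = c * sqrt(2 * lambda / alpha)."""
    t = np.sort(np.abs(Bx))[::-1][k]   # (k+1)-th largest magnitude
    return alpha * (t / c) ** 2 / 2.0
```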

C. Log-sum Threshold Algorithm
Based on the threshold representations (34) and (35), an iterative algorithm for log-sum regularization can be directly expressed as

$$X^{n+1} = H_{\lambda_n/\alpha_n,\varepsilon_n}\left(B_{\alpha_n}(X^n)\right), \tag{47}$$

where H_{λ/α,ε}(·) is the log-sum threshold operator. For simplicity, we name the above method the log-sum algorithm.
According to different parameter-setting strategies, (47) yields separate schemes of the log-sum algorithm. For example, the following can be implemented.

Scheme 1: α_n, λ_n and β_n are chosen by cross-validation;
Scheme 2: α_n = α and β_n = β are fixed, λ_n is updated by (46), and ε_n = β√(λ_n/(2α));
Scheme 3: the same as Scheme 2, except that λ_n = min{λ_{n−1}, the value given by (46)}.

In Scheme 1, α_n, β_n and λ_n are actually set by cross-validation during every iterative process. In Scheme 2, α_n and β_n are fixed, λ_n is updated, and ε_n varies with λ_n. Scheme 3 is a variant of Scheme 2; the only difference is that λ_n keeps monotonously decreasing in Scheme 3. With these schemes, the log-sum algorithm will be tested in Section V.
Generally, Scheme 1 is suitable for the situation where the sparsity is completely unknown, while Scheme 2 and Scheme 3 can be used when some prior knowledge of the k-sparsity, i.e., the value range of the sparsity, is given. By setting k to an upper limit of the sparsity, the log-sum algorithm exhibits good robustness in Section V-B.
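For concreteness, a minimal end-to-end sketch of the log-sum algorithm with a Scheme-2-style rule follows; it simplifies the threshold value to √(2λ_n/α) (i.e., the correction through τ̄ is dropped), and all function names are ours.

```python
import numpy as np

def logsum_algorithm(Y, A, k, beta=0.7, n_iter=500):
    """Iteration (47) with a Scheme-2-style parameter rule: alpha and
    beta fixed; lambda_n tied to the (k+1)-th largest magnitude of
    B_alpha(X^n); eps_n = beta * sqrt(lambda_n / (2 * alpha))."""
    N = A.shape[1]
    alpha = np.linalg.eigvalsh(A.T @ A).max() + 1.0
    X = np.zeros(N)
    for _ in range(n_iter):
        Bx = X + A.T @ (Y - A @ X) / alpha    # intermediate step (15)
        t = np.sort(np.abs(Bx))[::-1][k]      # target threshold value
        lam = alpha * t ** 2 / 2.0            # inverts t = sqrt(2*lam/alpha)
        eps = beta * np.sqrt(lam / (2.0 * alpha))
        a = np.abs(Bx)
        keep = a > t                          # componentwise thresholding (34)
        X = np.zeros(N)
        disc = (a[keep] + eps) ** 2 - 2.0 * lam / alpha
        X[keep] = np.sign(Bx[keep]) * ((a[keep] - eps) + np.sqrt(disc)) / 2.0
    return X
```

Under this simplification, at most the k largest components of B_α(X^n) survive each iteration, which mirrors the sparsity-driven behaviour of Scheme 2.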

IV. CONVERGENCE ANALYSIS
In this section, we prove the convergence of the log-sum algorithm combined with Scheme 1.
Theorem 3: Assume that α ≥ max(eig(A^T A)) + 1 and {X^n} is the sequence generated by the log-sum algorithm with Scheme 1. Then:
1. F_{λ,ε}(X^n) is monotonously decreasing and converges to F_{λ,ε}(X*), where X* is a limit point of the sequence {X^n};
2. lim_{n→∞} ∥X^{n+1} − X^n∥₂ = 0;
3. X* is a local minimizer of the log-sum regularization (5).

Proof: 1. Since X^{n+1} minimizes C_{λ/α,ε,X^n}(X) by Lemma 1, the construction (14) gives

$$F_{\lambda,\varepsilon}(X^n) - F_{\lambda,\varepsilon}(X^{n+1}) \ge \left(X^{n+1} - X^n\right)^T\left(\alpha I - A^T A\right)\left(X^{n+1} - X^n\right) \ge 0, \tag{49}$$

so F_{λ,ε}(X^n) is monotonously decreasing; being bounded from below, it converges, and by continuity it converges to F_{λ,ε}(X*) for a limit point X* of {X^n}. This proves 1 of Theorem 3.

2. Since α ≥ max(eig(A^T A)) + 1, we know that all the eigenvalues of the matrix αI − A^T A are equal to or greater than 1, which implies

$$\left(X^{n+1} - X^n\right)^T\left(\alpha I - A^T A\right)\left(X^{n+1} - X^n\right) \ge \left\|X^{n+1} - X^n\right\|_2^2.$$

Combined with (49), therefore, it holds that

$$\sum_{n=0}^{\infty}\left\|X^{n+1} - X^n\right\|_2^2 \le F_{\lambda,\varepsilon}(X^0) - \lim_{n\to\infty}F_{\lambda,\varepsilon}(X^n) < \infty,$$

which implies that lim_{n→∞} ∥X^{n+1} − X^n∥₂ = 0. This proves 2 of Theorem 3. In Appendix B, 3 of Theorem 3 is proven.
This completes the proof of Theorem 3, which indicates that the log-sum algorithm with Scheme 1 is sure to converge when α ≥ max(eig(A^T A)) + 1 is satisfied.

V. SIMULATIONS AND APPLICATIONS
In this section, we conduct a set of simulations and applications to substantiate the high performance of the log-sum algorithm.

In the simulations and applications, the log-sum algorithm is applied to a classic compressed sensing problem, namely sparse signal recovery, as modeled in (1). Our aim is to evaluate the convergence, robustness, and effectiveness of the proposed algorithm via these simulations.

A. Convergence Justification
We first conduct an experiment to verify the convergence. In the experiment, the signal recovery problem is solved through the observation Y = AX + E, where the measurement matrix A is of dimension M × N = 256 × 512 with Gaussian N(0, 1) i.i.d. entries, E is the observation noise of dimension M = 256 with Gaussian N(0, 0.0025) i.i.d. entries, and X, of dimension N = 512 and sparsity k = 100, is the sparse signal we need to recover; all nonzero entries of X obey Gaussian N(1, 1) i.i.d. We employ the log-sum algorithm to tackle the problem with λ = 0.001, β = 0.7, and α = max(eig(A^T A)) + 1. The experimental results are reported in Fig. 1. Fig. 1(a) reveals that the iterative sequence of the objective function F_{λ,ε}(X^n) is monotonously decreasing and converges to F_{λ,ε}(X*). Fig. 1(b) shows that the mean square error (MSE) ∥X^n − X*∥₂ is also monotonously decreasing and converges to zero. This experiment clearly confirms the convergence properties of the log-sum algorithm with Scheme 1 that we proved in Theorem 3.
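This setup can be reproduced in outline as follows, reusing the `logsum_algorithm` sketch from Section III-C; the seed and helper names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, k = 256, 512, 100

A = rng.normal(0.0, 1.0, (M, N))                 # Gaussian N(0, 1) matrix
X_true = np.zeros(N)
support = rng.choice(N, size=k, replace=False)
X_true[support] = rng.normal(1.0, 1.0, size=k)   # nonzeros ~ N(1, 1)
E = rng.normal(0.0, np.sqrt(0.0025), size=M)     # noise variance 0.0025
Y = A @ X_true + E

X_hat = logsum_algorithm(Y, A, k=k, beta=0.7)
print(np.linalg.norm(X_hat - X_true) / np.linalg.norm(X_true))
```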

B. Robustness on Sparsity Overestimation
In most cases, the sparsity of the original signal X is unknown, which is a realistic challenge. In view of this situation, the exact sparsity value is generally replaced with a rough estimation. In the following, we carry out experiments to show the performance of the proposed algorithm under this strategy. Given the same signal recovery problem as in Section V-A, rather than the exact sparsity value k = 100, we utilize variable sparsity estimations ranging from an underestimated value of 50 to an overestimated value of 200.
Fig. 2 exhibits the experimental results with different numbers of measurements M, where the horizontal axis is the sparsity estimation and the vertical axis is the recovery precision ∥X − X^n∥₂/∥X∥₂. From Fig. 2, it can be seen that underestimation of the sparsity usually yields unsatisfactory performance; fortunately, overestimations of the sparsity achieve outcomes close to those attained with the exact sparsity value, and they cover a broad scope on the horizontal axis. In particular, the bigger the number of measurements M, the broader the stable scope.

These experiments show that the log-sum algorithm with overestimated sparsity has a certain robustness, which grows with M. Further, a rougher estimation of the sparsity value needs to be compensated by a bigger M for perfect recovery.
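Continuing the setup of Section V-A, such a sweep over sparsity estimations can be sketched as:

```python
# Sweep the sparsity estimation from underestimation (50) to
# overestimation (200) and record the recovery precision.
for k_est in range(50, 201, 10):
    X_hat = logsum_algorithm(Y, A, k=k_est, beta=0.7)
    err = np.linalg.norm(X_hat - X_true) / np.linalg.norm(X_true)
    print(k_est, err)
```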

C. Comparisons with Other Algorithms
Given the same problem as in Section V-A, we conduct two experiments (without noise and with noise) to compare the performance of the hard algorithm [34], [35], the soft algorithm [38], [43], the half algorithm [18], [19], and the proposed log-sum algorithm. The threshold operators of all these algorithms are listed as follows:

1. the hard threshold operator

$$h_{\lambda,\mu}(\theta_i) = \begin{cases} \theta_i, & |\theta_i| > \sqrt{\lambda\mu}, \\ 0, & \text{otherwise}; \end{cases}$$

2. the soft threshold operator

$$h_{\lambda,\mu}(\theta_i) = \operatorname{sign}(\theta_i)\max\left(|\theta_i| - \frac{\lambda\mu}{2},\, 0\right);$$

3. the half threshold operator

$$h_{\lambda,\mu}(\theta_i) = \begin{cases} \dfrac{2}{3}\theta_i\left(1 + \cos\left(\dfrac{2\pi}{3} - \dfrac{2}{3}\varphi_{\lambda\mu}(\theta_i)\right)\right), & |\theta_i| > \dfrac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3}, \\ 0, & \text{otherwise}, \end{cases}$$

with φ_{λμ}(θ_i) = arccos((λμ/8)(|θ_i|/3)^{−3/2});

4. the log-sum threshold operator f_{λ/α,ε} in (34), where τ̄ is the unique positive solution of 2 − e^{−2τ} − 2τ + 2 log β = 0 and ε = β√(λ/(2α)).

Instead of using the fixed number of measurements M = 256, we re-simulate the above-mentioned algorithms with several different numbers of measurements M. Moreover, the log-sum algorithm is applied with Scheme 2 and Scheme 3 (the simulation outcomes are almost the same), and the other three algorithms are run with the parameter μ = 1/α = 1/(max(eig(A^T A)) + 1). In fact, the value of the parameter μ in this paper is different from that in the half algorithm [18], [19], where μ is equal to 1/∥A∥₂². In [18], [19], the introduced intermediary function contains the term ∥X − Z∥₂² scaled by μ; from (47) of [18], it can be seen that the parameter μ plays the important role of an iterative step size, and in particular, the smaller the parameter μ, the more iterations are required. Hence, for a better comparison, we replace the tiny μ with μ = 1/(max(eig(A^T A)) + 1), which matches the α of the log-sum algorithm. In each case, the MSE between the recovered signal and the original signal is computed.
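For reference, here is a minimal NumPy sketch of the three classical operators listed above (the log-sum operator was sketched in Section III-C); the threshold constants follow the standard hard/soft rules and our reading of the half threshold in [18], [19].

```python
import numpy as np

def hard_threshold(theta, lam, mu):
    # Keep components whose magnitude exceeds the hard threshold value.
    return np.where(np.abs(theta) > np.sqrt(lam * mu), theta, 0.0)

def soft_threshold(theta, lam, mu):
    # Componentwise shrinkage by lam * mu / 2.
    return np.sign(theta) * np.maximum(np.abs(theta) - lam * mu / 2.0, 0.0)

def half_threshold(theta, lam, mu):
    # Half threshold operator of [18], [19] (as we read it).
    theta = np.asarray(theta, dtype=float)
    a = np.abs(theta)
    thr = (54.0 ** (1.0 / 3.0) / 4.0) * (lam * mu) ** (2.0 / 3.0)
    out = np.zeros_like(theta)
    keep = a > thr
    phi = np.arccos((lam * mu / 8.0) * (a[keep] / 3.0) ** (-1.5))
    out[keep] = (2.0 / 3.0) * theta[keep] * (
        1.0 + np.cos(2.0 * np.pi / 3.0 - 2.0 * phi / 3.0))
    return out
```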
1. Signal Without Noise: We attempt to recover the signal X from the observation Y = AX without noise. From Fig. 3, we can see that, except for the soft algorithm, all the other three algorithms can successfully recover the signal when M = 270; in particular, the log-sum algorithm and the half algorithm both attain higher recovery precision than the hard algorithm. Furthermore, with the measurements reduced to M = 240, the hard algorithm fails, while the log-sum algorithm still achieves an ideal result. Moreover, it has higher accuracy than the half algorithm under the same number of iterations, and requires fewer iterations than the half algorithm at the same precision. This experiment reveals that the log-sum algorithm is superior to the other three algorithms.
2. Signal With Noise: We recover the signal X from the observation Y = AX + E, where each entry of the noise E obeys Gaussian N(0, 0.0025) i.i.d. Owing to the unsatisfactory performance of the soft algorithm, we focus on the other three algorithms, i.e., the hard algorithm, the half algorithm and the log-sum algorithm. Moreover, in order to display the effect of noise, the oracle MSE is invoked as a reference. From Fig. 4, it can be seen that, as the iterations increase to 500, both the log-sum algorithm and the half algorithm reach the same level as the oracle MSE when M = 270. With the measurements decreased to M = 240, the performance of the half algorithm deteriorates dramatically, but the log-sum algorithm remains satisfactory with high probability. This also shows that, among the aforementioned algorithms, the log-sum algorithm needs the fewest measurements for perfect recovery in the same noise environment.

VI. CONCLUSION
We have conducted a study of a specific regularization framework, i.e., log-sum regularization, for a better solution to the sparsity problem. The main contribution of this paper is to build an accurate threshold representation theory. Based on this theory, a fast and efficient iterative threshold algorithm for log-sum regularization is developed. Moreover, it is proven that the proposed algorithm with relatively small regularization parameters λ and ε converges to a local minimizer.
It has been shown in recent studies [38]-[42] that log-sum regularization possesses a mighty capability for the sparsity problem. In order to verify our assertions, we carried out a set of simulations, in which the proposed algorithm exhibits favourable convergence and robustness, and outperforms the hard algorithm, the soft algorithm and the half algorithm in terms of iterations and recovery precision. In summary, the proposed iterative threshold algorithm offers a rapid and effective methodology for log-sum regularization; it is simple and convenient to use, and suitable for large-scale sparse problems. This will likely lay a theoretical foundation for the further development of log-sum regularization.

APPENDIX B THE SOLUTION OF THE LOG-SUM ALGORITHM CONVERGES TO A LOCAL MINIMIZER
In this appendix, it is certified that the solution of the log-sum algorithm converges to a local minimizer. We define a constant θ_0 ∈ (0, 1) and a constant θ_1 determined by the number of nonzero components of X*. Let θ = min(θ_0, θ_1); we will show that for any component l_i of L satisfying |l_i| < θζ, it holds that F_{λ,ε}(X* + L) − F_{λ,ε}(X*) ≥ 0.
In fact, we have the expansion (55). Based on (11) and (31), the corresponding bound holds for any i ∈ I^c, and likewise for any i ∈ I. Therefore, F_{λ,ε}(X* + L) − F_{λ,ε}(X*) ≥ 0 can be deduced, which implies that X* is a local minimizer of the log-sum regularization (5). This proves 3 of Theorem 3.

Fig. 1. Experiment for convergence of the proposed algorithm. (a) The trend of the objective function; (b) The trend of the iteration error.

Fig. 4. Recovery precision of all four algorithms with different measurements in the noisy case. (a) M = 270; (b) M = 240.