FRC-Net: A Simple Yet Effective Architecture for Low-Light Image Enhancement

Abstract—Low-light image enhancement (LLIE) aims at refining illumination and restoring the details of low-light images. However, current deep LLIE models still face two crucial issues causing blurred textures and inaccurate illumination: 1) low-quality detail-recovery results due to information loss; 2) complex and even redundant model structure. In this paper, we therefore propose a simple yet effective deep LLIE architecture, termed Full-Resolution Context Network (FRC-Net). To avoid the visual information loss caused by feature scaling, we present a novel full-resolution representation strategy to replace all feature scaling operations, which prevents information degradation by making the intermediary features keep the original resolution. The structure of FRC-Net is very simple, containing only 12 cascaded layers: 7 convolutional layers and 5 newly-designed context attention (CA) modules. The plug-and-play CA module is designed to overcome the limited receptive field caused by shallow structures by learning global context as well as retaining local details. Extensive experiments show that our model obtains better detail-recovery quality than current SOTA methods, with relatively fewer parameters and faster inference speed.


I. INTRODUCTION
Consumer electronics include digital devices that are used for entertainment, communication, education and security [1], [2], [3], such as digital cameras, mobile phones, personal computers and televisions. Many of these devices have cameras that can take pictures and videos in different environments. However, when there is not enough light, such as at night or indoors, the pictures may look unnatural. That is, images obtained in weak-illumination conditions usually suffer from various degradations, such as low contrast, low visibility, noise and color shift. It is noteworthy that these degradations not only result in poor visual perception, but also degrade subsequent high-level tasks. However, how to refine the illumination and obtain accurate detail recovery of low-light images is still a challenging task. Recently, many LLIE methods have been proposed, including traditional and deep learning-based methods [4], [5].

Zhao Zhang, Huan Zheng, and Richang Hong are with the School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China (e-mail: cszzhang@gmail.com; huanzheng1998@gmail.com; hongrc.hfut@gmail.com).

Jicong Fan is with the School of Data Science, The Chinese University of Hong Kong (Shenzhen), Shenzhen 518000, China (e-mail: fanjicong@cuhk.edu.cn).

Yi Yang is with the School of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China (e-mail: yi.yang@uts.edu.au).

Shuicheng Yan is with Sea AI Lab, Singapore (e-mail: yansc@sea.com).

Digital Object Identifier 10.1109/TCE.2023.3280467
Traditional LLIE methods in the early years mainly build models based on various image priors to reverse the degradation process [6]. Nevertheless, they usually focus on enhancing the contrast rather than directly refining the illumination and recovering the details. As a result, the enhanced images usually contain undesirable illumination and unclear textures. With the great success of deep learning on diverse vision tasks [7], [8], [9], [10], [11], [12], [13], data-driven deep LLIE methods have been attracting considerable attention [14], [15], [16], [17]. Deep LLIE methods learn a mapping from a low-light image to a normal-light image by designing a variety of modules and loss functions. Attributed to the strong learning ability of deep neural networks, deep LLIE methods are capable of producing better detail-recovery results. However, current deep LLIE methods still suffer from the following two major issues: (1) Low-quality detail-recovery due to information loss. As shown in Figure 1, the enhanced images of current deep methods still contain blurred textures, inaccurate illumination and color shift. We ask: what makes a deep LLIE model produce undesired and inaccurate detail-recovery results? It is difficult to answer this question immediately. As such, we first need to figure out what current deep LLIE methods have in common. Note that almost all existing deep models are based on the U-Net structure [18], which contains multiple feature scaling operations. However, feature scaling inevitably loses certain informative visual primitives [19]. Therefore, we argue: the information loss caused by feature scaling makes the enhanced images lose important details and contain undesired textures and colors.
(2) Complex deep model architecture. Current deep LLIE models usually use complex and even redundant architectures for better detail recovery. For example, Retinex-based methods include at least two sub-networks; end-to-end methods are based on various convolution-based modules. Complicated loss functions are also designed for better enhancement performance. However, we believe: the original goal and ideal solution of deep LLIE should be a simple, effective and relatively lightweight model, without complex structures.

Fig. 1. Visual comparison among RetinexNet [20], KinD++ [21], EnGAN [22], Zero-DCE [23], KinD [24], Zero-DCE++ [25] and our FRC-Net. Clearly, our FRC-Net obtains the best performance, delivering obviously more accurate illumination and more consistent color among all compared methods.

In this paper, we explore novel strategies to overcome the aforementioned issues of information loss by feature scaling and complex structure. Finally, we propose a new, simple yet effective architecture for deep LLIE. The main contributions of this paper are summarized as follows:

II. RELATED WORK
We briefly review LLIE works related to our method. More details can be found in [4], [5].

A. Traditional LLIE Methods
For the traditional methods, we mainly introduce two categories: HE-based and Retinex-based methods.
HE-Based Methods: The main idea of HE-based methods is to enhance the contrast of images by using different known priors, aiming to change the dynamic range of the low-light image to obtain a normal-light image [26], [27]. However, simply improving the contrast of images usually results in under-enhancement or over-enhancement.
Retinex-Based Methods: Inspired by the Retinex theory [28] based on the human visual system, Retinex-based LLIE methods [29], [30], [31], [32] decouple an image into the product of two elements:

S = R ⊗ I,

where S is an image, ⊗ denotes the pixel-wise product, and R and I denote the corresponding reflectance map and illumination map, respectively. By performing subsequent operations on R and I, the illumination-enhanced image can be reconstructed. Retinex-based LLIE methods directly estimate the illumination map, which is hand-crafted and requires careful parameter tuning. Besides, the unknown noise and artifacts produced by these methods also make the enhanced images of low quality.
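As a toy illustration of this decomposition, the sketch below factorizes an image in numpy and verifies that the pixel-wise product R ⊗ I reproduces the input. It is not any published Retinex method; the max-channel illumination estimate is just an assumption chosen for the example:

```python
import numpy as np

def retinex_decompose(image, eps=1e-6):
    """Toy Retinex decomposition: estimate the illumination map I as the
    per-pixel channel maximum, then recover the reflectance R as S / I."""
    illumination = image.max(axis=-1, keepdims=True)   # I: (H, W, 1)
    reflectance = image / (illumination + eps)         # R: (H, W, 3)
    return reflectance, illumination

# A low-light image reconstructs exactly as the pixel-wise product R * I.
rng = np.random.default_rng(0)
low_light = rng.uniform(0.0, 0.2, size=(4, 4, 3))      # dark pixel values
R, I = retinex_decompose(low_light)
reconstructed = R * I
print(np.allclose(reconstructed, low_light, atol=1e-5))  # True
```

Enhancement methods in this family then brighten I (e.g., via gamma correction) before recombining it with R.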

B. Data-Driven Deep LLIE Methods
Thanks to the strong representation ability of deep neural networks [33], [34], [35], [36], [37], deep LLIE methods have produced impressive performance in low-level vision tasks. They can be roughly categorized into supervised, semi-supervised and unsupervised types, based on whether paired data are used.
Supervised Deep Methods: Supervised deep LLIE methods utilize paired low-light and normal-light images for training. These methods can be further divided into end-to-end and Retinex-based methods. End-to-end methods directly process the low-light image and output the illumination-enhanced image [38], [39], [40]. In contrast, Retinex-based deep LLIE methods build a decomposition network with corresponding loss functions to decompose an image into a reflectance map and an illumination map [20], [21], [24], [41]. Note that almost all current supervised deep LLIE models are complex and based on U-Net, but the feature scaling operations employed in U-Net lose informative visual primitives and hence degrade the detail-recovery quality of low-light images.
Unsupervised/Semi-Supervised Deep Methods: In reality, low-light and normal-light images are often unpaired. As such, a few un-/semi-supervised LLIE methods have been proposed, which use unpaired or partially paired data for training, respectively. For example, Yang et al. [42] proposed the first semi-supervised LLIE method, which learns a linear band representation of an illumination-enhanced image through a deep recursive band network. Jiang et al. [22] proposed a generative adversarial network (GAN) based LLIE method, which only needs unpaired data. More recently, some zero-shot methods that only use low-light images have been derived [23], [25], [43], [44], [45]. By building a neural network with rich image prior-based loss functions, these methods can refine the illumination of low-light images, but their performance still lags behind supervised methods due to the lack of paired data.

III. PROPOSED METHOD
We introduce the framework of FRC-Net from two aspects, i.e., how to design a simple architecture and how to achieve effective enhancement. The loss functions are then introduced.

A. Network Architecture
We aim to build a simple yet effective deep model for LLIE. Specifically, we deliver two novel strategies to achieve the goals of simplicity and effectiveness, i.e., the full-resolution representation (FRR) strategy and the CA module.
1) Full-Resolution Representation (FRR) Strategy: Feature scaling is an important and common operation widely used in deep neural networks for diverse computer vision tasks. However, feature scaling inevitably loses some detailed visual information. For LLIE, the goal is to reconstruct the enhanced normal-light image with rich textures and accurate illumination, which are closely related to the visual details lost in the process of feature scaling. Note that almost all current deep LLIE methods are based on U-Net, which is equipped with multiple feature scaling operations.
To address the above issue, we propose a full-resolution representation (FRR) strategy that removes all feature scaling operations to get rid of information loss. The process of illumination enhancement is formulated as follows:

f_0 = S_in, f_i = CA_i(φ_i(f_{i−1})), i = 1, ..., 5,
S_pre = φ_7(φ_6(f_5)),

where CA_i(·) and φ_i(·) denote the transformations in the i-th CA module and convolutional layer respectively, f_i denotes the intermediate feature maps, and S_in and S_pre are the input low-light image and predicted normal-light image respectively. Note that all intermediate feature maps keep the same resolution as the input low-light image. As a result, the loss of visual details caused by feature scaling is avoided, which is beneficial for more effective illumination estimation and accurate texture recovery.
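The resolution-preserving idea behind FRR can be illustrated with a minimal numpy sketch (an illustration, not the authors' implementation): a stride-1, zero-padded 3×3 convolution keeps the full spatial resolution, whereas a 2× downscale, as used in U-Net encoders, discards three quarters of the spatial samples:

```python
import numpy as np

def conv3x3_same(feat, kernel):
    """Naive stride-1 3x3 convolution with zero 'same' padding: the output
    keeps the full input resolution, which is the core of the FRR strategy."""
    h, w = feat.shape
    padded = np.pad(feat, 1)
    out = np.zeros_like(feat)
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * kernel)
    return out

feat = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.full((3, 3), 1.0 / 9.0)      # simple averaging kernel

full_res = conv3x3_same(feat, kernel)
downscaled = feat[::2, ::2]              # 2x downscale as in a U-Net encoder

print(full_res.shape)    # (6, 6): resolution preserved
print(downscaled.shape)  # (3, 3): 75% of the spatial samples are discarded
```

Stacking only such resolution-preserving layers is what allows every intermediate feature map in FRC-Net to stay at the input resolution.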
2) Context Attention (CA): A simple architecture makes the overall structure shallow, which results in a limited receptive field. This makes the model incapable of effectively capturing global contextual information, thereby making the enhanced details inaccurate. We therefore propose the concept of CA to alleviate this issue. Note that we have three requirements for the design of CA:
• Global Context Extraction: Due to the limited receptive field caused by the shallow structure, global context information cannot be effectively extracted. Thus, the first task of CA is to extract a global contextual representation.
• Local Detail Preservation: Besides, the local details, which are conducive to texture-detail recovery, are often ignored. Hence, we hope that CA can capture and preserve local information for better enhancement.
• Plug-and-Play: In addition to the above two requirements, we also want CA to be plug-and-play. That is, CA can be deployed anywhere in any network.
To satisfy these requirements, we develop the CA module (see Figure 2) with a dual-path design, i.e., a context path and a detail path, which is originally inspired by [46]. To be specific, the context path extracts global contextual information, while the detail path preserves local details in images.
Context Path: To give the context path the ability to extract global contextual information, we first use a convolutional layer with a kernel size of 1 to change the channel dimension and obtain the transformed feature x_t^c for context acquirement, as shown in Figure 2. The learning process can be formulated as follows:

x_t^c = φ_t^c(x),

where x denotes the input feature and φ_t^c(·) denotes the transformation of the convolutional layer. To capture the global contextual information hidden in the darkness, the CA module aggregates the information in two steps. In the first step, we merge the spatial information by computing the per-channel mean values of the transformed feature x_t^c. In the second step, we fully congregate the information among all channels via a fully connected layer, implemented by a simple convolutional layer φ_m^c with kernel size of 1. The two-step process can be formulated as:

x_m^c = φ_m^c(GP(x_t^c)),

where x_m^c denotes the globally aggregated feature and GP(·) denotes the global average pooling operation for computing the mean values. At last, x_m^c and the transformed feature x_t^c are further combined to obtain the context feature representation, which can be expressed as follows:

x_c = Sig(x_m^c) ⊙ x_t^c,

where ⊙ is element-wise multiplication, Sig(·) denotes the sigmoid function and x_c represents the extracted global context feature. Note that x_m^c is expanded to the same size as x_t^c.

Detail Path: The goal of the detail path is to retain local details, which is beneficial for more accurate texture and detail recovery. We use two convolutional layers to process the original input feature x and obtain the feature representation x_d of local details, which is performed as follows:

x_d = φ_d(x),

where φ_d denotes the transformation of the two cascaded convolutional layers. Specifically, the first convolutional layer transforms the original input feature x for the processing of the detail path, and the second convolutional layer extracts and retains local detailed information.
Informative Feature: After obtaining the above feature representations for global context and local details, we further calculate their element-wise summation, which is treated as the final informative feature representation:

x_i = x_c + x_d,

where x_i denotes the informative feature representation obtained by CA, which contains rich global context and local detailed information for accurate recovery.
Plug-and-Play: Based on the above designs, the CA module is plug-and-play and meets all the desired requirements. In other words, CA can be deployed flexibly into other networks for learning global context and retaining local details simultaneously.

B. Loss Function
We only employ two simple losses to define the total loss function L of our FRC-Net for deep LLIE, i.e., the structural similarity (SSIM) loss L_ssim and the total variation (TV) loss L_tv:

L = L_ssim + λ_tv · L_tv,

where λ_tv is a trade-off parameter. Specifically, L_ssim is utilized for reconstructing the illumination-enhanced image, while L_tv serves as a regularization term.
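A hedged numpy sketch of the two loss terms follows. The SSIM term is simplified to a single global window here (practical SSIM losses are computed over local windows), and λ_tv = 0.1 follows the paper's hyper-parameter setting:

```python
import numpy as np

def tv_loss(img):
    """Anisotropic total variation: mean absolute difference between
    neighbouring pixels, acting as a smoothness regularizer."""
    dh = np.abs(img[1:, :] - img[:-1, :]).mean()
    dw = np.abs(img[:, 1:] - img[:, :-1]).mean()
    return dh + dw

def ssim_loss(pred, target, c1=0.01 ** 2, c2=0.03 ** 2):
    """1 - SSIM with a single global window, for illustration only."""
    mu_p, mu_t = pred.mean(), target.mean()
    var_p, var_t = pred.var(), target.var()
    cov = ((pred - mu_p) * (target - mu_t)).mean()
    ssim = ((2 * mu_p * mu_t + c1) * (2 * cov + c2)) / \
           ((mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2))
    return 1.0 - ssim

def total_loss(pred, target, lambda_tv=0.1):
    # L = L_ssim + lambda_tv * L_tv, as in the paper's loss definition.
    return ssim_loss(pred, target) + lambda_tv * tv_loss(pred)

rng = np.random.default_rng(0)
target = rng.uniform(size=(32, 32))
print(total_loss(target, target))  # only the TV term remains for a perfect match
```

Note that the SSIM term vanishes when the prediction equals the target, while the TV term still penalizes any remaining high-frequency content in the prediction.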

IV. EXPERIMENTS

A. Experimental Settings
We evaluate the LLIE performance of our FRC-Net on widely-used datasets and present visual and quantitative comparisons with closely related methods.
Evaluated Datasets: Two widely-used paired datasets are involved, i.e., the LOL dataset [20] and the VE-LOL dataset [4]. The LOL dataset contains 485 paired training images and 15 paired testing images. The VE-LOL dataset is composed of 400 paired training images and 100 paired testing images. Note that all images in these datasets are captured in real environments. For training, we only use the LOL training set. For testing, we conduct the LLIE task on both the LOL testing set and the VE-LOL testing set. Besides, we also evaluate each model on two unpaired datasets, DICM [47] and MEF [48], to examine the generalization ability in real-world scenarios.
Evaluation Metrics: To fully evaluate the LLIE results of different methods, we utilize five image quality evaluation metrics, i.e., peak signal-to-noise ratio (PSNR), structural similarity (SSIM), mean absolute error (MAE), multi-scale structural similarity (MS-SSIM) and the naturalness image quality evaluator (NIQE). Note that PSNR, SSIM, MS-SSIM and MAE are full-reference metrics, while NIQE is a no-reference metric that can demonstrate the naturalness of the illumination-enhanced image. The smaller the MAE and NIQE, the more realistic the refined images. In contrast, the greater the PSNR, SSIM and MS-SSIM, the better the quality of the enhanced image.
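The two simplest full-reference metrics, PSNR and MAE, can be computed as follows. This is a generic numpy sketch of the standard definitions, not tied to any particular evaluation toolbox:

```python
import numpy as np

def psnr(pred, target, data_range=1.0):
    """Peak signal-to-noise ratio in dB; higher is better."""
    mse = np.mean((pred - target) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(data_range ** 2 / mse)

def mae(pred, target):
    """Mean absolute error; lower is better."""
    return float(np.mean(np.abs(pred - target)))

rng = np.random.default_rng(0)
gt = rng.uniform(size=(16, 16))
noisy = np.clip(gt + rng.normal(scale=0.05, size=gt.shape), 0.0, 1.0)
print(psnr(noisy, gt), mae(noisy, gt))
```

SSIM, MS-SSIM and NIQE are more involved (window-based statistics and natural-scene statistics, respectively) and are typically computed with an off-the-shelf library.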
Implementation Details: We conduct all experiments using the PyTorch platform in a Python environment with one NVIDIA GeForce 2080 Ti GPU. All training images are randomly cropped into 256×256 pixels. The Adam optimizer is utilized with a batch size of 4. We train our FRC-Net for 1000 epochs, where the learning rate is initially 0.0001 and decayed by 10 percent every 100 epochs. For the hyper-parameters of our FRC-Net, we empirically set λ_tv = 0.1. Note that we provide three variants of FRC-Net, i.e., FRC-Net-B (base), FRC-Net-S (small) and FRC-Net-T (tiny), by using different numbers of feature channels. Specifically, each inner feature of FRC-Net-S/FRC-Net-T has half/a quarter of the channels of FRC-Net-B. Several related and popular methods, i.e., RetinexNet [20], KinD++ [21], EnGAN [22], Zero-DCE [23], KinD [24] and Zero-DCE++ [25], are compared.
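Under our reading of the schedule (the rate is multiplied by 0.9 every 100 epochs, an assumption where the original sentence is ambiguous), the learning rate at any epoch can be computed as:

```python
def learning_rate(epoch, base_lr=1e-4, decay=0.9, step=100):
    """Step-decay schedule: the base rate of 1e-4 is multiplied by 0.9
    once every 100 epochs (our interpretation of the paper's setting)."""
    return base_lr * decay ** (epoch // step)

# Epochs 0-99 use 1e-4; epochs 100-199 use 9e-5; and so on.
print(learning_rate(0), learning_rate(150), learning_rate(999))
```

In PyTorch this corresponds to a step-decay scheduler with step size 100 and decay factor 0.9.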

B. Quantitative Results
Evaluation on LOL Dataset: We first evaluate each LLIE method on the LOL dataset. The numerical evaluation results are shown in Table I. We can find that the overall performance of our FRC-Net is superior to the other compared methods. Specifically, our FRC-Net obtains the highest PSNR value and the smallest MAE value, which suggests that our method can better suppress artifacts and reconstruct the normal-light image at the pixel level; the better SSIM and MS-SSIM values demonstrate that FRC-Net is capable of recovering the structures hidden in the darkness; the best NIQE value means that the enhanced images of our method are more natural than those of the others.
Evaluation on VE-LOL Dataset: We also evaluate the quantitative results on the VE-LOL dataset. Table I shows the numerical results. We find that: (1) similar to the LOL dataset, our FRC-Net obtains the best results on all metrics; (2) compared to the LOL dataset, the other LLIE algorithms suffer significant performance degradation, whereas our method shows a clear performance improvement on the VE-LOL dataset, which demonstrates its strong generalization ability.

C. Visual Enhancement Results
Visualization on the LOL Dataset: We first display the visual result of each method in Figure 3. We see that: (1) the enhanced image of RetinexNet contains obvious noise and inconsistent colors; (2) Zero-DCE and Zero-DCE++ produce over-enhanced images compared with the ground-truth image; focusing on the region in the red rectangular box, KinD and KinD++ cannot well recover the details of the yellow billboard and the green tree outside the window; (3) our FRC-Net obtains the best illumination-enhancement results in terms of structure perception and local detail preservation.
Visualization on the VE-LOL Dataset: The visual results on the VE-LOL dataset are shown in Figure 4. We can see that RetinexNet generates inaccurate colors in the enhanced image. Besides, there is considerable noise in the enhanced images of Zero-DCE and Zero-DCE++. For KinD and EnGAN, the illumination of the enhanced results is still weak. KinD++ seems to obtain a pleasant enhancement result; however, compared to the ground truth, we find that KinD++ fails to restore the slightly brownish light environment. Overall, our FRC-Net obtains the most accurate and realistic enhancement results.

D. Generalization on Real Unpaired Image Enhancement
To further evaluate the ability of each model to handle real-world low-light images, we also use two unpaired real-world low-light image datasets, i.e., MEF and DICM. Figure 5 shows the enhanced results of each method. We can see that: (1) there are abnormal colors in the enhanced image of RetinexNet; (2) the results of EnGAN, Zero-DCE and Zero-DCE++ contain more noise; (3) KinD and KinD++ are not capable of adjusting the illumination to reveal the contents hidden in darkness; (4) in contrast, our method can well enhance the low-light image, without obvious noise or inconsistent colors.

E. Comparison of Trainable Parameters and Inference Time
We also compare the number of trainable parameters and the inference time of each method. The results in terms of PSNR vs. trainable parameters and inference time on the LOL dataset are shown in Figure 6. As can be seen, Zero-DCE and Zero-DCE++ are the two smallest and fastest models, but deliver the worst performance. KinD, KinD++ and EnGAN are the three biggest models, with slow inference speed. The middleweight RetinexNet also needs longer inference time. In contrast, our FRC-Net obtains the best enhancement result, with relatively fewer parameters and faster inference speed.

F. Ablation Study
We mainly evaluate the effects of the proposed FRR strategy and CA module. The visual and numerical analyses are presented in Table II and Figure 7. (1) Effect of the FRR strategy. According to Table II, when we remove the FRR strategy from our FRC-Net, i.e., the Baseline+CA model, there is obvious performance degradation in terms of PSNR, SSIM and MAE. If we add the FRR strategy to the Baseline model, denoted the Baseline+FRR model, the performance improves greatly. From Figure 7, we can find that the enhanced image of the Baseline+CA model contains discordant green colors. These changes suggest that the proposed FRR strategy is able to retain informative visual primitives that would otherwise be lost to feature scaling, which is conducive to recovering the details for LLIE.
(2) Effect of the CA module. When we remove the CA module from FRC-Net, i.e., the Baseline+FRR model, the enhanced image contains inconsistent colors, as shown in Figure 7. According to the numerical results in Table II, the quality of the enhanced images is better when the CA module is used. This is because the CA module can capture global contextual information and retain local details simultaneously, which guides the LLIE model to perform more accurate illumination adjustment and detail recovery.

V. CONCLUSION
In this paper, we present a simple yet effective network, FRC-Net, for LLIE. Technically, we propose a novel full-resolution representation strategy to replace all the feature scaling operations of the U-Net architecture, so that FRC-Net avoids the information loss and achieves more accurate detail and texture recovery. To overcome the limited receptive field caused by the shallow structure and obtain global context features, we further develop a new context attention module. Based on the dual-path design, context attention is able to extract contextual information and preserve local details, which guides FRC-Net to complete more accurate illumination adjustment and detail restoration. Extensive experiments show that FRC-Net achieves state-of-the-art performance, with relatively fewer parameters and faster inference speed. In the future, we will study more efficient and effective models for LLIE. Evaluating our FRC-Net and CA module on other computer vision tasks is also an interesting direction for future work.

Index Terms—Low-light image enhancement, simple yet effective, full resolution, context attention.

Manuscript received 28 September 2022; revised 22 December 2022 and 27 February 2023; accepted 18 May 2023. Date of publication 29 May 2023; date of current version 26 April 2024. This work was supported in part by the National Natural Science Foundation of China under Grant 62072151, Grant 61932009, and Grant 62020106007; in part by the Anhui Provincial Natural Science Fund for Distinguished Young Scholars under Grant 2008085J30; and in part by the CAAI-Huawei MindSpore Open Fund. (Corresponding authors: Richang Hong; Jicong Fan.)

Fig. 2. The simple architecture of our FRC-Net, which includes seven convolutional layers and five plug-and-play CA modules, without feature scaling operations. The structure of the CA module is shown at the bottom left; it has two paths: the context path is for global context extraction and the detail path is to preserve the local details. Some notations are also listed at the bottom right.
• Simple Yet Effective Architecture: Technically, we propose a new, simple yet effective deep LLIE model called Full-Resolution Context Network (FRC-Net). As seen in Figure 2, the structure of our FRC-Net is based on a plain architecture with only twelve cascaded layers (i.e., seven convolutional layers and five context attention modules). By designing and incorporating the full-resolution representation strategy and the context attention module, our FRC-Net can overcome the shortcomings caused by feature scaling and shallow structures. To the best of our knowledge, this is one of the few works to enhance the illumination of low-light images by exploring and purely utilizing full-resolution feature representations.
• Full-Resolution Representation (FRR) Strategy: To retain useful information, we present a novel and effective FRR strategy, which removes all feature scaling operations of U-Net. As a result, all intermediary feature maps keep the same resolution as the input image, without information loss. Attributed to the strong information retention ability of the FRR strategy, our FRC-Net can effectively address the issue of losing useful visual information caused by feature scaling operations.
• New Context Attention (CA) Module: The simple structure of FRC-Net forces it to handle the LLIE task with shallow structures. However, a limited receptive field is a problem caused by simple and shallow neural networks. As a result, FRC-Net may lack global contextual information, further making the enhanced image contain unclear textures and inaccurate illumination. Hence, we present a plug-and-play CA module for jointly extracting global contextual features and preserving local detail information in the LLIE process.
• Better Illumination Enhancement: Extensive experiments show that our FRC-Net can better enhance the illumination and obtain state-of-the-art results, with relatively faster inference speed and fewer parameters compared with other popular solutions.

Fig. 3. Visual comparison (PSNR/SSIM) on an image of the LOL dataset. Our model better recovers the details (e.g., the yellow billboard and green trees) in the red rectangle among all competitors.

TABLE I
EVALUATION RESULTS IN TERMS OF PSNR, SSIM, MAE, NIQE AND MS-SSIM ON THE LOL AND VE-LOL DATASETS. BOLD DENOTES THE BEST, AND UNDERLINE INDICATES THE SECOND BEST. IT IS CLEAR THAT OUR FRC-NET ACHIEVES SOTA PERFORMANCE AMONG ALL COMPARED METHODS.


Fig. 4. Visual comparison (PSNR/SSIM) on an image of the VE-LOL dataset. Our FRC-Net performs better illumination estimation and content recovery.

Fig. 5. Visual comparison on real-world unpaired datasets, where the first low-light image belongs to the MEF dataset and the second one comes from the DICM dataset. Clearly, our FRC-Net can better adjust the illumination and restore the details hidden in the darkness.
In Figure 7, the Baseline model is based on the U-Net architecture with twelve convolutional layers and two feature downscale and upscale operations. The Baseline+CA model denotes the refined Baseline model with five CA modules in place of five convolutional layers. The Baseline+FRR model denotes the Baseline model without feature scaling operations.

Fig. 7. Visual analysis of the ablation study. There are inconsistent colors in the results of Baseline, Baseline+CA and Baseline+FRR, which demonstrates the effectiveness of the developed FRR strategy and CA module.
FRC-Net: A Simple Yet Effective Architecture for Low-Light Image Enhancement
Zhao Zhang, Senior Member, IEEE, Huan Zheng, Student Member, IEEE, Richang Hong, Senior Member, IEEE, Jicong Fan, Yi Yang, Senior Member, IEEE, and Shuicheng Yan, Fellow, IEEE

TABLE II
NUMERICAL RESULTS (PSNR/SSIM/MAE) OF THE ABLATION STUDY ON THE LOL/VE-LOL DATASETS. WE CAN FIND THAT THE PERFORMANCE IS DEGRADED WHEN WE REMOVE THE FRR STRATEGY OR THE CA MODULE FROM OUR FRC-NET.

Fig. 6. PSNR vs. trainable parameters and inference time on the LOL dataset. Our method achieves the best performance with relatively fewer parameters and faster inference speed.