Difference-Deformable Convolution With Pseudo Scale Instance Map for Cell Localization

Cell localization still faces two unresolved challenges: 1) the dramatic variations in cell morphology, coupled with the heterogeneous intensity distribution of lightly stained cells; 2) existing cell location maps lack scale information, resulting in insufficient supervision for point maps and inaccurate supervision for density maps. To address the first challenge, we introduce a novel gradient-aware and shape-adaptive Difference-Deformable Convolution (DDConv), which enhances the model's robustness to color by leveraging gradient information while adaptively adjusting the shape of the convolutional kernel to tackle the substantial variability in cell morphology. To overcome the second challenge of unreasonable location maps, we propose the Pseudo-Scale Instance (PSI) map, which adaptively provides the corresponding scale information for each cell to realize accurate supervision. We analyze and evaluate DDConv and the PSI map on three challenging cell localization tasks. Compared with existing methods, our proposed approach significantly enhances localization performance, setting a new benchmark for the cell localization task.

broad applications in biological research [1]. Previous studies have made significant progress. For example, Alam et al. [2] use a modified You Only Look Once (YOLO) network to automate the detection and counting of red blood cells, white blood cells, and platelets. Huang et al. [3] design a network based on Congested Scene Recognition Network (CSRNet) that regresses density maps to locate positive and negative cells in breast cancer pathological sections. However, two challenges remain unaddressed, which significantly affect the accuracy of cell counting.
The large variability in cell morphology, coupled with the heterogeneous intensity distribution of lightly stained cells, presents the first unsolved challenge, as shown in Fig. 1(a). To tackle the disparities in morphology, Tofighi et al. [4] propose a Tunable Shape Prior Convolutional Neural Network (TSP-CNN). TSP-CNN incorporates shape priors, which are customized to match the intricate and diverse cell shapes in images. However, the fixed shape priors of this approach restrict its applicability to other scenarios. To address the heterogeneous intensity distribution of lightly stained cells, Li et al. [5] propose a multi-scale difference convolution module. This approach enhances the model's robustness to cell color in images. However, difference convolution amplifies the edge information of cells in the feature map, which to some extent exacerbates the interference of cell shape with the model. Current methods address cell morphology and staining heterogeneity separately and therefore still face significant challenges, limiting accurate cell localization and counting. Hence, accurately identifying and localizing lightly stained and morphologically diverse cells remains an unexplored area with the potential to improve cell localization and counting performance.
Unreasonable location maps are the second unsolved challenge. Typically, existing location maps can be roughly divided into two categories: density maps [3] and point maps [6], [7]. Density maps reflect the density of cells in different regions, and point maps are binary images that contain disks placed at cell centers, as shown in Fig. 1(b) and (c). Although both location maps are effective for cell localization and counting, each comes with several clinical drawbacks. The density map involves two key issues: 1) Inability to avoid the overlapping challenge. Density maps generated by Gaussian convolution inevitably overlap in dense regions, so the model is guided in the wrong direction, which ultimately affects its coverage and accuracy. 2) Complex post-processing. Density maps must be processed by a local maxima algorithm to obtain the specific location of each cell. When cells are not uniformly distributed, it is difficult to define the range for calculating the local maxima, resulting in some cells being ignored.
In contrast, point maps use very small disks to represent cells. Although point maps avoid the overlapping challenge, they bring several drawbacks of their own. 1) Extreme imbalance. In point maps, there is an extreme imbalance between point pixels and background pixels. The large number of negatives can be overwhelming and make up the majority of the loss, so the gradient is dominated by the negatives and the cell information is ignored. 2) Lack of cell scale information. Point maps can hardly represent cell scale information, which makes the model lose sensitivity to cell size and reduces its performance. Hence, there is an urgent need for an accurate and reasonable location map representation.
To overcome the challenge of the large variability of cell morphology and heterogeneous intensity distribution, we propose a novel Difference-Deformable Convolution (DDConv) (Fig. 2). DDConv is gradient-aware and shape-adaptive, which enables the model to focus on the gradient information of cells to extract the edge information of lightly stained cells while adaptively adjusting the shape of the convolutional kernel to handle the large variability in cell shape. Specifically, DDConv employs eight filters for feature representation learning, and each filter calculates the differences in one of the eight directions. This operation preserves neighboring activation difference information for determining edges and corners and makes DDConv gradient-aware. Furthermore, to enable the filters to handle the large variability in cell morphology, we adopt deformable filters inside DDConv. Based on these deformable filters, our DDConv is adaptively adjusted according to the cell's morphology during feature representation learning.
To address the issue of unreasonable location maps, we introduce a novel concept called the Pseudo-Scale Instance (PSI) map. As illustrated in Fig. 2, within the PSI map, each cell is treated as an individual connected circular domain with a scale-related radius. The PSI map dynamically computes the scale information and associates it with the annotation of each cell. In comparison to existing location maps, our innovative PSI map offers two notable advantages: 1) Computational efficiency: Instead of relying on uncertain probability distributions, PSI assigns a specific scale value to each cell. This approach effectively mitigates computational challenges encountered in density maps. 2) Scale awareness: The utilization of scale information in PSI enhances the model's sensitivity to the size and shape of diverse cells. This, in turn, helps prevent overlapping issues and mitigates extreme imbalances in the data. Overall, our PSI map represents a valuable enhancement in constructing location maps by incorporating scale information, addressing computational efficiency concerns, and fostering scale awareness in the model.
Fig. 2. Our novel DDConv is gradient-aware and shape-adaptive, which enables the model to focus on the gradient information of cells for extracting the edge information of lightly stained cells and meanwhile adaptively adjust the shape of the convolutional kernel for overcoming the challenge of the large variability in cell shape. Our novel method constructs location maps by leveraging scale information, offering two advantages: 1) Computational efficiency. 2) Scale awareness.
To assess the advancements brought by DDConv and PSI, three challenging cell localization datasets are used for thorough comparisons and evaluations [3], [8], [9]. Extensive experimental results demonstrate that the scale information provided by our PSI map contributes significantly to the performance improvement in localization tasks. Further, to validate our DDConv module, we introduce the DCLNet model built upon this module. Comprehensive comparative experiments and ablation studies indicate that this model outperforms the state-of-the-art methods. Our primary contributions are as follows: 1) Our novel DDConv is gradient-aware and shape-adaptive. It extracts the edge information of lightly stained cells to overcome their heterogeneous intensity distribution, while adaptively adjusting the shape of the convolutional kernel to overcome the large variability in cell shape. 2) Our new PSI map adaptively computes the scale information and associates it with each cell's annotation, which addresses the computational difficulty of density maps and further makes the model sensitive to the size of cells. 3) Owing to the synergistic integration of the PSI map and DDConv, our novel approach outperforms existing methods and sets a new performance benchmark.

II. RELATED WORKS
In this section, we briefly describe the current research status of cell localization and counting using CNNs, mainly including methods based on detection, density maps, and point maps. Additionally, related work on deformable convolution is also reviewed.

A. Detection-Based Methods
Detection-based cell localization methods typically generate multiple candidate regions in the image and then classify and filter each region to identify those that truly contain cells [10], [11]. For example, Alam et al. [2] use a modified YOLO network to automate the detection and counting of red blood cells, white blood cells, and platelets. Ma et al. [12] propose an abnormal cell detection network based on Mask R-CNN, which integrates different features using an attention mechanism to improve detection performance. Du et al. [13] design a method based on RetinaNet in the state of Super Depth of Field (SDoF) to achieve high-precision detection of leucorrhea components with an SDoF feature aggregation module.
Detection-based methods perform well in scenarios with sparse target distribution. However, their performance tends to degrade as cell density increases. Moreover, bounding box-level annotation is often relatively expensive. Therefore, researchers now commonly use point-based annotation methods.

B. Density Map-Based Methods
To better utilize the spatial information, most current cell localization and counting works are based on density maps. For instance, Pan et al. [14] introduce a multiscale fully convolutional neural network for density map regression. The network can detect small single cells as well as large and overlapping cells. To address the scarcity of datasets in the cell localization and counting field, Sirinukunwattana et al. [15] propose a spatially constrained convolutional neural network and release a dataset named UW. Recently, Huang et al. [3] propose a large-scale dataset called BCData for cell counting, localization, and classification, and design a network based on CSRNet that regresses density maps.
These works have promoted the development and application of cell localization and counting to some degree. However, density map-based methods cannot effectively use the scale information of cells. In some irregularly shaped cells, a cell may be marked with multiple localization points, resulting in more false positives. Therefore, some researchers begin to solve cell localization and counting by generating pseudo-segmentation maps (point maps).

C. Point Map-Based Methods
Nowadays, some researchers generate point maps based on point labels to better utilize the scale information of cells in images. For example, Hagos et al. [6] propose an Inception-v3-based neural network and use point maps to supervise its training. Raza et al. [7] combine point labels with mapping filters to generate artificial pseudo-labels for training convolutional neural networks.
Compared with density maps, point maps can reflect the scale information of cells to some extent. However, most existing point maps use circles of uniform size to represent each cell. Since the size of each cell in real cell images varies, it is highly unreasonable to use circles of the same size to represent all cells. In addition to manually generating pseudo-labels, some researchers [8], [16] have used instance segmentation to simultaneously perform cell localization in cell classification tasks. However, instance segmentation annotation is often expensive and does not provide labels directly related to cell localization and counting. Consequently, using instance segmentation datasets for cell localization and counting is not cost-effective when cell classification is not needed.

D. Deformable Convolution
In the field of image segmentation [17], [18], the target object is often irregular in shape. Fixed-shape convolution kernels tend to perform poorly on such targets. To achieve more accurate segmentation results, researchers introduce deformable convolution [19] to segmentation tasks. For example, Huang et al. [20] design a feature alignment module based on deformable convolutions. This module learns pixel offsets and inserts them into the FPN structure to contextually align upsampled features. In medical image segmentation, there are also related studies on deformable convolution. Xie et al. [21] propose a feature fusion module for medical image segmentation. The module consists of feature attention selection, cross-offset generation, and deformable convolution layers to alleviate the ambiguous semantic information between the encoder and decoder. Furthermore, deformable convolution has also been used in cell detection. Li et al. [22] insert deformable convolution into the FPN structure and extend the Faster R-CNN model for automatic detection of cervical squamous epithelial cells in liquid-based cytology.
The emergence of deformable convolution has effectively alleviated the problem of misaligned contextual features in segmentation tasks, thereby improving the accuracy of segmentation results. In this paper, we introduce deformable convolution into the field of cell localization and combine it with difference convolution [23].

III. METHOD
In this section, we detail the DDConv and the generation of the PSI map. During the training phase, we design a network called DDConv-based Cell Localization Network (DCLNet) to learn the mapping relationship from cell images to the PSI maps. To optimize the model parameters, the loss between the output image and the PSI map is calculated.

A. Difference-Deformable Convolution
Our novel Difference-Deformable Convolution (DDConv) is gradient-aware and shape-adaptive, which makes the model sensitive to lightly stained cells while adaptively adjusting the offsets of the convolutional sampling points. Specifically, cells have diverse shapes, which deviate from the circular annotations. Additionally, lightly stained cell regions are inevitable due to variations in staining techniques, scoring methods, and selection of scoring regions. Lightly stained cells are the bottleneck restricting accurate cell detection because of their low contrast and blurred boundaries. Although existing convolution methods can extract high-level semantic features from cells, they cannot capture cells with irregular shapes due to the intrinsic locality of the convolution operator, and they lack sensitivity to lightly stained cells.
To overcome the above challenges, a novel Difference-Deformable Convolution is proposed. As shown in Fig. 3, our DDConv adaptively adjusts the offsets of the convolutional sampling points during feature extraction and extracts gradient information of cell edges [24] to overcome the challenge of lightly stained cells. Formally, the vanilla convolution can be represented as

$$y(p_0) = \sum_{p_n \in R} w(p_n) \cdot x(p_0 + p_n) \tag{1}$$

where $p_0$ denotes the central position of the local receptive field $R$, $p_n$ represents the relative position of each value in $R$ with respect to $p_0$, and $w(p_n)$ is a learnable parameter. $x$ and $y$ are the input and output feature maps, respectively. The definition of DDConv can be represented as

$$y(p_0) = \sum_{p_n \in R} w(p_n) \cdot x(p_0 + p_n + \Delta p_n) \tag{2}$$

Compared with vanilla convolution, our DDConv adjusts the scope of the convolution operation through a learnable offset $\Delta p_n$, which is updated by back-propagation during the training process. $\Delta p_n$ is generated by convolving the input feature map with another convolution (central difference here), and it is usually a decimal number. After adding $\Delta p_n$, the sampling takes place at the irregular, offset locations $p_n + \Delta p_n$. As the offset $\Delta p_n$ is typically fractional, $x(p_0 + p_n + \Delta p_n)$ is implemented via Bilinear Interpolation (BI):

$$x(p) = \sum_{q} G(q, p) \cdot x(q), \quad G(q, p) = \max(0, 1 - |q_x - p_x|) \cdot \max(0, 1 - |q_y - p_y|) \tag{3}$$

where $p$ denotes the fractional location ($p = p_0 + p_n + \Delta p_n$) and $q$ enumerates all integral spatial locations in the input feature map $x$. The term $\max(0, 1 - |\cdot|)$ restricts the interpolation to grid points that are less than 1 pixel away from $p$ along each axis. After BI, the value $x(p)$ is uniquely determined.
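The sampling step described above can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the function names `bilinear_sample` and `deformable_sample_sum`, the 3 × 3 receptive field, and the zero padding outside the feature map are assumptions made only for illustration. The interpolation kernel is the standard one from deformable convolution, where each integer neighbor is weighted by max(0, 1 − |distance|) along each axis.

```python
import math


def bilinear_sample(x, py, px):
    """Sample the 2-D grid x (list of rows) at the fractional location (py, px).

    Each integer location q contributes x(q) weighted by
    max(0, 1 - |q_y - p_y|) * max(0, 1 - |q_x - p_x|).
    Locations outside the grid contribute zero (zero padding, an assumption).
    """
    h, w = len(x), len(x[0])
    y0, x0 = math.floor(py), math.floor(px)
    value = 0.0
    # Only the four integer neighbors of p can have a non-zero weight.
    for qy in (y0, y0 + 1):
        for qx in (x0, x0 + 1):
            if 0 <= qy < h and 0 <= qx < w:
                g = max(0.0, 1 - abs(qy - py)) * max(0.0, 1 - abs(qx - px))
                value += g * x[qy][qx]
    return value


def deformable_sample_sum(x, p0, offsets, weights):
    """One output value of a 3x3 deformable convolution centered at p0.

    offsets[n] = (dy, dx) is the fractional shift of the n-th grid point and
    weights[n] is the kernel weight of that grid point (row-major order).
    """
    grid = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    out = 0.0
    for (gy, gx), (oy, ox), wn in zip(grid, offsets, weights):
        out += wn * bilinear_sample(x, p0[0] + gy + oy, p0[1] + gx + ox)
    return out
```

With all offsets set to zero, `deformable_sample_sum` reduces to an ordinary 3 × 3 convolution response, which is a convenient sanity check when experimenting with learned offsets.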
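In the DDConv, $\Delta p_n$ is generated by the central difference convolution, which can be expressed as

$$\Delta p(p_0) = \sum_{p_n \in R} w(p_n) \cdot \big(x(p_0 + p_n) - x(p_0)\big) \tag{4}$$

That is, the central value $x(p_0)$ is subtracted from each value $x(p_0 + p_n)$ in the local receptive field $R$ to form local gradient information. Meanwhile, considering that the vanilla convolution can bring stronger semantic information, the offset calculation can be expressed as

$$\Delta p(p_0) = \theta \sum_{p_n \in R} w(p_n) \cdot \big(x(p_0 + p_n) - x(p_0)\big) + (1 - \theta) \sum_{p_n \in R} w(p_n) \cdot x(p_0 + p_n) \tag{5}$$

where $\Delta p(p_0)$ denotes the offsets generated at position $p_0$, $w(p_n)$ here denotes the learnable weights of the offset-generating convolution, and $\theta$ is a hyperparameter that controls the ratio between vanilla convolution and difference convolution. The results of applying vanilla convolution and DDConv to cell images are shown in Fig. 2. It can be observed that the segmentation results obtained by applying DDConv are closer to the annotated images than those obtained by vanilla convolution.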
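The blend between difference and vanilla convolution can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it uses the standard central difference convolution form, in which θ weights the difference term and (1 − θ) weights the vanilla term, so that θ = 0 recovers vanilla convolution as the text states. The function name `mixed_difference_conv`, the single-channel input, the 3 × 3 kernel, and the "valid" (no padding) output are all hypothetical choices.

```python
def mixed_difference_conv(x, w, theta):
    """3x3 convolution blending central-difference and vanilla responses.

    out(p0) = theta * sum_n w_n * (x(p0 + pn) - x(p0))
              + (1 - theta) * sum_n w_n * x(p0 + pn)

    x is a 2-D list of rows, w is a 3x3 list of weights, and only positions
    whose full 3x3 neighborhood lies inside x are computed (no padding).
    """
    height, width = len(x), len(x[0])
    out = []
    for i in range(1, height - 1):
        row = []
        for j in range(1, width - 1):
            vanilla = 0.0
            diff = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    wn = w[di + 1][dj + 1]
                    value = x[i + di][j + dj]
                    vanilla += wn * value
                    # Subtract the central value to keep only local gradients.
                    diff += wn * (value - x[i][j])
            row.append(theta * diff + (1 - theta) * vanilla)
        out.append(row)
    return out
```

On a constant image the difference term vanishes, so the output depends only on the vanilla share (1 − θ). This illustrates why the difference branch is insensitive to absolute intensity and responds only to local intensity changes.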

B. Pseudo Scale Instance Map
To address the computational difficulty of existing maps and further make the model sensitive to the scale information of cells, we propose a new cell location map, called the Pseudo Scale Instance (PSI) map. Compared with point maps that use fixed-radius circles to represent each cell, PSI maps use circles of different sizes to represent each cell. Since the annotation of an instance segmentation dataset includes the boundary of each cell, it is easy to obtain the scale information of each cell and generate PSI maps accordingly. We first generate PSI maps using an instance segmentation dataset and perform a preliminary experiment to validate the effectiveness of the PSI map. The process of generating PSI maps using an instance segmentation dataset is shown in Fig. 4(a). However, existing cell localization datasets [3], [4], [15] often do not contain information about the scale of cells, which limits the further promotion of the PSI map. To address this issue, we aim to add scale information to existing point-annotated datasets. Moreover, the annotation cost of instance segmentation datasets is often very high, while point labels are much easier to obtain. To this end, a scale-giving method that can provide scale information for datasets without scale information is introduced, as shown in Fig. 4(b).
1) Scale Validation: According to Fig. 4(a), the instance extraction and distance transform analysis is performed to generate PSI maps from an instance segmentation dataset. Then, a preliminary experiment is conducted to verify the performance enhancement of the PSI map, which is shown in Section IV-E.
Instance extraction & Distance transform analysis: In this step, each instance in an image is extracted from the annotation and represented using circles of different sizes, with position labels added to generate PSI maps, as shown in Algorithm 1. The total number of instances in an image is denoted as N, and each instance is saved as a separate image, referred to as ins_map. Therefore, one cell image corresponds to N ins_maps. For each ins_map, the distanceTransform function in OpenCV is applied to the unique connected domain it contains, obtaining the centroid of the domain and the set of distances from the centroid to the boundaries of the connected domain. The instance segmentation image, distance map, and corresponding heatmap are shown in Fig. 4. Finally, a circle is drawn on the PSI map using the centroid as the center and the maximum value in the set of distances as the radius, which can be obtained by the minMaxLoc function in OpenCV. After iterating through all N ins_maps, the PSI map corresponding to the image is obtained.
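The procedure above can be sketched in pure Python. This is only an illustration, not Algorithm 1 itself: the source uses OpenCV's distanceTransform and minMaxLoc, whereas the hypothetical helper `instance_to_circle` below is a brute-force stand-in that is practical only for tiny masks. The sketch also assumes that the "centroid" is the pixel where the distance to the background is largest, and the names `instance_to_circle` and `build_psi_map` are invented for this example.

```python
import math


def instance_to_circle(mask):
    """Return (center, radius) of the largest inscribed circle of a binary mask.

    mask is a list of rows with 1 for instance pixels and 0 for background.
    For every foreground pixel the distance to the nearest background pixel is
    computed (the role of a distance transform), and the pixel with the largest
    distance is kept (the role of a max-location search).
    """
    h, w = len(mask), len(mask[0])
    background = [(r, c) for r in range(h) for c in range(w) if mask[r][c] == 0]
    best_center, best_radius = None, 0.0
    for r in range(h):
        for c in range(w):
            if mask[r][c] == 0:
                continue
            d = min((math.hypot(r - br, c - bc) for br, bc in background),
                    default=float("inf"))
            if d > best_radius:
                best_center, best_radius = (r, c), d
    return best_center, best_radius


def build_psi_map(ins_maps, height, width):
    """Draw one filled circle per instance mask onto an empty PSI map."""
    psi = [[0] * width for _ in range(height)]
    for mask in ins_maps:
        center, radius = instance_to_circle(mask)
        if center is None:  # empty mask, nothing to draw
            continue
        cy, cx = center
        for r in range(height):
            for c in range(width):
                # Strict inequality keeps the circle inside the instance.
                if math.hypot(r - cy, c - cx) < radius:
                    psi[r][c] = 1
    return psi
```

For real images, the brute-force search would be replaced by the OpenCV calls named in the text, which compute the same quantities far more efficiently.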
2) Scale-Giving Method: According to Fig. 4(b), contour detection and location analysis are used to add scale information to existing cell localization datasets that lack it.
Contour detection: First, it is important to detect the contour of each cell to obtain the scale information. There are many methods for boundary detection, such as the Sobel, Laplace, Canny, and other operators. To detect both lightly stained and well stained cells, we use a difference convolution-based network [25] to extract cell contours.
Algorithm 2: Location Analysis.
Location analysis: After finding the contour of each cell, the PSI map is generated based on the contours and the point annotation. Specifically, we calculate the distance between each point and contour by using the pointPolygonTest function in OpenCV. If the distance is greater than 0, which means the point lies inside the contour, the point and contour are considered successfully matched. Then, the corresponding PSI map is generated through location analysis, as shown in Algorithm 2. Suppose there are M annotation points in an image. We traverse these M points and calculate half the distance between the currently traversed point and its nearest neighboring point, denoted as min_dist. If the current point is matched with a contour, the distance from the current point to the matched contour is calculated and denoted as dist. Using the current point as the center, a circle is drawn on the PSI map with a radius equal to the smaller of dist and min_dist. This circle represents the corresponding cell at that coordinate, and the radius of the circle reflects the cell's scale information. If the current point does not match any contour, the corresponding contour has not been well detected. In that case, dist is set to the average radius of all the circles corresponding to the other cells, and the radius of the circle is again the smaller of dist and min_dist.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
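The radius selection in the location analysis can be sketched as below. This is a hedged illustration rather than Algorithm 2 itself: the function name `psi_radii` and its inputs are hypothetical, the point-to-contour distances are assumed to have been computed beforehand (for instance with OpenCV's pointPolygonTest), and two details are assumptions of this sketch: unmatched points are handled in a second pass using the mean radius of the matched points, and they fall back to min_dist when no point in the image is matched.

```python
import math


def psi_radii(points, contour_dists):
    """Choose a circle radius for every annotated point.

    points: list of (x, y) annotation coordinates.
    contour_dists[i]: positive distance from point i to its matched contour,
        or None when the point is not matched to any contour.
    Note: with a single unmatched point and no neighbors the radius is
    infinite, so callers may want to clamp the result.
    """
    def half_nearest(i):
        others = [math.dist(points[i], q) for j, q in enumerate(points) if j != i]
        return min(others) / 2 if others else float("inf")

    min_dists = [half_nearest(i) for i in range(len(points))]
    radii = [None] * len(points)

    # Pass 1: matched points use min(dist, min_dist).
    for i, dist in enumerate(contour_dists):
        if dist is not None and dist > 0:
            radii[i] = min(dist, min_dists[i])

    # Pass 2: unmatched points use the average radius of the matched points.
    matched = [r for r in radii if r is not None]
    for i in range(len(points)):
        if radii[i] is None:
            dist = sum(matched) / len(matched) if matched else min_dists[i]
            radii[i] = min(dist, min_dists[i])
    return radii
```

Capping each radius at half the distance to the nearest neighbor is what keeps the circles of adjacent cells from overlapping, which is the property the PSI map relies on in dense regions.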

C. DCLNet for Cell Localization
Building on DDConv, we propose a new cell localization network, called DCLNet, as shown in Fig. 5. DCLNet mainly consists of a backbone and two DDConv modules. The backbone consists of 4 stages and achieves both strong semantic information and accurate location information by running multiple resolution branches in parallel and continuously exchanging information between the branches. The reason for choosing the decoder instead of the encoder is that the decoder is mainly responsible for patching up the image. The decoder restores high-dimensional features to low-dimensional images, refines the geometric shapes of objects in the process, and compensates for the detail loss caused by the pooling layers in the encoder, making the results closer to the annotation. The cell image is first passed through two 3 × 3 convolutions to obtain a feature map of size H/4 × W/4, which is then fed into the backbone. Subsequently, the backbone outputs four feature maps with sizes of H/4 × W/4, H/8 × W/8, H/16 × W/16, and H/32 × W/32, respectively. Then, the two smaller feature maps are sent to the DDConv modules separately, and all feature maps are upsampled to the size of H/4 × W/4 and concatenated along the channel dimension. Finally, the feature map is reduced to one channel and the size is restored to H × W through the decoder, and the location map is obtained after post-processing.

A. Datasets and Implementation
1) Datasets: In the experiment, we convert the annotation information of three datasets, including an instance segmentation dataset and two point annotation datasets, into PSI maps. We briefly describe these datasets as follows.
The nuclei grading dataset [8] consists of 1,000 Haematoxylin and Eosin (H&E) stained images with a resolution of 512 × 512.This dataset contains 70,945 annotated cell nuclei, including 16,652 endothelial nuclei and 54,293 tumor nuclei.The tumor regions are selected by two experienced pathologists from 150 ccRCC and 50 pRCC WSIs and scanned at 40× objective magnification.In this dataset, the training, validation, and testing sets contain 700, 200, and 100 cell images, respectively.
BCData [3] is a large Ki-67 staining dataset for cell localization and counting, containing 1,338 breast tumor cell images.The original WSIs are scanned at 40× magnification (∼ 0.2239 microns/pixel).The cropped images have a uniform resolution of 640 × 640, with a total of 181,074 annotated cells.In this dataset, there are 803, 133, and 402 images in the training, validation, and testing set, respectively.It is worth mentioning that this experiment is implemented based on https://openi.pcl.ac.cn/xuf01/ki67, represented as U-CSRNet ∝ in Table VI.
CoNIC [9] is currently the largest publicly available nuclei-level dataset in computational pathology. It contains around half a million labeled nuclei. The dataset consists of H&E stained histology images at 20× objective magnification (∼ 0.5 microns/pixel) from 6 different data sources. For each image, an instance segmentation mask and a classification mask are provided. To make the cells in a single image of the three different datasets similar in size, we crop the dataset into images with a resolution of 256 × 256. The dataset is split into 4,807 images (images without cells are not included). We set 2,837 images from the CoNIC dataset as the training set, 501 images as the validation set, and 1,469 images as the test set.
2) Implementation Details: During the training process, random horizontal flipping and random scaling are applied. The scaling ratio ranges from 0.8 to 1.2. The experiments are conducted on an Nvidia GeForce RTX 3090 (∼ 24 GB), with a batch size of 4 and a learning rate of 1e-4. After 200 iterations, the learning rate is decayed to 1e-5, and the total number of epochs is set to 800. The AdamW optimizer is used to optimize the network. To train the proposed network, the standard mean squared error loss function is chosen in the experiment.
3) Evaluation Metrics: To evaluate the performance of cell localization and counting, separate metrics for localization and counting are required. A match is considered successful when the distance between the given predicted point and the true point is less than a threshold value. In this paper, the threshold is set to the radius of each mask in PSI maps.
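Localization Metrics: To accurately evaluate the matching relationship between predicted cell points and ground truth, we use F1 score, precision, and recall to assess the localization performance of the model. They are defined as

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \quad \mathrm{Recall} = \frac{TP}{TP + FN}, \quad F1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

where TP, FP, and FN represent the numbers of true positives, false positives, and false negatives, respectively.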
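A minimal sketch of how these localization metrics could be computed is given below. The paper states only the matching threshold (the radius of each PSI mask), so the greedy nearest-neighbor assignment used here is an assumption, as are the function names `match_points` and `precision_recall_f1`.

```python
import math


def match_points(preds, gts, radii):
    """Greedy one-to-one matching of predicted points to ground-truth points.

    A prediction may match ground-truth point i only if their distance is
    below radii[i]; each ground-truth point is used at most once.
    Returns (tp, fp, fn).
    """
    used = set()
    tp = 0
    for p in preds:
        best, best_d = None, float("inf")
        for i, g in enumerate(gts):
            if i in used:
                continue
            d = math.dist(p, g)
            if d < radii[i] and d < best_d:
                best, best_d = i, d
        if best is not None:
            used.add(best)
            tp += 1
    return tp, len(preds) - tp, len(gts) - tp


def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from match counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

A globally optimal assignment (for example, the Hungarian algorithm) could replace the greedy loop; the two differ only when several predictions compete for the same ground-truth point.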
Counting Metrics: In this paper, instead of directly regressing the number of cells, we obtain the results by counting the connected regions in the output image. The Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) are used to evaluate the counting performance of the model, which are defined as

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| y_i^{gt} - \hat{y}_i \right|, \quad \mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_i^{gt} - \hat{y}_i \right)^2}$$

where $N$ represents the total number of samples in the validation or test set, $y_i^{gt}$ is the ground-truth count, and $\hat{y}_i$ is the predicted count for the $i$th sample.
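The counting pipeline described above can be sketched in a few lines of pure Python. This is an illustration under assumptions, not the authors' code: the output image is assumed to be already binarized, regions are assumed to be 4-connected, and the function names `count_connected_regions` and `mae_rmse` are hypothetical.

```python
import math


def count_connected_regions(binary):
    """Count 4-connected foreground regions in a binary image (list of rows)."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if binary[r][c] and not seen[r][c]:
                count += 1
                # Flood-fill the whole region so it is counted only once.
                stack = [(r, c)]
                seen[r][c] = True
                while stack:
                    y, x = stack.pop()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return count


def mae_rmse(gt_counts, pred_counts):
    """Mean absolute error and root mean squared error over per-image counts."""
    n = len(gt_counts)
    errors = [g - p for g, p in zip(gt_counts, pred_counts)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return mae, rmse
```

Because each cell in a PSI map is an independent circular region, counting regions gives the cell count directly, provided that neighboring circles do not touch in the thresholded output.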

B. Comparison With State-of-the-Art Methods
We report experimental results on the BCData dataset, the nuclei grading dataset, and the CoNIC dataset from two aspects: quantitative and qualitative performance. Table I shows the quantitative results of DCLNet and state-of-the-art methods. From Table I, we can notice that our DCLNet achieves the best localization performance on all datasets, which demonstrates that our DCLNet has a more effective cell feature representation learning ability. Fig. 6 shows several visualization results for images in BCData and the nuclei grading dataset generated by various methods. From Fig. 6, we can notice that with the use of the PSI map, all methods can mostly overcome the challenge of low sensitivity to the scale of cells. However, existing state-of-the-art methods still suffer from ignoring lightly stained cells, which leads to a decline in localization performance. Compared with state-of-the-art methods, our DCLNet obtains more accurate cell localization, which demonstrates that DDConv is sensitive to cells with various shapes. Additionally, regardless of whether the cell images are stained with Ki-67 or H&E, DCLNet almost always maintains the best performance, demonstrating the robustness of the DDConv to lightly stained cells.

C. Ablation Studies on DDConv
1) Comparison With Other Convolutions: To demonstrate the superiority of our novel DDConv, we also compare DDConv with other convolutions, including vanilla convolution and deformable convolution. Fig. 7 shows the location maps generated by the baseline with vanilla convolution, deformable convolution, and DDConv. In addition to the qualitative comparisons, Table II shows the quantitative comparison of the baseline with the three convolutions on the CoNIC dataset and BCData.
From Fig. 7, two points are summarized. 1) Deformable convolution and DDConv can make the cell shapes closer to the ground truth, which addresses the challenge of variations in cell shapes. 2) Compared to deformable convolution, the DDConv can better optimize the cell shapes and detect more lightly stained cells at the same time. It can also be seen from the
To investigate the effect of the number and position of the DDConv on localization and counting performance, an ablation experiment is performed on the number and position of the DDConv on BCData. The backbone outputs four feature maps of different sizes. According to the feature map size from large to small, the branches are designated as stage 0, stage 1, stage 2, and stage 3, respectively. The experimental results are listed in Table III. It can be seen that the best localization and counting performance is achieved when the DDConv is added to stage 2 and stage 3.
When the hyperparameter θ in (5) is 0, the difference convolution degenerates into the traditional vanilla convolution, which leads to the loss of relative gradient information. Therefore, an ablation study on θ is carried out to explore its impact on cell localization and counting performance. As shown in Fig. 8, we select DCLNet to experiment on the test set of BCData. The best localization and counting performance is achieved when θ is 0.7. When θ is 0, the difference convolution degenerates into vanilla convolution, and the model performance significantly decreases.

D. Computational Cost
The number of parameters and the computational cost of a method strongly affect its practical application. Therefore, several mainstream methods are chosen for comparison with our DCLNet, including ResNet-101 [36], U-Net [32], HRNet [39], and HoVer-Net [38], and the results are shown in Table IV. We use 512 × 512 resolution images from the CoNIC dataset as input to measure the GFLOPs of all methods. As indicated by the data in Table IV, DCLNet improves the localization performance on the CoNIC dataset over these four methods by 5.4%, 4.5%, 1.6%, and 1%, respectively. Meanwhile, the GFLOPs of DCLNet are 23%, 1.2%, and 0.5% higher than those of ResNet-101, U-Net, and HRNet, respectively. In addition, the GFLOPs of DCLNet are 77% lower than those of HoVer-Net. Based on the above analysis, we can conclude that the performance improvement ratio brought by DCLNet is greater than the increase in GFLOPs.

E. Impact of PSI
1) Advancement of PSI: As mentioned in Section III-B1, we first perform the pre-experiment on the nuclei grading dataset [8], and the results are listed in Table V. As can be seen from Table V, under the same model, the PSI map outperforms the density map and point map in both localization and counting performance, proving that scale information can bring performance improvement to the cell localization task.
To further validate the advantage of the PSI map, experiments are conducted on an existing cell localization dataset with point annotations, i.e., BCData. Huang et al. [3] predict positive and negative cells separately in their experiments. This strategy avoids the problem of large color variations in cells, but it also raises two issues: 1) due to differences in staining techniques and in the standards that different departments use to classify negative and positive cells, models that predict each class independently are greatly restricted in practical applications; 2) when only one type of cell is predicted during model training, the other types of cells interfere with the target cells. Therefore, we predict all cells simultaneously, and the results are shown with an asterisk (*) at the bottom of Table VI.
Table VI compares the PSI map and the density map when negative and positive cells are predicted separately and uniformly. Three observations follow. 1) When negative and positive cells are predicted separately, using PSI maps improves the localization performance for both. Compared with the results achieved by U-CSRNet ∝, using PSI maps increases the average F1 score, accuracy, and recall by 1.3, 2.3, and 0.6, respectively. 2) When negative and positive cells are predicted uniformly, using PSI maps further improves the cell localization performance, with F1 score, accuracy, and recall increasing by 1.5, 2.5, and 0.7, respectively. 3) Using PSI maps improves localization performance and greatly increases the accuracy, regardless of whether negative and positive cells are predicted separately or uniformly.
2) Influence of the Scale: As mentioned in Section III, the masks of each cell in the PSI map are independent. To further investigate the influence of the mask scale on cell localization performance, we conduct an ablation experiment on the radius of each mask in the PSI map. We choose six different radius sizes, with the radius of the mask generated by the method described in Section III used as the reference, denoted as R. The original mask is then eroded by 1-3 pixels and dilated by 1-2 pixels. The experiment is conducted on BCData, and the results are shown in Fig. 9. For localization, U-Net and HRNet achieve the best F1 and Rec at R-2. For counting, U-Net achieves the minimum MAE and RMSE at R-2, while HRNet achieves its optimal performance at R. However, the localization performance of HRNet at R is much lower than that at R-2, so the radius of all masks used in this paper is set to R-2.
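The six radius settings (R-3 to R+2) can be produced from one reference mask by repeated morphological erosion or dilation. The following is a minimal sketch under stated assumptions: it uses a 3 × 3 square structuring element so that each iteration changes the radius by roughly one pixel, it treats pixels outside the image as background, and it operates on the mask of a single cell so that neighbouring instances stay independent. The structuring element is not specified in this section, and the helper names (`erode`, `dilate`, `resize_mask`) are hypothetical.

```python
def _neighbourhood(mask, i, j):
    """Yield the 3x3 neighbourhood of (i, j); out-of-image pixels count as 0."""
    h, w = len(mask), len(mask[0])
    for a in range(i - 1, i + 2):
        for b in range(j - 1, j + 2):
            yield mask[a][b] if 0 <= a < h and 0 <= b < w else 0


def erode(mask):
    """Keep a pixel only if its whole 3x3 neighbourhood is foreground."""
    return [[int(all(_neighbourhood(mask, i, j))) for j in range(len(mask[0]))]
            for i in range(len(mask))]


def dilate(mask):
    """Set a pixel if any pixel in its 3x3 neighbourhood is foreground."""
    return [[int(any(_neighbourhood(mask, i, j))) for j in range(len(mask[0]))]
            for i in range(len(mask))]


def resize_mask(mask, delta):
    """Erode (delta < 0) or dilate (delta > 0) a single-cell mask |delta| times."""
    step = dilate if delta > 0 else erode
    out = [row[:] for row in mask]
    for _ in range(abs(delta)):
        out = step(out)
    return out


# Toy single-cell mask: a 5x5 square in a 9x9 grid, standing in for radius R.
cell = [[1 if 2 <= i <= 6 and 2 <= j <= 6 else 0 for j in range(9)]
        for i in range(9)]

# The six settings of the ablation: R-3, R-2, R-1, R, R+1, R+2.
variants = {d: resize_mask(cell, d) for d in (-3, -2, -1, 0, 1, 2)}
areas = {d: sum(map(sum, m)) for d, m in variants.items()}
```

In this toy case the mask vanishes at R-3, which shows why the erosion depth has to stay small relative to the cell size when the masks are shrunk.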

V. CONCLUSION
In this paper, we addressed two unresolved challenges in cell localization. Our novel gradient-aware and shape-adaptive Difference-Deformable Convolution (DDConv) extracts the edge information of lightly stained cells to overcome the challenge of light staining, while adaptively adjusting the shape of the convolutional kernel to overcome the challenge of the large variability in cell shape. And, for the first time, we

Fig. 1. Existing automatic cell localization methods still face two unresolved challenges. Challenge I: the large variability in cell size and shape, coupled with the heterogeneous intensity distribution of lightly stained cells. Challenge II: an unreasonable location map.

Fig. 3. Illustration of DDConv. First, the gradient information of the image is extracted by central difference convolution. Then, the offset of each sampling point of the convolutional kernel is calculated from the extracted gradient information. Finally, each sampling point of DDConv is convolved with the pixels on the image that have been mapped by the offset.

Fig. 4. (a) The process of the scale verification experiment and of generating PSI maps using instance segmentation datasets. The results compared with density maps and point maps demonstrate that scale information can improve cell localization performance. (b) The process of the scale-giving method, which mainly consists of contour detection and location analysis. This method can be used to generate PSI maps from point annotation datasets and to comprehensively incorporate scale information into the cell localization task.

Fig. 5. Illustration of DCLNet. The backbone consists of four stages, and each stage adds a branch. Features of different sizes are fused at the end of each stage, and four feature maps are finally output. After that, the two smaller of the four output feature maps are fed into the DDConv. Then, all the feature maps are upsampled to the same size and concatenated along the channel dimension.

Fig. 6. Some typical visualization results of six popular methods and the proposed DCLNet. The green and blue points denote true positive (TP) and false positive (FP), respectively. The green and red circles are the ground truth of each cell. The red and yellow boxes highlight some representative comparisons. The images of the first, second, and fourth rows originated from BCData, and the images of the third and fifth rows originated from the nuclei grading dataset.

Fig. 8. Impact of different values of θ on cell localization and counting performance.

Fig. 9. Comparison of changing the radius R of the masks in the PSI map on cell localization and counting performance. The experimental results are derived from the BCData dataset.

TABLE I
PERFORMANCE COMPARISON OF MAINSTREAM METHODS ON BCDATA DATASET, NUCLEI GRADING DATASET, AND CONIC DATASET

TABLE II
QUANTITATIVE COMPARISON OF VANILLA CONVOLUTION, DEFORMABLE CONVOLUTION, AND DDCONV ON THE CONIC DATASET

TABLE III
IMPACT OF THE NUMBER AND LOCATION OF DDCONVS ON CELL LOCALIZATION PERFORMANCE

The quantitative results in Table II also show that DDConv achieves the best localization and counting performance on both datasets. This demonstrates that DDConv has better localization performance than vanilla convolution and deformable convolution on images with different staining patterns and cell types.

TABLE IV
COMPARISON OF THE COMPUTATIONAL COST OF SEVERAL REPRESENTATIVE METHODS

TABLE V
PRE-EXPERIMENTS ARE CONDUCTED ON AN INSTANCE SEGMENTATION DATASET USING THE DENSITY MAP, POINT MAP, AND PSI MAP

TABLE VI
COMPARISON OF LOCALIZATION PERFORMANCE USING DENSITY MAP AND PSI MAP ON BCDATA