Fast Ocean Front Detection Using Deep Learning Edge Detection Models

Small-scale ocean fronts play a significant role in absorbing the excess heat and CO2 generated by climate change, yet their dynamics are not well understood. Existing in situ and remote sensing measurements of the ocean have inadequate spatial and temporal coverage to map small-scale ocean fronts globally. In addition, conventional algorithms for generating ocean front maps are computationally intensive and require data with long lead times. We propose machine learning (ML) models that detect temperature and chlorophyll ocean fronts in unprocessed, radiometrically uncorrected satellite imagery by transfer learning from existing edge detection models. We use two separate datasets: one based on conventional approaches to ocean front detection, and a second based on human-annotated ground truth. The deep learning front detection approach significantly reduces the resources and overall lead times needed for detecting ocean fronts. The deep learning models are developed with resource-constrained edge compute platforms, such as CubeSats, in mind, as such platforms can address the spatial and temporal coverage challenges. The highest performing models achieve accuracies of 96% and make predictions in milliseconds using unoptimized desktop CPUs and less than 100 MB of storage; these capabilities are well suited for CubeSat deployment.


I. INTRODUCTION
Ocean fronts are boundaries with strong gradients in water properties such as temperature, salinity, nutrients, and biological content. They form primarily where bodies of water mix, and exist across a wide range of spatial and temporal scales. For example, the smallest ocean fronts can be meters long and last for days, whereas the largest fronts span thousands of kilometers and last for millions of years [1].
Ocean fronts play a vital role in the ecology of marine life, and changing ocean fronts are important indicators of climate change. The ocean absorbs much of the excess heat and carbon dioxide generated by fossil fuel emissions. The exact dynamics of this heat and carbon dioxide uptake are yet to be fully understood, but may be influenced by small-scale ocean fronts [2]. Recent ocean front tracking indicates irregular and erratic activity [3,4], potentially caused by climate change, motivating improved, lower-latency monitoring [5].
While ocean fronts can be extracted from a wide variety of properties, the two most commonly tracked types are sea surface temperature (SST) fronts and chlorophyll-a concentration (CHL) fronts. Traditional in situ methods use buoys, gliders, and science cruises to track these properties. Fronts are hand-drawn over the extracted data in areas of sharp gradients [1].
Remote sensing has allowed for widespread monitoring of ocean fronts. For such large areas, ocean fronts are no longer hand-drawn; instead, algorithms such as the Cayula-Cornillon algorithm (CCA) or the Belkin-O'Reilly algorithm (BOA) are used [6,7]. These algorithms are computationally intensive and can rely on radiometrically calibrated data from multiple sources. In this work, we propose re-purposing deep learning edge detection models to detect ocean fronts directly on radiometrically uncalibrated L1 data, allowing for fast, low-latency monitoring of ocean fronts.
Recent developments in commercial electronics components have increased CubeSats' processing power and efficiency to the point where complex computations, such as those required by some algorithms and even deep learning models, can be completed on-orbit. For example, Nvidia and Xilinx are making parallel processing available in embedded forms that can be used onboard small satellites like CubeSats [8]. Fast ocean front detection can then be performed directly on edge devices for global ocean front monitoring. The concept of operations involves processing data on-orbit as a kind of always-on monitoring system, selecting only the most valuable data for downlink. A diagram of this concept is shown in Fig. 1. This work is targeted towards MIT's BeaverCube-2 mission [9,10], but the concepts and models can apply to any Earth-observing satellite with onboard compute capability.
Remote sensing data and algorithms have significantly lowered the barriers to tracking ocean fronts. However, there are still gaps in the spatial and temporal coverage of ocean fronts, especially small-scale ocean fronts. Large satellite missions often have either spatial resolution too coarse to detect small-scale ocean fronts or poor temporal resolution because of long revisit times [11]. Many satellites do not spend resources on ocean imaging, instead focusing on land mass imaging. Additionally, the time between when a satellite image is taken and when the ocean fronts in the image are identified can be weeks, due to the many intermediate processing levels and the incorporation of data from other sources [12]. This lag prevents real-time tracking of ocean fronts, which is important for fishermen and scientists alike.

II. BACKGROUND

A. CHL and SST Retrieval from Remote Sensing
Remote sensing offers a way to track ocean fronts globally. There are multiple active satellite missions that can be used to track both CHL and SST fronts, as shown in Table I. Primarily, these missions have bands around 440 nm to measure CHL and thermal infrared bands between 8-14 µm to measure SST. This work utilizes Landsat 8 data and focuses on ocean fronts in coastal waters.
NASA uses established algorithms to calculate chlorophyll-a concentration and sea surface temperature from surface reflectance data (Landsat Level 2 data) [13][14][15]. A different set of coefficients is used for each satellite instrument. CHL concentration is retrieved using a fourth-order polynomial relationship between Landsat Bands 1 (Coastal Aerosol), 2 (Blue), and 3 (Green), given as

log10(CHL) = a_0 + a_1 X + a_2 X^2 + a_3 X^3 + a_4 X^4, with X = log10(λ_blue / λ_green),

where a_0 through a_4 are instrument-specific coefficients and λ is surface reflectance: λ_blue represents max(Band 1, Band 2), while λ_green represents Band 3. The relationship returns the near-surface concentration of chlorophyll-a in units of mg/m^3, and is calibrated using relationships between in-situ chlorophyll-a measurements and corresponding satellite imagery [13,14].
SST is computed using a linear relationship with Landsat Band 10 (TIRS 1), given as

SST = a_0 + a_1 λ_TIRS,

where a_0 and a_1 represent instrument-specific coefficients, and λ_TIRS represents Band 10 surface reflectance. This algorithm returns the skin sea surface temperature in units of °C, calculated using an empirical relationship derived from in-situ surface temperature measurements and corresponding satellite imagery [15].
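As a concrete illustration, the two retrievals above can be sketched in a few lines of NumPy. The coefficient values below are placeholders for illustration only (the real instrument-specific coefficients are published by NASA per sensor), and the log-ratio polynomial follows the standard NASA ocean-color (OC-style) formulation.

```python
import numpy as np

# Hypothetical illustrative coefficients -- NOT the published
# instrument-specific values, which NASA documents per sensor.
CHL_COEFFS = (0.2424, -2.7423, 1.8017, 0.0015, -1.2280)  # a_0 .. a_4
SST_COEFFS = (0.0, 1.0)                                   # a_0, a_1

def chl_concentration(band1, band2, band3, coeffs=CHL_COEFFS):
    """Fourth-order polynomial CHL retrieval (mg/m^3, OC-style log-ratio form)."""
    lam_blue = np.maximum(band1, band2)    # lambda_blue = max(Band 1, Band 2)
    x = np.log10(lam_blue / band3)         # band-ratio term, lambda_green = Band 3
    a0, a1, a2, a3, a4 = coeffs
    return 10.0 ** (a0 + a1 * x + a2 * x**2 + a3 * x**3 + a4 * x**4)

def sst_celsius(band10, coeffs=SST_COEFFS):
    """Linear SST retrieval (degrees C) from Band 10 (TIRS 1)."""
    a0, a1 = coeffs
    return a0 + a1 * band10
```

With equal blue and green reflectance the ratio term vanishes and the retrieval reduces to 10^(a_0), which is a quick sanity check on any coefficient set.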

B. Classical Ocean Front Detection
Classical ocean front detection is conventionally done on L2 surface reflectance data, which has already been radiometrically calibrated and atmospherically corrected. Two main algorithms exist for detecting ocean fronts in satellite data: the Cayula-Cornillon Algorithm (CCA) and the Belkin-O'Reilly Algorithm (BOA). We utilize BOA in our work due to its applicability to small- and meso-scale SST and CHL ocean fronts, and its usage of an absolute rather than relative threshold for front intensity.
The Cayula-Cornillon Algorithm detects fronts by testing windows of the image for bimodal histograms of the tracked property [7]. BOA instead utilizes a recursive, shape-preserving, contextual median filter that removes noise while keeping CHL features. The filter operates on a 3 km × 3 km window while considering a larger 5 km × 5 km context, analyzing 1D slices of the window and context to determine whether or not to filter the window's central pixel [6]. This recursive median filter is applied until convergence. Then, Sobel edge detection is applied, and the gradient magnitude is used to calculate pixel-wise ocean front intensity [19,6].
BOA is computationally intensive. The number of iterations until convergence is O(N), where N is the number of pixels in the image [6]. Experimentally, it takes an average of 22 seconds to process a 224 × 224 pixel image with BOA using an Nvidia Tesla T4 GPU on Google Colab.
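The final step of BOA, Sobel gradient magnitude as a per-pixel front intensity, can be sketched directly; note this omits the contextual median filtering that precedes it, so it is only the last stage of the algorithm, not BOA itself.

```python
import numpy as np

def sobel_gradient_magnitude(field):
    """Per-pixel gradient magnitude of a 2D field via 3x3 Sobel kernels.

    Sketches only the final (edge detection) step of BOA; the
    recursive contextual median filter is intentionally omitted.
    """
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    padded = np.pad(field, 1, mode="edge")  # replicate edges to keep shape
    rows, cols = field.shape
    gx = np.zeros((rows, cols))
    gy = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            window = padded[i:i + 3, j:j + 3]
            gx[i, j] = np.sum(kx * window)
            gy[i, j] = np.sum(ky * window)
    return np.hypot(gx, gy)  # front intensity = gradient magnitude
```

A constant field yields zero intensity everywhere, while a linear ramp yields a constant interior response, which is the behavior a threshold-based front detector relies on.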
Fig. 2 shows the overall pipeline going from L0 data to detected CHL and SST fronts using the Belkin-O'Reilly algorithm. The data is pre-processed first by radiometrically calibrating it and deriving reflectances, bringing it up to L2, after which the front detection algorithm can be run to produce a mask showing ocean front locations.

C. Deep Learning Edge Detection
Deep learning provides methods of performing edge detection with more fine-tuning ability than classical computer vision methods. Transfer learning from other edge detection datasets to ocean fronts can allow for increased overall accuracy and a smaller required ocean front dataset [20].
The Berkeley Segmentation Dataset 500 (BSDS500) is a popular dataset for training contour/edge detection models. The dataset is composed of 500 images of natural scenes and their corresponding human-annotated edge maps. Each image is annotated by at least three humans, and edge maps are binary images of "edge" and "not edge" pixels [21]. BSDS500 is a benchmark dataset for edge detection models, and models are compared on this dataset using F1 scores.
This work utilizes the Holistically-Nested Edge Detection (HED) [23] and Convolutional Encoder-Decoder Network (CEDN) [24] architectures, both fully convolutional networks that provide a good balance of computational complexity and performance. The architectures of both models are shown in Fig. 4 and Fig. 5, and both utilize a Visual Geometry Group 16 (VGG-16) backbone [25]. Both models achieve an F1 score of 0.79 on the BSDS500 dataset, approximately the same performance as humans on the same dataset (0.80) [21].
Fig. 3 provides three examples of images from the BSDS500 dataset and their human-annotated ground truths, as well as corresponding HED and CEDN model predictions.

D. Space Considerations
Space hardware is generally benchmarked for low size, weight, and power (SWaP), sometimes extending this metric to cost as well (SWaP-C). Real-time software has similar analogs, where algorithms may need to balance resource usage against overall accuracy.
The space environment imposes additional constraints, as computational complexity is further limited by heat dissipation. Communications limits are also a consideration, as binary sizes are constrained by the uplink to the satellite. In practice, achieving all of these goals at once is difficult, and methods may excel at all metrics except one [10].

III. APPROACH
We propose replacing every image processing step between Level 0 Landsat data and BOA-detected CHL and SST fronts, as shown in Fig. 2, with a deep learning model, as shown in Fig. 6. This model will implicitly learn the atmospheric corrections needed to go from Level 0 data to Level 2 data, the NASA algorithms needed to go from Level 2 data to CHL and SST, and the Belkin-O'Reilly Algorithm needed to detect ocean fronts. Fig. 6 shows our proposed image processing pipeline.
We structure our machine learning task as binary classification: we assign each pixel a label of "ocean front" or "not ocean front." In order to transfer learn from existing state-of-the-art (SOTA) edge detection models, we must create a set of training data with annotated ocean fronts.

A. Creating Training Data
To create a training dataset, we first download 255 cloud-free Landsat 8 scenes using Google Earth Engine. These scenes are temporally and spatially variant, capturing various frontal structures such as large-scale water mass fronts, meso-scale fronts around eddies, and local chlorophyll blooms and ridges. The locations of these scenes are detailed in Fig. 7, and all were captured between 2017 and 2021. These scenes consist largely of coastal regions, as those are the predominant oceanic regions imaged by Landsat 8. The specific Landsat scenes used in this dataset are detailed in [5].

Fig. 5: CEDN network architecture [24].
To create ground truth ocean front data, we process Landsat Level 2 data using the pipeline detailed in Section II-B. We threshold the Belkin-O'Reilly output at 0.1 °C/km for SST ocean fronts and 0.05 (mg/m^3)/km for CHL ocean fronts. There are no defined absolute thresholds for what constitutes an ocean front, so these thresholds were developed with the input of oceanographer Dr. Yackar Mauzole [26].
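Turning the BOA front-intensity field into a binary label map is then a single comparison; the sketch below uses the two threshold values quoted above (the choice of an inclusive versus strict comparison is an implementation detail not specified in the text).

```python
import numpy as np

# Thresholds stated in the text: 0.1 degC/km for SST fronts,
# 0.05 (mg/m^3)/km for CHL fronts.
SST_THRESHOLD = 0.1
CHL_THRESHOLD = 0.05

def front_mask(front_intensity, threshold):
    """Binary ground-truth mask: 1 where BOA front intensity meets the threshold."""
    return (np.asarray(front_intensity) >= threshold).astype(np.uint8)
```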
In parallel, we also process Landsat Level 2 data using the NASA CHL and SST algorithms, apply the Belkin-O'Reilly algorithm, then use a human annotator to label the ocean fronts. This method of creating ground truth data aligns with traditional ML methods of creating ground truth edge detection data.

B. Model Architectures
The HED and CEDN model architectures detailed in Fig. 4 and Fig. 5 are SOTA models for edge detection on natural images. However, our on-orbit ocean front detection task is narrower in scope and more SWaP-constrained than generalized edge detection. Because edge detection generally happens in the first few layers of a computer vision model, it is plausible that decreasing the number of layers in our models could preserve their edge detection capabilities. We therefore also experiment with two "small" model architectures based on HED and CEDN, shown in Fig. 8 and Fig. 9, decreasing the number of model parameters more than eight-fold.

C. Training Parameters
To train our machine learning models, we use focal cross entropy loss [27]. This loss function down-weights easy examples to counter class imbalance, and thus is ideal for our dataset, where only 6% of pixels are labeled as ocean fronts.
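A pixel-wise binary focal loss can be sketched in NumPy as follows. The γ and α values are the common defaults from the focal loss paper, not values reported in this work, and the actual training uses a TensorFlow implementation; this version is purely illustrative.

```python
import numpy as np

def binary_focal_loss(y_true, y_pred, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean pixel-wise binary focal cross entropy.

    The (1 - p_t)^gamma factor down-weights well-classified pixels,
    so the ~6% of front pixels are not swamped by the background.
    gamma and alpha are the usual defaults, not tuned values.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    p_t = np.where(y_true == 1.0, y_pred, 1.0 - y_pred)        # prob of true class
    alpha_t = np.where(y_true == 1.0, alpha, 1.0 - alpha)      # class weighting
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))
```

Confident correct predictions contribute almost nothing to the loss, while uncertain ones dominate it, which is exactly the behavior that helps on an imbalanced pixel-labeling task.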
To prepare our dataset for training, we augment by mirroring and rotation, and subtract the per-channel mean pixel values calculated on the training dataset.
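The two preprocessing steps above, mirror/rotation augmentation and per-channel mean subtraction, can be sketched as follows; function names and the exact augmentation set (all eight dihedral variants) are our illustrative assumptions.

```python
import numpy as np

def augment(image):
    """Yield the 8 dihedral variants of an image: 4 rotations, each mirrored."""
    for k in range(4):
        rotated = np.rot90(image, k)
        yield rotated
        yield np.fliplr(rotated)

def subtract_channel_means(images):
    """Center a batch of images (N, H, W, C) by the per-channel training mean."""
    means = images.mean(axis=(0, 1, 2), keepdims=True)  # shape (1, 1, 1, C)
    return images - means, means
```

The returned means would be saved at training time and reused to center images at inference time, so that on-orbit inputs see the same normalization as the training set.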
We build our four ML models in TensorFlow and apply pre-trained VGG weights to the relevant backbone layers. We train our models on the MIT SuperCloud for 100 epochs with a learning rate of 1e-4 [28].
Fig. 8: A smaller version of the HED network architecture as compared to Fig. 4.
Note that because we are using pre-trained VGG weights, we only have three input channels (traditionally R, G, and B). We use the coastal aerosol, green, and TIRS channels as model inputs, as those are the main bands used in traditional methods of detecting ocean fronts [13,15].

IV. RESULTS
We analyze the performance of our four model architectures across two ground truth datasets using qualitative and quantitative metrics. Qualitatively, we visually compare model outputs to ground truth data. Quantitatively, we calculate binary metrics, F1 scores, and a precision-recall curve for each model. Section A describes model performance on the BOA ground truth dataset, while Section B describes model performance on the human-annotated ground truth dataset. Section C compares the quality of SST and CHL front predictions, and Section D delves into model resource utilization.
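The binary metrics behind each point of a precision-recall curve reduce to counting pixel agreements at one decision threshold; sweeping the threshold traces the full curve. A minimal sketch:

```python
import numpy as np

def precision_recall_f1(y_true, y_prob, threshold):
    """Pixel-wise precision, recall, and F1 at a given decision threshold."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_prob) >= threshold
    tp = np.sum(y_pred & y_true)    # predicted front, actually front
    fp = np.sum(y_pred & ~y_true)   # predicted front, actually background
    fn = np.sum(~y_pred & y_true)   # missed front
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Evaluating this over a grid of thresholds and keeping the threshold with the best F1 mirrors the "best threshold per model" reported in the tables.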
A. BOA Ground Truth

1) Qualitative Comparison: Fig. 10 provides two examples of model input/output. Note that Small HED captures the mild striping caused by inaccurate calibration of the Landsat pushbroom sensors [18], even though it is not present in the ground truth data. CEDN is blurrier than the other outputs, failing to delineate between separate fronts in some predictions. This blurriness is due to the small size of the center convolutional layer in the encoder-decoder architecture, and could potentially be mitigated by adding skip connections across the encoding and decoding layers [24].
Fig. 11 provides an additional example of model input/output, and shows that the BOA ground truth data is not always perfect. In this figure, the ground truth captures only part of the swirls evident in the input data, while the model outputs capture all of the swirls.
This emphasizes the importance of our pre-trained VGG weights; the models already have a robust concept of edge detection that only needs to be refined for ocean front detection. The disparity between ground truth data and model predictions, where the models seem to outperform the ground truth, underscores the generalization capabilities of the models.
2) Quantitative Metrics for CHL: The precision-recall curve for CHL is shown in Fig. 12 and describes model performance across all thresholds. The best threshold for each model and the corresponding F1 score and binary metrics are detailed in Table III. HED performs best across the board, especially in the category of precision.
3) Quantitative Metrics for SST: The precision-recall curve for SST is shown in Fig. 13 and describes model performance across all thresholds. The best threshold for each model and the corresponding F1 score and binary metrics are detailed in Table IV. HED again performs well across the board, but is less dominant in SST front detection than in CHL front detection.

B. Human Annotated Ground Truth
1) Qualitative Comparison: Next, we train our four models on the human-annotated ground truth dataset. Fig. 14 provides two examples of model input/output. Small HED continues to capture the mild striping caused by inaccurate Landsat sensor calibration [18], but the striping is also occasionally visible in the Small CEDN and HED outputs. CEDN SST front predictions occasionally "bleed" into CHL front predictions, such as the SST front in the bottom right corner of the first prediction. This "bleeding" could be mitigated by adding skip connections, as discussed in Section A.
Our human-annotated ground truth data is still not perfect. The intent of creating human-annotated ground truth data was to produce images with long, smooth ocean fronts that can be captured by the human eye but not by BOA. While the human-annotated ground truth fronts are longer and smoother than the BOA ground truth fronts, there is still room for improvement. See Fig. 15, where there is a long, smooth-ish ocean front in the bottom right corner of the TIRS input and all model predictions, but a series of short, jagged fronts in the human-annotated ground truth. This problem could be mitigated by increasing the number of human annotators per image from n=1 to n=3 (the number of annotators per image in the BSDS500 dataset) and averaging the annotations [21].
2) Quantitative Metrics for CHL: The precision-recall curve for CHL is shown in Fig. 16 and describes model performance across all thresholds. The best threshold for each model and the corresponding F1 score and binary metrics are detailed in Table V. Small HED performs well across the board, as does Small CEDN.
3) Quantitative Metrics for SST: The precision-recall curve for SST is shown in Fig. 17 and describes model performance across all thresholds. The best threshold for each model and the corresponding F1 score and binary metrics are detailed in Table VI. HED and Small CEDN are the highest performers.
With this final set of metrics, we can draw some general conclusions about the comparative performance of our four model architectures across our two testing datasets:
• Models are generally better at capturing SST fronts than CHL fronts (discussed further in Section C).
• Models are generally better at producing predictions that align with BOA than predictions that align with human annotation (at least as human annotation is approached in this paper; there may be better ways of doing human annotation, as discussed in [21]).
• The models that capture intermediate levels of frontal detail (HED, Small CEDN) are the most robust performers quantitatively across all predictions.

C. Comparing CHL and SST Predictions
Overall, the models are quantitatively better at detecting SST fronts than CHL fronts. There are a few reasons why this could be the case:
1) In our ground truth datasets, where only 6% of pixels are ocean fronts, only 16% of those ocean front pixels are CHL ocean fronts. This could be because our CHL ocean front threshold value is too high, or it could reflect the fact that there are fewer CHL fronts than SST fronts in the oceanic regions our training data comes from. Either way, the models have very few examples of what constitutes a CHL ocean front, so they do not learn to detect them very well.

2) The existing processing pipeline from input Coastal Aerosol/Green data to output CHL fronts is more complex than the existing processing pipeline from input TIRS data to output SST fronts. For example, Landsat Level 2 processing includes correcting for the scattering and absorbing effects of aerosols, which affect the Coastal Aerosol band more than the TIRS band [12]. Furthermore, the NASA algorithm for computing chlorophyll concentration uses a fourth-order polynomial and three Landsat bands [13], more elaborate than the NASA algorithm for computing sea surface temperature, which uses a linear equation and one Landsat band [15]. This additional complexity could make it harder for the models to learn to detect CHL ocean fronts.
3) There is more developed literature around SST measurements and SST fronts than CHL measurements and CHL fronts [1,29], so the processes surrounding front detection could work better for SST than for CHL. While the unique features of CHL fronts are specifically targeted by the contextual filter in BOA [6], there could be other parts of the processing pipeline, such as the satellite instruments, that are better optimized for SST [15].
It is likely a combination of all of these factors that causes the discrepancy between SST front prediction quality and CHL front prediction quality.

D. Resource Utilization
We also analyze the performance of our four model architectures by computing the speed of inference, as well as the storage requirements of inference. The time and memory requirements of our ocean front detection models are a good indicator that these models could be deployed on a CubeSat without exceeding resource constraints.

1) Speed of Inference: We calculate the speed of inference for each model using Google Colab. Google Colab offers CPUs (Intel(R) Xeon(R) @ 2.30 GHz), GPUs (Nvidia Tesla T4), and TPUs (TPUv4) at runtime [30]. Models are benchmarked with no additional runtime optimizations to create a fair comparison. In practice, optimizations such as quantization or mixed precision would be used to speed up inference.
The inference speeds for each model are shown in Table VII. These times represent the average time for one inference, computed over 2000 inferences. Note that GPU speeds are on average 30 times faster than CPU speeds, while TPU speeds are about the same as CPU speeds. The strong GPU performance makes sense, as GPUs are optimized for the highly parallel matrix operations that dominate ML model inference.
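A benchmarking harness of the kind used here, averaging wall-clock time over many repeated inferences, can be sketched as follows; the warm-up count is our assumption, added because the first calls on an accelerator typically include one-time setup cost.

```python
import time

def mean_inference_time(fn, n_runs=2000, warmup=10):
    """Average wall-clock time of one call to fn, after warm-up runs.

    n_runs defaults to the 2000 inferences averaged in the text;
    warmup discards initial calls that may include setup overhead.
    """
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(n_runs):
        fn()
    return (time.perf_counter() - start) / n_runs
```

Usage would be `mean_inference_time(lambda: model.predict(batch))` for each model/backend pair, with no other optimizations applied so the comparison stays fair.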
Small CEDN performs the fastest inferences, followed by Small HED, then CEDN, then HED. This roughly matches the number of parameters in each model, but the HED and Small HED models perform slower than expected. This could be explained by the concatenation layer in the HED and Small HED architectures, a non-standard ML model layer for which the processing units may not be optimized [30].
2) Storage Requirements: We calculate the storage requirements for each model by converting our TensorFlow models to TensorFlow Lite models. TensorFlow Lite models are converted (and optionally quantized) TensorFlow models, optimized for efficient inference in SWaP-constrained embedded systems [31].
The disk space required for storing models and performing inference with models is detailed in Table VIII. The storage figure is the space a model occupies at rest, while the inference figure is the memory required while the model is actively performing inference.
The memory required for storage correlates directly with the number of parameters in each model, but the memory required for inference is more interesting. Small CEDN requires the least space, followed by Small HED, then HED and CEDN. This could perhaps also be explained by the non-standard concatenation layer in HED and Small HED, for which TensorFlow Lite may not be optimized [31].

V. CONCLUSIONS & FUTURE WORK
We present machine learning models for detecting temperature and chlorophyll ocean fronts from satellite imagery. These models can be deployed on CubeSats to help address the existing spatial, temporal, and computational challenges of detecting ocean fronts.
We discuss the importance of detecting ocean fronts and emerging tools for doing so more effectively. We examine the traditional non-ML ocean front detection pipeline, and use this existing pipeline to create a set of training data. We explore existing ML models and datasets for edge detection, adapting architectures and techniques to fit our task of ocean front detection. We train our models and analyze them across a broad range of performance and resource metrics to determine the best fit for CubeSat deployment.
The HED and Small CEDN models achieve accuracies of at least 96% for detecting CHL and SST fronts on both BOA data and human-annotated data. The Small HED and Small CEDN models use the fewest resources, utilizing less than 120 MB to make predictions in less than 0.002 seconds on a Google Colab GPU. This inference speed contrasts sharply with the traditional pipeline, which takes up to 16 days to process imagery from Level 0 to Level 2 [12] and 22 seconds to detect ocean fronts using BOA on a Google Colab GPU.
Overall, Small CEDN seems to have the most promise for CubeSat deployment. Its mini encoder-decoder architecture makes fairly accurate predictions while consuming relatively few resources, a good fit for a SWaP-constrained CubeSat.
Detecting ocean fronts on-orbit is a means to an end: a way to queue images for downlink. These relatively simple on-orbit ML models serve as an initial demonstration and pave the way for future, more complex, on-orbit ML work.

A. Future Work
Additional avenues for future work in this area include analyzing the training data to understand the spatial and temporal relationships between CHL and SST ocean fronts, and fine-tuning the trained models on-orbit.

1) Relationship between CHL and SST Fronts:
There is future work that must be done to fully characterize the spatial and temporal relationship between CHL and SST fronts. Little work has been done in this area before, because Earth-observing satellites often do not observe the bands necessary to generate high-quality CHL and SST data at the same spatial and temporal resolutions [17,32].
The training dataset developed in this work provides an opportunity to more fully explore the relationship between CHL and SST fronts, including quantifying the CHL/SST offset in different oceanic regions during different seasons.
2) On-Orbit Fine-Tuning: Our models are trained with Landsat 8 data and intended for deployment on a CubeSat. If the CubeSat cameras are significantly different from the Landsat 8 cameras, the models could perform worse than expected [18,33]. We expect this will not be a large concern because our models depend on image-wide gradients instead of individual pixel values. Additionally, previous work has shown that models trained on Landsat transfer well to on-orbit data but can require additional white balancing and calibration [10].
However, if the shift from Landsat to CubeSat does present a problem, we could explore on-orbit fine-tuning. This process would involve computing approximate ground truths on-orbit (likely using the Sobel operator [19]) and fine-tuning the models on-orbit. The resources needed for on-orbit training are considerably greater than those needed for on-orbit inference, so this fine-tuning would have to be resource-conscious.

Fig. 2: The traditional image processing pipeline for detecting ocean fronts. The inputs are unprocessed Thermal Infrared (TIRS), Green, and Coastal Aerosol bands. The intermediate processing steps include Level 1 and Level 2 preprocessing, CHL and SST calculations, and the Belkin-O'Reilly Algorithm (BOA). The outputs are CHL and SST fronts. This image is constructed with a Landsat training scene [5].

Fig. 6: The proposed image processing pipeline for detecting ocean fronts, utilizing an ML model. The inputs are unprocessed Thermal Infrared (TIRS), Green, and Coastal Aerosol bands. The intermediate circles represent neurons in a convolutional neural network (CNN). The outputs are CHL and SST fronts. This image was constructed with a Landsat training scene [5].

Fig. 7: Locations of the Landsat 8 scenes used in the training dataset, all captured between 2017 and 2021.

Fig. 9: A smaller version of the CEDN network architecture as compared to Fig. 5.

Fig. 10: Sample HED, Small HED, CEDN, and Small CEDN model predictions on images from the BOA ground truth dataset. The far left column is the Level 0 input data. Note that the Green channel (not shown) is also a model input, but is excluded from this diagram for simplicity (it is visually similar to the Coastal Aerosol channel). The next column is the thresholded BOA ground truth data. The final four columns are unthresholded outputs from our four models: HED, Small HED, CEDN, and Small CEDN.

Fig. 11: Sample HED, Small HED, CEDN, and Small CEDN model predictions on an image with a poor BOA ground truth. The high-quality model predictions underscore ML's robustness to low-quality ground truth.

Fig. 16: CHL precision-recall curve for models trained with human-annotated ground truth data.

Fig. 17: SST precision-recall curve for models trained with human-annotated ground truth data.
Fig. 1: Concept of Operations for a generic Earth-observing CubeSat with edge compute capability. The system is able to extract features from images on-orbit [9,10].

TABLE III: CHL metrics for models trained with BOA ground truth data. The bolding highlights the highest score for each metric.

TABLE IV: SST metrics for models trained with BOA ground truth data. The bolding highlights the highest score for each metric.

TABLE V: CHL metrics for models trained with human-annotated ground truth data. The bolding highlights the highest score for each metric.

TABLE VI: SST metrics for models trained with human-annotated ground truth data. The bolding highlights the highest score for each metric.

TABLE VII: Model inference speeds using CPU, GPU, and TPU backends on Google Colab.

TABLE VIII: Storage metrics for TensorFlow Lite models.