Rui Fan - TechRxiv

Rui Fan

Public Documents 8

Three-Filters-to-Normal: An Accurate and Ultrafast Surface Normal Estimator

Rui Fan

and 7 more

March 10, 2021

This paper proposes three-filters-to-normal (3F2N), an accurate and ultrafast surface normal estimator (SNE), which is designed for structured range sensor data, e.g., depth/disparity images. 3F2N SNE computes surface normals by simply performing three filtering operations (two image gradient filters in horizontal and vertical directions, respectively, and a mean/median filter) on an inverse depth image or a disparity image. Despite the simplicity of 3F2N SNE, no similar method already exists in the literature. To evaluate the performance of our proposed SNE, we created three large-scale synthetic datasets (easy, medium and hard) using 24 3D mesh models, each of which is used to generate 1800–2500 pairs of depth images (resolution: 480X640 pixels) and the corresponding ground-truth surface normal maps from different views. 3F2N SNE demonstrates the state-of-the-art performance, outperforming all other existing geometry-based SNEs, where the average angular errors with respect to the easy, medium and hard datasets are 1.66 degrees, 5.69 degrees and 15.31 degrees, respectively. Furthermore, our C++ and CUDA implementations achieve a processing speed of over 260 Hz and 21 kHz, respectively. Our datasets and source code are publicly available at sites.google.com/view/3f2n.

Detecting Glaucoma from Fundus Photographs Using Deep Learning without Convolutions:...

Rui Fan

and 14 more

May 12, 2022

Purpose: To compare the diagnostic accuracy and explainability of a new Vision Transformer deep learning technique, Data-efficient image Transformer (DeiT), and Resnet-50, trained on fundus photographs from the Ocular Hypertension Treatment Study (OHTS) to detect primary open-angle glaucoma (POAG) and to identify the salient areas of the photographs most important for each model’s decision-making process. Study Design: Evaluation of a diagnostic technology Subjects, Participants, and/or Controls: 66,715 photographs from 1,636 OHTS participants and an additional five external datasets of 16137 photographs of healthy and glaucoma eyes. Methods, Intervention, or Testing: DeiT models were trained to detect five ground truth OHTS POAG classifications: OHTS Endpoint Committee POAG determinations due to disc changes (Model 1), visual field changes (Model 2), or either disc or visual field changes (Model 3) and reading center determinations based on disc (Model 4) and visual fields (Model 5). The best-performing DeiT models were compared to ResNet-50 on OHTS and five external datasets. Main Outcome Measures: Diagnostic performance was compared using areas under the receiver operating characteristic curve (AUROC) and sensitivities at fixed specificities. The explainability of the DeiT and ResNet-50 models was compared by evaluating the attention maps derived directly from DeiT to 3 gradient-weighted class activation map generation strategies. Results: Compared to our best-performing ResNet-50 models, the DeiT models demonstrated similar performance on the OHTS test sets for all five-ground truth POAG labels; AUROC ranged from 0.82 (Model 5) to 0.91 (Model 1). However, the AUROC of DeiT was consistently higher than ResNet-50 on the five external datasets. For example, AUROC for the main OHTS endpoint (Model 3) was between 0.08 and 0.20 higher in the DeiT compared to ResNet-50 models. The saliency maps from the DeiT highlight localized areas of the neuroretinal rim, suggesting the use of important clinical features for classification, while the same maps in the ResNet-50 models show a more diffuse, generalized distribution around the optic disc, Conclusions: Vision transformer has the potential to improve the generalizability and explainability of deep learning models for the detection of eye disease and possibly other medical conditions that rely on imaging modalities for clinical diagnosis and management.

Detecting Glaucoma in the Ocular Hypertension Study Using Deep Learning

Rui Fan

and 16 more

March 28, 2022

To investigate the diagnostic accuracy of deep learning (DL) algorithms to detect primary open-angle glaucoma (POAG) trained on fundus photographs from the Ocular Hypertension Treatment Study (OHTS). 66,715 photographs from 3,272 eyes were used to train and test a ResNet-50 model to detect the OHTS Endpoint Committee POAG determination based on optic disc (n=287 eyes, 3,502 photographs) and/or visual field (n=198 eyes, 2,300 visual fields) changes. OHTS training, validation and testing sets were randomly determined using an 85-5-10 percentage split by subject. Three independent test sets were used to estimate the generalizability of the model: UCSD Diagnostic Innovations in Glaucoma Study (DIGS, USA), ACRIMA (Spain) and Large-scale Attention-based Glaucoma (LAG, China). The DL model achieved an AUROC (95% CI) of 0.88 (0.82, 0.92) for the overall OHTS POAG endpoint. For the OHTS endpoints based on optic disc changes or visual field changes, AUROCs were 0.91 (0.88, 0.94) and 0.86 (0.76, 0.93), respectively. False-positive rates (at 90% specificity) were higher in photographs of eyes that later developed POAG by disc or visual field (19.1%) compared to eyes that did not develop POAG (7.3%) during their OHTS follow-up. The diagnostic accuracy of the DL model developed on the OHTS optic disc endpoint applied to 3 independent datasets was lower with AUROC ranging from 0.74 to 0.79. High diagnostic accuracy of the current model suggests that DL can be used to automate the determination of POAG for clinical trials and management. In addition, the higher false-positive rate in early photographs of eyes that later developed POAG suggests that DL models detected POAG in some eyes earlier than the OHTS Endpoint Committee.

CoT-AMFlow: Adaptive Modulation Network with Co-Teaching Strategy for Unsupervised Op...

Hengli Wang

and 2 more

December 03, 2020

The interpretation of ego motion and scene change is a fundamental task for mobile robots. Optical flow information can be employed to estimate motion in the surroundings. Recently, unsupervised optical flow estimation has become a research hotspot. However, unsupervised approaches are often easy to be unreliable on partially occluded or texture-less regions. To deal with this problem, we propose CoT-AMFlow in this paper, an unsupervised optical flow estimation approach. In terms of the network architecture, we develop an adaptive modulation network that employs two novel module types, flow modulation modules (FMMs) and cost volume modulation modules (CMMs), to remove outliers in challenging regions. As for the training paradigm, we adopt a co-teaching strategy, where two networks simultaneously teach each other about challenging regions to further improve accuracy. Experimental results on the MPI Sintel, KITTI Flow and Middlebury Flow benchmarks demonstrate that our CoT-AMFlow outperforms all other state-of-the-art unsupervised approaches, while still running in real time. Our project page is available at https://sites.google.com/view/cot-amflow.

SNE-RoadSeg: Incorporating Surface Normal Information into Semantic Segmentation for...

Rui Fan

and 3 more

September 01, 2020

Freespace detection is an essential component of visual perception for self-driving cars. The recent efforts made in data-fusion convolutional neural networks (CNNs) have significantly improved semantic driving scene segmentation. Freespace can be hypothesized as a ground plane, on which the points have similar surface normals. Hence, in this paper, we first introduce a novel module, named surface normal estimator (SNE), which can infer surface normal information from dense depth/disparity images with high accuracy and efficiency. Furthermore, we propose a data-fusion CNN architecture, referred to as RoadSeg, which can extract and fuse features from both RGB images and the inferred surface normal information for accurate freespace detection. For research purposes, we publish a large-scale synthetic freespace detection dataset, named Ready-to-Drive (R2D) road dataset, collected under different illumination and weather conditions. The experimental results demonstrate that our proposed SNE module can benefit all the state-of-the-art CNNs for freespace detection, and our SNE-RoadSeg achieves the best overall performance among different datasets.

ATG-PVD: Ticketing Parking Violations on A Drone

Hengli Wang

and 10 more

August 21, 2020

In this paper, we introduce a novel suspect-and-investigate framework, which can be easily embedded in a drone for automated parking violation detection (PVD). Our proposed framework consists of: 1) SwiftFlow, an efficient and accurate convolutional neural network (CNN) for unsupervised optical flow estimation; 2) Flow-RCNN, a flow-guided CNN for car detection and classification; and 3) an illegally parked car (IPC) candidate investigation module developed based on visual SLAM. The proposed framework was successfully embedded in a drone from ATG Robotics. The experimental results demonstrate that, firstly, our proposed SwiftFlow outperforms all other state-of-the-art unsupervised optical flow estimation approaches in terms of both speed and accuracy; secondly, IPC candidates can be effectively and efficiently detected by our proposed Flow-RCNN, with a better performance than our baseline network, Faster-RCNN; finally, the actual IPCs can be successfully verified by our investigation module after drone re-localization.

We Learn Better Road Pothole Detection: from Attention Aggregation to Adversarial Dom...

Rui Fan

and 3 more

December 17, 2020

Manual visual inspection, typically performed by certified inspectors, is still the main form of road pothole detection. This process is, however, not only tedious, time-consuming and costly, but also dangerous for the inspectors. Furthermore, the road pothole detection results are always subjective, because they depend entirely on the inspector’s experience. In this paper, we first introduce a disparity (or inverse depth) image processing module, named quasi inverse perspective transformation (QIPT), which can make the damaged road areas become highly distinguishable. Then, we propose a novel attention aggregation (AA) framework, which can improve the semantic segmentation networks for better road pothole detection, by taking the advantages of different types of attention modules. Moreover, we develop a novel training set augmentation technique based on adversarial domain adaptation, where synthetic road RGB images and transformed road disparity (or inverse depth) images are generated to enhance the training of semantic segmentation networks. The experimental results illustrate that, firstly, the disparity (or inverse depth) images transformed by our QIPT module become more informative; secondly, the adversarial domain adaptation can not only significantly improve the performance of the state-of-the-art semantic segmentation networks, but also accelerate their convergence. In addition, AA-UNet and AA-RTFNet, our best performing implementations, respectively outperform all other state-of-the-art single-modal and data-fusion networks for road pothole detection.

Conditional Link Prediction of Category-Implicit Keypoint Detection

Ellen Yi-Ge

and 3 more

November 10, 2020

Keypoints of objects reflect their concise abstractions, while the corresponding connection links (CL) build the skeleton by detecting the intrinsic relations between keypoints. Existing approaches are typically computationally-intensive, inapplicable for instances belonging to multiple classes, and/or infeasible to simultaneously encode connection information. To address the aforementioned issues, we propose an end-to-end category-implicit Keypoint and Link Prediction Network (KLPNet), which is the first approach for simultaneous semantic keypoint detection (for multi-class instances) and CL rejuvenation. In our KLPNet, a novel Conditional Link Prediction Graph is proposed for link prediction among keypoints that are contingent on a predefined category. Furthermore, a Cross-stage Keypoint Localization Module (CKLM) is introduced to explore feature aggregation for coarse-to-fine keypoint localization. Comprehensive experiments conducted on three publicly available benchmarks demonstrate that our KLPNet consistently outperforms all other state-of-the-art approaches. Furthermore, the experimental results of CL prediction also show the effectiveness of our KLPNet with respect to occlusion problems.