Automatic detection of Long Method and God Class code smells through
neural source code embeddings
Abstract
Code smells are structures in code that often have a negative impact on
its quality. Manually detecting code smells is challenging, so
researchers have proposed many automatic code smell detectors. Most of the
studies propose detectors based on code metrics and heuristics. However,
these studies have several limitations, including evaluating the
detectors using small-scale case studies and an inconsistent
experimental setting. Furthermore, heuristic-based detectors suffer from
limitations that hinder their adoption in practice. Thus, researchers
have recently started experimenting with machine learning (ML) based
code smell detection.
This paper compares the performance of multiple ML-based code smell
detection models against multiple traditionally employed metric-based
heuristics for detecting the God Class and Long Method code smells. We
evaluate the effectiveness of different source code representations for
machine learning: traditionally used code metrics and code embeddings
(code2vec, code2seq, and CuBERT).
We perform our experiments on the large-scale, manually labeled MLCQ
dataset. We treat detection as a binary classification problem: we
classify code samples as smelly or non-smelly and use the F1-measure of
the minority (smell) class as the performance measure. In our experiments,
the ML classifier trained using CuBERT source code embeddings achieved
the best performance for both God Class (F1-measure of 0.53) and Long
Method detection (F1-measure of 0.75). With the help of a domain expert,
we perform an error analysis to discuss the advantages of the CuBERT
approach.
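
As an illustration of the performance measure described above, the following minimal sketch computes the F1-measure of the minority (smell) class from binary labels. The function name `f1_for_class`, the encoding of smelly samples as 1, and the toy labels are assumptions made for this example; they are not taken from the MLCQ dataset or from the study's implementation.

```python
def f1_for_class(y_true, y_pred, positive=1):
    """F1-measure of one class (here: the smell class, assumed to be encoded as 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined),
        # so the score is reported as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical, imbalanced toy labels: 1 = smelly, 0 = non-smelly.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 1, 0, 0, 0, 1, 1, 0, 0]

smell_f1 = f1_for_class(y_true, y_pred)
print(f"F1 (smell class): {smell_f1:.2f}")
```

Scoring only the smell class keeps the large number of correctly classified non-smelly samples from inflating the result, which is why the measure suits an imbalanced dataset.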
To the best of our knowledge, this study is the first to evaluate the
effectiveness of pre-trained neural source code embeddings for code smell
detection. A secondary contribution of our study is the systematic
evaluation of the effectiveness of multiple heuristic-based approaches
on the same large-scale, manually labeled MLCQ dataset.