
Automatic detection of Feature Envy and Data Class code smells using machine learning

posted on 2023-03-27, 03:57 authored by Milica Škipina, Jelena Slivka, Nikola Luburić, Aleksandar Kovačević

A code smell is a surface indication that usually corresponds to a deeper problem in the system. Detecting and removing code smells is crucial for sustainable software development. However, manual detection can be daunting and time-consuming. Machine learning (ML) is a promising approach to automating code smell detection. The first ML-based methods were classifiers trained on feature vectors comprising software metrics extracted by off-the-shelf tools. Determining the optimal set of metrics is a complex problem that requires both ML and software engineering expertise. Recently, source code embedding models have emerged as a viable alternative that infers features directly from the code. However, their potential is yet to be fully explored. To that end, we compare state-of-the-art source code embedding models (CuBERT and CodeT5) with models trained on metrics returned by the CK Tool and RepositoryMiner tools. We focus on detecting the Data Class and Feature Envy code smells within a large-scale, manually labeled, publicly available dataset. After extensive experiments (51 test/train splits), we found that source code embedding models perform comparably to models trained on software metrics, and that they can indeed capture important characteristics of the source code. We discuss our findings in detail in the paper.
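The comparison described in the abstract (the same classifier trained on two different feature representations and scored over repeated train/test splits) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' pipeline: the data is synthetic, the nearest-centroid classifier is a stand-in for whatever model the paper uses, F1 is one plausible choice of score, and the names `compare_feature_sets`, `metrics_X`, and `embedding_X` are hypothetical.

```python
import random
from statistics import mean


def stratified_split(labels, test_fraction, rng):
    """Return (train_idx, test_idx), keeping the class ratio in both parts."""
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return train, test


def fit_predict(X, y, train, test):
    """Nearest-centroid classifier (illustrative stand-in, not the paper's model)."""
    classes = sorted({y[i] for i in train})
    centroids = {
        c: [mean(col) for col in zip(*[X[i] for i in train if y[i] == c])]
        for c in classes
    }

    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return [min(classes, key=lambda c: sq_dist(X[i], centroids[c])) for i in test]


def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive (smelly) class; 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def compare_feature_sets(feature_sets, y, n_splits=51, test_fraction=0.2, seed=0):
    """Mean F1 per feature set, using the same splits for every feature set."""
    rng = random.Random(seed)
    scores = {name: [] for name in feature_sets}
    for _ in range(n_splits):
        train, test = stratified_split(y, test_fraction, rng)
        y_true = [y[i] for i in test]
        for name, X in feature_sets.items():
            scores[name].append(f1_score(y_true, fit_predict(X, y, train, test)))
    return {name: mean(vals) for name, vals in scores.items()}


# Synthetic placeholder data: 30 "smelly" and 70 "clean" code units.
data_rng = random.Random(42)
y = [1] * 30 + [0] * 70
metrics_X = [[data_rng.gauss(2.0 * label, 1.0) for _ in range(4)] for label in y]
embedding_X = [[data_rng.gauss(1.0 * label, 1.0) for _ in range(16)] for label in y]

results = compare_feature_sets({"metrics": metrics_X, "embeddings": embedding_X}, y)
for name, score in results.items():
    print(f"{name}: mean F1 over 51 splits = {score:.3f}")
```

Reusing the same split for every feature set keeps the comparison paired, so differences in the mean score reflect the representation and not the luck of a particular split.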


Science Fund of the Republic of Serbia, Grant No 6521051, AI-Clean CaDET

Ministry of Science, Technological Development and Innovation through project no. 451-03-47/2023-01/200156 “Innovative scientific and artistic research from the FTS (activity) domain”



Submitting Author's Institution

University of Novi Sad, Faculty of Technical Sciences

Submitting Author's Country

  • Serbia