Automatic detection of source code similarities using machine learning techniques
DOI: https://doi.org/10.33414/ajea.1069.2022
Keywords: source code, similarities, reuse, machine learning, text, analysis
Abstract
This paper proposes a model for detecting similarities in source code in order to determine whether code-reuse practices exist, using machine learning techniques with a focus on computational linguistics. Several authors have developed techniques for detecting similar source code fragments (usually called code clones), each targeting different types of clones. Identifying these clones serves several purposes, among them studying how a project's source code evolves, detecting reuse practices, extracting a code fragment for refactoring, and detecting and tracking defects, failures and/or viruses so that they can be corrected.
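The abstract does not describe the model itself. As a rough illustration of what a text-based similarity score between two code fragments can look like, here is a minimal sketch that is not the authors' method. It makes several illustrative assumptions: a regex tokenizer, a tiny keyword set, identifiers and numeric literals abstracted to `ID` and `LIT` so that copies with renamed identifiers (Type-2 clones) still match, and Jaccard similarity over token trigrams. The names `normalize_tokens`, `ngrams` and `similarity` are hypothetical.

```python
import re

# Hypothetical tokenizer: identifiers, numbers, then any other single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|\S")
IDENT_RE = re.compile(r"[A-Za-z_]\w*")
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")

# Illustrative keyword set (assumption); keywords are kept verbatim, not abstracted.
KEYWORDS = {"def", "return", "if", "else", "for", "while", "in", "int", "float"}


def normalize_tokens(code: str) -> list[str]:
    """Tokenize code, replacing identifiers with ID and numeric literals with LIT."""
    tokens = []
    for tok in TOKEN_RE.findall(code):
        if tok in KEYWORDS:
            tokens.append(tok)
        elif IDENT_RE.fullmatch(tok):
            tokens.append("ID")
        elif NUMBER_RE.fullmatch(tok):
            tokens.append("LIT")
        else:
            tokens.append(tok)
    return tokens


def ngrams(tokens: list[str], n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of contiguous token n-grams."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def similarity(code_a: str, code_b: str, n: int = 3) -> float:
    """Jaccard similarity of the two fragments' normalized n-gram sets, in [0, 1]."""
    a = ngrams(normalize_tokens(code_a), n)
    b = ngrams(normalize_tokens(code_b), n)
    if not a and not b:
        return 1.0  # both fragments too short to form any n-gram
    return len(a & b) / len(a | b)


original = "def total(xs):\n    s = 0\n    for x in xs:\n        s = s + x\n    return s"
renamed = "def add_all(values):\n    acc = 0\n    for v in values:\n        acc = acc + v\n    return acc"
print(similarity(original, renamed))  # identical after normalization
```

Because both fragments reduce to the same normalized token sequence, the renamed copy scores 1.0. Fragments that differ structurally score lower. A learned model would replace this fixed score with features or embeddings. Even so, the token normalization step shows why comparing raw text alone misses clones in which only the identifiers have been changed.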