Automatic detection of source code similarities using machine learning techniques
DOI:
https://doi.org/10.33414/ajea.5.745.2020
Keywords:
source code, similarities, reuse, machine learning, text, analysis
Abstract
This thesis proposal presents the development of a model for detecting source code similarities in order to determine whether code has been reused, applying techniques from computational linguistics such as text data mining and natural language processing. Identifying code similarities serves several purposes, including studying the evolution of a project's source code, detecting reuse practices, extracting code fragments for refactoring, and tracking defects for correction.