Automatic detection of source code similarities using machine learning techniques

Authors

  • Marina Elizabeth Cardenas, Grupo de Investigación, Desarrollo y Transferencia en Aprendizaje Automático, Lenguajes y Autómatas, Centro de Investigación y Desarrollo de Software, Facultad Regional Córdoba, Universidad Tecnológica Nacional, Argentina
  • Julio Javier Castillo (Director)

DOI:

https://doi.org/10.33414/ajea.1069.2022

Keywords:

source code, similarities, reuse, machine learning, text, analysis

Abstract

This paper proposes a model for detecting similarities in source code, with the aim of identifying reuse practices, using machine learning techniques with a focus on computational linguistics. Various authors have developed techniques that detect similar source code fragments (usually called code clones), each targeting different types of clones. Identifying these clones serves several purposes, including studying the evolution of a project's source code, detecting reuse practices, extracting code fragments for refactoring, and detecting and tracking defects, failures, and/or viruses so that they can be corrected.

Published

2022-10-03

How to Cite

Cardenas, M. E., & Castillo, J. J. (2022). Automatic detection of source code similarities using machine learning techniques. AJEA (Proceedings of UTN Academic Conferences and Events), (15). https://doi.org/10.33414/ajea.1069.2022