Automatic detection of source code similarities using machine learning techniques

Authors

  • Marina Elizabeth Cardenas, Grupo de Investigación, Desarrollo y Transferencia en Aprendizaje Automático, Lenguajes y Autómatas, Centro de Investigación y Desarrollo de Software, Facultad Regional Córdoba, Universidad Tecnológica Nacional, Argentina
  • Julio Javier Castillo Director

DOI:

https://doi.org/10.33414/ajea.1069.2022

Keywords:

source code, similarities, reuse, machine learning, text, analysis

Abstract

This paper proposes a model for detecting similarities in source code, with the aim of identifying reuse practices, using machine learning techniques with a focus on computational linguistics. Several existing techniques detect similar source code fragments (usually called code clones), each targeting different types of clones. Identifying these clones serves several purposes, such as studying the evolution of a project's source code, detecting reuse practices, extracting a code fragment for refactoring, and detecting and tracking defects, failures, and/or viruses so that they can be corrected, among others.
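
To make the idea of code clones concrete, the following is a minimal sketch of a token-based similarity measure. It is not the model proposed in this paper; it only illustrates, under stated assumptions, how two fragments that differ in identifier names can still be recognized as similar. The function names (`normalize`, `shingles`, `similarity`), the small keyword set, and the choice of Jaccard similarity over token trigrams are all hypothetical choices made for the example.

```python
import re

# Assumption: a crude, language-agnostic tokenizer (identifiers, integers,
# and any other single non-whitespace character).
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")

# Assumption: a tiny illustrative keyword set, not a full language grammar.
KEYWORDS = {"def", "return", "for", "in", "if", "else", "while"}


def normalize(source):
    """Replace identifiers with ID and integer literals with NUM."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if re.match(r"[A-Za-z_]", tok) and tok not in KEYWORDS:
            tokens.append("ID")
        elif tok.isdigit():
            tokens.append("NUM")
        else:
            tokens.append(tok)
    return tokens


def shingles(tokens, n=3):
    """Return the set of contiguous token n-grams."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def similarity(a, b, n=3):
    """Jaccard similarity between the normalized n-gram sets of two fragments."""
    sa, sb = shingles(normalize(a), n), shingles(normalize(b), n)
    if not sa and not sb:
        return 1.0  # two fragments with no n-grams are treated as identical
    return len(sa & sb) / len(sa | sb)


original = "def add(x, y):\n    return x + y"
renamed = "def plus(a, b):\n    return a + b"
unrelated = "for i in range(10): print(i)"

print(similarity(original, renamed))
print(similarity(original, unrelated))
```

Because identifiers are collapsed to a single placeholder, the renamed fragment scores as fully similar to the original, while the unrelated loop scores much lower. A learned model could use scores like this as one input feature among several.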

Published

2022-10-03

How to Cite

Cardenas, M. E., & Castillo, J. J. (2022). Automatic detection of source code similarities using machine learning techniques. AJEA (Proceedings of UTN Academic Conferences and Events), (15). https://doi.org/10.33414/ajea.1069.2022