Are you an EPFL student looking for a semester project?
Work with us on data science and visualisation projects, and deploy your project as an app on top of Graph Search.
In computer programming, duplicate code is a sequence of source code that occurs more than once, either within a program or across different programs owned or maintained by the same entity. Duplicate code is generally considered undesirable for a number of reasons. A minimum requirement is usually applied to the quantity of code that must appear in a sequence for it to be considered duplicate rather than coincidentally similar. Sequences of duplicate code are sometimes known as code clones or just clones, the automated process of finding duplications in source code is called clone detection. Two code sequences may be duplicates of each other without being character-for-character identical, for example by being character-for-character identical only when white space characters and comments are ignored, or by being token-for-token identical, or token-for-token identical with occasional variation. Even code sequences that are only functionally identical may be considered duplicate code. Some of the ways in which duplicate code may be created are: copy and paste programming, which in academic settings may be done as part of plagiarism scrounging, in which a section of code is copied "because it works". In most cases this operation involves slight modifications in the cloned code, such as renaming variables or inserting/deleting code. The language nearly always allows one to call one copy of the code from different places, so that it can serve multiple purposes, but instead the programmer creates another copy, perhaps because they do not understand the language properly do not have the time to do it properly, or do not care about the increased active software rot. It may also happen that functionality is required that is very similar to that in another part of a program, and a developer independently writes code that is very similar to what exists elsewhere. Studies suggest that such independently rewritten code is typically not syntactically similar.
George Candea, Rishabh Ramesh Iyer
Denis Gillet, Maria Jesus Rodriguez Triana, Juan Carlos Farah, Sandy Ingram