Incremental clone detection for IDEs using dynamic suffix arrays

CCDetect-LSP: Detecting duplicate code incrementally in IDEs

Author: Jakob Konrad Hansen (GitHub profile)
Supervisors: Volker Stolz, Lars Tveito

Duplicate code is present in most software today. It can harm the maintainability and extensibility of software, because a change to one duplicated fragment often has to be repeated in every other instance of the duplication. In this thesis we present CCDetect-LSP, a duplicate code detection tool that targets the IDE scenario and is both programming-language- and IDE-agnostic. Duplicate code detection can be expensive in terms of time and memory, so most detection algorithms cannot be run continuously in a live environment, such as while editing code in an IDE. To facilitate live detection of duplicate code in a code base where small incremental edits are applied, our tool implements a novel detection algorithm that supports fast incremental updates. Our empirical results demonstrate that, when small edits are applied to a code base, our incremental algorithm scales better than other non-incremental and incremental algorithms.
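
To give a feel for the suffix-array idea named in the title, the following is a minimal sketch and not the algorithm from the thesis. It assumes the code base has already been turned into a flat list of tokens, builds a suffix array naively, and reports pairs of positions whose suffixes share at least `min_len` tokens. It is static: every call recomputes everything from scratch, which is the cost an incremental (dynamic) suffix array is meant to avoid. The function names and the `min_len` parameter are hypothetical and chosen for illustration only.

```python
def suffix_array(tokens):
    # Naive construction: sort suffix start positions by the suffix itself.
    # Fine for a small illustration, far too slow for a real code base.
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def lcp_array(tokens, sa):
    # lcp[k] is the number of leading tokens shared by the suffixes that
    # start at sa[k - 1] and sa[k]; lcp[0] is 0 by convention.
    lcp = [0] * len(sa)
    for k in range(1, len(sa)):
        a, b = sa[k - 1], sa[k]
        n = 0
        while (a + n < len(tokens) and b + n < len(tokens)
               and tokens[a + n] == tokens[b + n]):
            n += 1
        lcp[k] = n
    return lcp


def find_clones(tokens, min_len):
    # Suffixes that are adjacent in the suffix array and share a long
    # prefix correspond to a repeated token sequence, i.e. a clone candidate.
    sa = suffix_array(tokens)
    lcp = lcp_array(tokens, sa)
    clones = []
    for k in range(1, len(sa)):
        if lcp[k] >= min_len:
            first, second = sorted((sa[k - 1], sa[k]))
            clones.append((first, second, lcp[k]))
    return sorted(clones)


tokens = "a b c x a b c y".split()
print(find_clones(tokens, 3))  # [(0, 4, 3)]
```

Each result is a tuple of two start positions and the length of the shared token run. The sketch only compares adjacent suffixes and does not filter out overlapping occurrences, both of which a practical detector would have to handle.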

Tags: software engineering, clone detection, code smells
Published June 1, 2023 3:36 PM - Last modified June 1, 2023 3:36 PM