Code classification based on software graph similarities

The Cloud offers High-Performance Computing (HPC) capabilities, including application accelerators such as Graphics Processing Units (GPUs) and Field-Programmable Gate Arrays (FPGAs), as well as dedicated services and hardware for Artificial Intelligence (AI). Because different Cloud computing resources are available at different locations in the network, deploying applications in such environments is challenging. For example, on Edge nodes, the edge resources and services must be exploited efficiently to meet specific application requirements, such as low latency in data processing. Solutions are therefore needed to find the optimal deployment model for a given target application.

 

Project goal:

One promising approach to this problem is to find out which deployment models have been proposed for similar applications, in particular for popular open-source applications and tools. This requires analysing the source code of the target application and searching existing open-source applications for similar implementation code, which is the main goal of this master thesis project. One way to do this is to classify open-source projects using machine learning (ML) or statistical techniques. Assuming the source code of the application is available, an appropriate classifier can be designed to discover similarities between parts of the application architecture and the profiles of open-source projects. This would reveal characteristics of the application and, ideally, identify deployment options for the application and its data sets. In this master project, the student will focus on the similarity problem from the software graph viewpoint, e.g., the data-flow graph of a Java application.
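To make the graph viewpoint concrete, the sketch below compares labelled directed graphs using Weisfeiler-Lehman (WL) subtree features and cosine similarity. WL is one well-known graph-similarity technique, chosen here only for illustration; it is not prescribed by the project. The node labels (`param`, `call`, `return`) and the toy graphs are hypothetical stand-ins for the nodes of a data-flow graph extracted from real Java code, and this variant refines labels using predecessors only, which is an assumption suited to data flow.

```python
from collections import Counter
import math


def wl_features(labels, edges, iterations=2):
    """Weisfeiler-Lehman subtree features of a directed, node-labelled graph.

    labels: dict mapping node id -> initial label (assumed plain identifiers)
    edges: iterable of (src, dst) pairs, e.g. hypothetical data-flow edges
    Returns a Counter of every label seen across all refinement rounds.
    """
    # Predecessors of each node: the nodes whose data flows into it.
    preds = {n: [] for n in labels}
    for src, dst in edges:
        preds[dst].append(src)

    current = dict(labels)
    features = Counter(current.values())
    for _ in range(iterations):
        # Relabel each node with its old label plus the sorted labels of its
        # predecessors, reading only labels from the previous round.
        new = {}
        for n in current:
            pred_labels = ",".join(sorted(current[p] for p in preds[n]))
            new[n] = f"{current[n]}({pred_labels})"
        current = new
        features.update(current.values())
    return features


def cosine(a, b):
    """Cosine similarity between two sparse feature histograms."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


if __name__ == "__main__":
    # Hypothetical toy graphs: a value flows from a parameter through a call
    # to a return statement; g3 skips the call.
    g1 = wl_features({0: "param", 1: "call", 2: "return"}, [(0, 1), (1, 2)])
    g2 = wl_features({"a": "param", "b": "call", "c": "return"},
                     [("a", "b"), ("b", "c")])
    g3 = wl_features({0: "param", 1: "return"}, [(0, 1)])
    print(cosine(g1, g2))  # 1.0: same structure, different node ids
    print(cosine(g1, g3))  # about 0.54: partial overlap
```

Because the features depend only on labels and structure, renaming nodes does not change the score, while removing the `call` step lowers it.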

This thesis will be carried out within the EU project MORPHEMIC [1].
 

The master project includes the following phases:

  1. Studying graph-based visualisation techniques for program code [2]. The student may restrict the code type to an object-oriented programming language, e.g., Java or C++. This first requires acquiring basic knowledge of graph theory.
  2. Searching for existing web crawlers (e.g., Markos [3]) that index information about open-source projects, covering both project metadata and code.
  3. Exploring ML-based and statistical techniques suitable for classifying program code and analysing code similarities [4].
  4. Focusing on a sample application (provided by the MORPHEMIC partners) as a case study, identifying appropriate techniques for classifying its code, and finding existing open-source applications whose code is similar.
  5. Analysing similarity in this case study based on the identified metrics.
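Phases 4 and 5 amount to ranking candidate open-source projects by their similarity to the case-study application. The following is a minimal sketch under stated assumptions: each software graph is summarised by a histogram of (source label, target label) edge types, each indexed project carries a hypothetical category (for example, a deployment model), candidates are ranked by cosine similarity, and a k-nearest-neighbour vote suggests a category for the application. All project names, labels and categories are hypothetical.

```python
from collections import Counter
import math


def edge_profile(labels, edges):
    """Summarise a labelled graph as a histogram of (source, target) label pairs."""
    return Counter((labels[src], labels[dst]) for src, dst in edges)


def cosine(a, b):
    """Cosine similarity between two sparse histograms."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_similar(target, corpus, k=3):
    """Return the k (score, name, category) triples most similar to target.

    corpus: dict mapping project name -> (profile, category)
    """
    scored = [(cosine(target, profile), name, category)
              for name, (profile, category) in corpus.items()]
    scored.sort(key=lambda t: t[0], reverse=True)  # highest similarity first
    return scored[:k]


def majority_category(ranked):
    """k-nearest-neighbour vote; assumes ranked is non-empty."""
    return Counter(category for _, _, category in ranked).most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical labels, projects and categories, for illustration only.
    app = edge_profile({0: "http_handler", 1: "db_query", 2: "response"},
                       [(0, 1), (1, 2), (0, 2)])
    corpus = {
        "project-a": (edge_profile({0: "http_handler", 1: "db_query", 2: "response"},
                                   [(0, 1), (1, 2)]), "web-service"),
        "project-b": (edge_profile({0: "http_handler", 1: "cache_read", 2: "response"},
                                   [(0, 1), (0, 2)]), "web-service"),
        "project-c": (edge_profile({0: "sensor_read", 1: "filter", 2: "publish"},
                                   [(0, 1), (1, 2)]), "edge-streaming"),
    }
    ranked = rank_similar(app, corpus, k=2)
    for score, name, category in ranked:
        print(f"{name}: {score:.2f} ({category})")
    print(majority_category(ranked))
```

Edge-type histograms ignore longer paths in the graph; richer graph features or a trained ML classifier from phase 3 could replace `edge_profile` without changing the ranking step.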

References
[1] MORPHEMIC project, https://www.morphemic.cloud/
[2] Stephan Diehl, “Software Visualization: Visualizing the Structure, Behaviour, and Evolution of Software”, Springer, 2007.
[3] Markos project (CORDIS entry), https://cordis.europa.eu/project/id/317743
[4] Silvio Cesare and Yang Xiang, “Software Similarity and Classification”, Springer, 2012.

Published 11 Sep. 2020 09:11 - Last modified 11 Sep. 2020 09:19

Scope (ECTS credits)

60