QUT - Interpretable software vulnerability detection using deep learning techniques

Study levels

PhD
Master of Philosophy
Honours

Faculty/School

Faculty of Science

School of Computer Science

Topic status

We're looking for students to study this topic.

Research centre

Centre for Data Science

Supervisors

Associate Professor Yue Xu

Position: Associate Professor
Division / Faculty: Faculty of Science

Overview

Software vulnerabilities are considered significant security and reliability threats to the general public, especially to critical infrastructure. Many approaches have been proposed to detect vulnerabilities in source code and so prevent the damage they can cause when exploited. Conventional approaches include static analysis and dynamic analysis. Static analysis uses pre-defined patterns or vulnerability datasets to scan and examine software source code and identify potentially vulnerable code snippets. These patterns are manually crafted or identified by software developers or security experts, which is time-consuming. Pattern-based techniques have high false-positive rates and often fail to detect complex vulnerabilities. Dynamic analysis executes the target code with pre-defined test cases. This technique can eliminate false positives, but it can easily miss true vulnerabilities and is extremely computationally expensive.

Recently, machine learning techniques, especially deep learning techniques, have been developed for source code vulnerability discovery, and they can significantly improve detection accuracy. One important reason is that deep learning models can discover latent features representing the meaning of code that human experts may never be able to define. However, existing deep learning models output only a binary decision on whether the given code is vulnerable, without indicating which part of the code is relevant to the detected vulnerability. A deep learning model is therefore often considered a 'black box': its output is the result of a series of latent transformations applied to the input source code, which makes that output very difficult to explain.

In this project, we aim to develop new deep learning-based methods that detect vulnerabilities in source code and also provide users with fine-grained interpretations, identifying the nodes, statements, or sub-graphs that are relevant to the detected vulnerability.

Research activities

The main activities include:

investigate and evaluate existing deep learning-based vulnerability detection models in terms of interpretability
develop methods to represent code at different levels, including tokens, statements, paths of code graphs, and code snippets
develop methods to detect software vulnerabilities based on the multi-level code representations
develop methods to generate user-understandable explanations for the detected vulnerable code.

Outcomes

Upon conclusion of this research project, we expect:

improved models or algorithms that generate code embeddings at multiple levels to represent the semantics of source code
models to detect vulnerable code
models to provide explanations for the detected vulnerable code.

Skills and experience

To be considered for this project, we expect you to have:

knowledge of data mining and machine learning
knowledge of databases
good programming skills (preferably Python).

Scholarships

You may be eligible to apply for a research scholarship.


Contact

Contact the supervisor for more information.