Supervisors
- Position: Associate Professor
- Division / Faculty: Faculty of Science
Overview
Software vulnerabilities are widely regarded as significant threats to the reliability of systems used by the general public, especially critical infrastructure. Many approaches have been proposed to detect vulnerabilities in source code so that the damage they could cause when exploited is avoided. Conventional approaches include static analysis and dynamic analysis. Static analysis scans and examines software source code against pre-defined patterns or a vulnerability dataset to identify potentially vulnerable code snippets. These patterns are crafted manually by software developers or security experts, which is time-consuming. Pattern-based techniques also have high false-positive rates and often fail to detect complex vulnerabilities. Dynamic analysis runs the target code with pre-defined test cases. This technique can eliminate false positives, but it can easily miss true vulnerabilities and is extremely computationally expensive.
Recently, machine learning techniques, especially deep learning techniques, have been developed for source code vulnerability discovery and can significantly improve detection accuracy. One important reason is that deep learning models can discover latent features representing the meaning of code, features that human experts may never be able to define. However, existing deep learning models output only a binary decision on whether the given code is vulnerable, without indicating which part of the code is relevant to the detected vulnerability. A deep learning model is therefore often considered a 'black box': its output is the result of a series of latent transformations applied to the input source code, which makes that output very difficult to explain.
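To make the weakness of pattern-based static analysis concrete, here is a minimal sketch of a rule-driven scanner. The two rules, the function name `scan` and the C snippet are hypothetical illustrations, not the rule set of any real tool. Because the check is purely lexical, a mention of `strcpy(` inside a comment is reported just like a real call, which is one source of false positives.

```python
import re
from typing import List, Tuple

# Hypothetical hand-written rules: regex -> message. Illustrative only.
RULES = {
    r"\bgets\s*\(": "gets() reads unbounded input",
    r"\bstrcpy\s*\(": "strcpy() does not check destination size",
}

def scan(source: str) -> List[Tuple[int, str]]:
    """Return (line_number, message) for every line that matches a rule."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for pattern, message in RULES.items():
            if re.search(pattern, line):
                findings.append((lineno, message))
    return findings

C_SNIPPET = """\
char buf[16];
/* never use strcpy() here */
strncpy(buf, input, sizeof(buf) - 1);
gets(buf);
"""

# Line 2 is flagged although strcpy only appears in a comment (a false positive);
# line 3 uses strncpy and matches neither rule.
for lineno, message in scan(C_SNIPPET):
    print(lineno, message)
```

A scanner like this only knows what its authors wrote down, so it can neither rule out such spurious matches without extra context nor catch vulnerabilities that no rule describes.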
In this project, we aim to develop new deep learning-based methods that detect vulnerabilities in source code and also provide users with fine-grained interpretations, identifying the nodes, statements, or sub-graphs that are relevant to each detected vulnerability.
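One common, model-agnostic way to obtain statement-level interpretations is occlusion (leave-one-out) attribution: remove one statement at a time and measure how much the model's vulnerability score drops. The sketch below assumes a hypothetical `toy_model` that stands in for a trained detector; it illustrates the general technique only and is not the method this project will develop.

```python
from typing import Callable, List

# Hypothetical stand-in for a trained detector: returns a score in [0, 1].
# A real model would be a neural network over tokens or code graphs.
def toy_model(statements: List[str]) -> float:
    code = "\n".join(statements)
    score = 0.0
    if "gets(" in code:
        score += 0.6
    if "char buf[16]" in code:
        score += 0.2
    return min(score, 1.0)

def occlusion_attribution(
    statements: List[str], model: Callable[[List[str]], float]
) -> List[float]:
    """Score each statement by how much the prediction drops when it is removed."""
    base = model(statements)
    scores = []
    for i in range(len(statements)):
        occluded = statements[:i] + statements[i + 1:]
        scores.append(base - model(occluded))
    return scores

snippet = ["char buf[16];", "int n = 0;", "gets(buf);"]
for stmt, s in zip(snippet, occlusion_attribution(snippet, toy_model)):
    print(f"{s:+.2f}  {stmt}")
```

Here `gets(buf);` receives the largest attribution and the unrelated `int n = 0;` receives none, which is the kind of statement-level pointer a binary classifier alone does not give.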
Research activities
The main activities include:
- evaluate existing deep learning-based vulnerability detection models in terms of interpretability
- develop methods to represent code at different levels, including tokens, statements, paths in code graphs, and whole code snippets
- develop methods to detect software vulnerabilities based on the multi-level code representations
- develop methods to generate user-understandable explanations for the detected vulnerable code.
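The four representation levels listed above can be illustrated with Python's standard library. In this sketch the functions `token_level`, `statement_level`, `path_level` and `snippet_level` are hypothetical names. The snippet is Python only so that the example stays self-contained, and the snippet-level vector uses simple feature hashing as a placeholder for the learned embeddings the project aims to develop.

```python
import ast
import hashlib
import io
import tokenize
from typing import List

# Layout-only token types that carry no lexical content.
_SKIP = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
         tokenize.DEDENT, tokenize.ENDMARKER, tokenize.COMMENT}

def token_level(source: str) -> List[str]:
    """Token level: the lexical tokens of the snippet."""
    toks = tokenize.generate_tokens(io.StringIO(source).readline)
    return [t.string for t in toks if t.type not in _SKIP]

def statement_level(source: str) -> List[str]:
    """Statement level: source text of each simple (non-compound) statement."""
    tree = ast.parse(source)
    simple = [n for n in ast.walk(tree)
              if isinstance(n, ast.stmt) and not hasattr(n, "body")]
    simple.sort(key=lambda n: (n.lineno, n.col_offset))
    return [ast.get_source_segment(source, n) for n in simple]

def path_level(source: str) -> List[str]:
    """Path level: root-to-leaf paths of AST node types (one simple code graph)."""
    paths: List[str] = []

    def visit(node: ast.AST, prefix: List[str]) -> None:
        prefix = prefix + [type(node).__name__]
        # Skip Load/Store context markers, which add no structure.
        children = [c for c in ast.iter_child_nodes(node)
                    if not isinstance(c, ast.expr_context)]
        if not children:
            paths.append(">".join(prefix))
        for child in children:
            visit(child, prefix)

    visit(ast.parse(source), [])
    return paths

def snippet_level(tokens: List[str], dim: int = 16) -> List[float]:
    """Snippet level: hashed bag-of-tokens vector (placeholder for a learned embedding)."""
    vec = [0.0] * dim
    for tok in tokens:
        # sha256 gives a bucket that is stable across runs, unlike hash() on str.
        bucket = int(hashlib.sha256(tok.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    return vec

SOURCE = """\
def copy(data):
    buf = data[:16]
    return buf
"""

print(token_level(SOURCE))
print(statement_level(SOURCE))
print(path_level(SOURCE))
print(snippet_level(token_level(SOURCE)))
```

Keeping each level as a separate function makes it easy to swap any one of them for a learned encoder while leaving the others in place.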
Outcomes
Upon conclusion of this research project, we expect:
- improved models or algorithms for generating code embeddings at multiple levels that capture the semantics of source code
- models to detect vulnerable code
- models to provide explanations for the detected vulnerable code.
Skills and experience
To be considered for this project, we expect you to have:
- knowledge of data mining and machine learning
- knowledge of databases
- good programming skills (preferably Python).
Scholarships
You may be eligible to apply for a research scholarship.
Contact
Contact the supervisor for more information.