Supervisors
- Position: Senior Lecturer in Cybersecurity
- Division / Faculty: Faculty of Science
Overview
In recent years, machine learning has achieved remarkable success in applications such as natural language processing, computer vision and speech recognition. Apart from better computing resources, this success has depended largely on the availability of large training datasets for these applications. In software security research, however, the lack of large datasets remains an open problem, and it makes it hard for machine learning to reason about security vulnerabilities in real-world software. The few existing software security datasets typically consist of small, handcrafted test programs with imprecise labels.
This project aims to investigate novel techniques for automatically generating large training datasets for software security research.
Research activities
The project will:
- explore modern program analysis techniques for labelling real-world software that contains security vulnerabilities
- investigate state-of-the-art code synthesis techniques to create datasets from known vulnerable software
- implement a prototype that combines code analysis and synthesis techniques to generate large sample datasets
- experiment with the prototype and evaluate sample datasets with supervised machine learning.
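The activities above can be illustrated with a deliberately tiny sketch of a synthesize, label and evaluate loop. Everything in it is a hypothetical assumption rather than the project's method: a single C template for one known vulnerability pattern (an unbounded `strcpy` into a stack buffer, CWE-121) and its bounded `snprintf` fix; labels taken directly from which variant the generator emitted, standing in for program-analysis-based labelling; and a Bernoulli naive Bayes classifier over identifier tokens, standing in for a real supervised model. The generated C snippets are illustrative strings only (the `use` call is a hypothetical placeholder) and are never compiled.

```python
import math
import random
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    code: str
    label: int  # 1 = vulnerable, 0 = safe (known from how it was generated)


# Hypothetical template for one known vulnerability pattern: copying
# caller-supplied input into a fixed-size stack buffer (CWE-121).
TEMPLATE = """void {func}(const char *{arg}) {{
    char {buf}[{size}];
    {copy}
    use({buf});
}}"""
VULN_COPY = "strcpy({buf}, {arg});"                          # unbounded copy
SAFE_COPY = 'snprintf({buf}, sizeof({buf}), "%s", {arg});'   # bounded copy


def synthesize(n, seed=0):
    """Generate n labelled C snippets by varying names and buffer sizes."""
    rng = random.Random(seed)
    samples = []
    for i in range(n):
        buf = rng.choice(["buf", "tmp", "out"])
        arg = rng.choice(["input", "src", "data"])
        vulnerable = i % 2 == 0  # alternate to keep the classes balanced
        copy = (VULN_COPY if vulnerable else SAFE_COPY).format(buf=buf, arg=arg)
        code = TEMPLATE.format(
            func=rng.choice(["handle", "parse", "process", "load"]),
            arg=arg, buf=buf, size=rng.choice([8, 16, 32, 64]), copy=copy,
        )
        samples.append(Sample(code, int(vulnerable)))
    return samples


def tokens(code):
    """Set of identifier-like tokens; numbers and punctuation are ignored."""
    return set(re.findall(r"[A-Za-z_]\w*", code))


def train_bernoulli_nb(samples):
    """Count, per class, how many samples contain each token."""
    n = {0: 0, 1: 0}
    counts = {0: Counter(), 1: Counter()}
    for s in samples:
        n[s.label] += 1
        counts[s.label].update(tokens(s.code))
    vocab = set(counts[0]) | set(counts[1])
    return n, counts, vocab


def predict(model, code):
    """Return the class with the higher Bernoulli naive Bayes log-score."""
    n, counts, vocab = model
    feats = tokens(code)
    total = n[0] + n[1]
    best_label, best_score = 0, -math.inf
    for c in (0, 1):
        score = math.log(n[c] / total)
        for t in vocab:
            p = (counts[c][t] + 1) / (n[c] + 2)  # Laplace smoothing
            score += math.log(p if t in feats else 1 - p)
        if score > best_score:
            best_label, best_score = c, score
    return best_label


samples = synthesize(200, seed=1)
train, test = samples[:160], samples[160:]
model = train_bernoulli_nb(train)
accuracy = sum(predict(model, s.code) == s.label for s in test) / len(test)
print(f"test accuracy: {accuracy:.2f}")
```

Because this template varies only surface details (names and buffer size), a classifier can succeed simply by keying on the copy call. A realistic generator would need far more diverse code, and labels grounded in program analysis, for such an evaluation to say anything meaningful.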
Outcomes
- a large, labelled dataset for machine learning in software security research
- novel techniques for generating large datasets of vulnerable software
- a prototype dataset generator.
Skills and experience
- solid background in computer science
- programming experience in languages like Python or Java
- GPA > 5.5
Scholarships
You may be eligible to apply for a research scholarship.
Contact
Contact the supervisor for more information.