Supervisors
- Position: Professor in Civil Engineering
- Division / Faculty: Faculty of Engineering
Overview
Missing data is often unavoidable in real-world data collection systems because of factors such as sensor malfunctions, maintenance work, and transmission errors. Many machine-learning algorithms require a complete dataset as input, so the gaps must be filled before analysis. Data imputation algorithms are designed to do this. Many imputation techniques exist in the literature, with applications demonstrated on various types of datasets. Their performance, however, varies with the type of dataset and the pattern in which data is missing. As a result, analysts frequently face the question of how to choose the best imputation algorithm for their dataset. This research aims to develop recommendations that help data analysts select the most appropriate missing data imputation algorithm for their dataset.
Research activities
Stage 1: Developing a classification scheme for datasets based on the nature of data and patterns in missing information.
Stage 2: Designing a framework for data imputation performance assessment.
Stage 3: Testing the performance of different data imputation algorithms reported in the literature on the different types of datasets (defined in stage 1) using the data imputation performance assessment framework (defined in stage 2).
Stage 4: Designing recommendations based on the findings of stages 1 to 3.
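To give a feel for what the assessment in stages 2 and 3 might involve, here is a minimal sketch in Python. It is not the project's framework. It assumes one common evaluation approach: hide a random fraction of known values, impute them, and score each method by its error on the hidden entries. The synthetic sine series, the "completely at random" masking, the two simple imputers (mean and linear interpolation), and every function name are illustrative assumptions.

```python
import numpy as np


def mask_completely_at_random(values, missing_rate, rng):
    """Hide a random fraction of entries (assumed missingness pattern)."""
    hidden = rng.random(values.shape) < missing_rate
    # Keep both endpoints observed so interpolation always has anchors.
    hidden[0] = hidden[-1] = False
    masked = values.copy()
    masked[hidden] = np.nan
    return masked, hidden


def impute_mean(masked):
    """Replace every missing entry with the mean of the observed entries."""
    filled = masked.copy()
    filled[np.isnan(filled)] = np.nanmean(masked)
    return filled


def impute_linear(masked):
    """Replace missing entries by linear interpolation between neighbours."""
    filled = masked.copy()
    index = np.arange(len(masked))
    observed = ~np.isnan(masked)
    filled[~observed] = np.interp(index[~observed], index[observed], masked[observed])
    return filled


def rmse_on_hidden(truth, filled, hidden):
    """Root-mean-square error, computed only on the artificially hidden entries."""
    return float(np.sqrt(np.mean((truth[hidden] - filled[hidden]) ** 2)))


def compare_imputers(truth, missing_rate=0.2, seed=0):
    """Score each hypothetical imputer on the same masked copy of the data."""
    rng = np.random.default_rng(seed)
    masked, hidden = mask_completely_at_random(truth, missing_rate, rng)
    imputers = {"mean": impute_mean, "linear": impute_linear}
    return {name: rmse_on_hidden(truth, fn(masked), hidden) for name, fn in imputers.items()}


# Synthetic smooth series standing in for a real sensor dataset.
series = np.sin(np.linspace(0, 4 * np.pi, 200))
scores = compare_imputers(series)
print(scores)
```

On a smooth series like this one, interpolation should score better than mean filling. The ranking can change for other dataset types and missingness patterns, which is the question the stages above set out to study systematically.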
Outcomes
A framework that will help analysts to choose the best data imputation algorithms for their incomplete datasets.
Skills and experience
Experience with a programming language such as Python or MATLAB.
Some knowledge of machine learning and basic statistics.
Scholarships
You may be eligible to apply for a research scholarship.
Explore our research scholarships
Contact
Contact the supervisor for more information.