QUT - Transport big data analytics: Imputing missing data

Study levels

PhD
Master of Philosophy
Honours

Faculty/School

Faculty of Engineering

School of Civil and Environmental Engineering

Topic status

We're looking for students to study this topic.

Research centre

Centre for Data Science

Supervisors

Professor Ashish Bhaskar

Position: Professor in Civil Engineering
Division / Faculty: Faculty of Engineering

Overview

Missing data is often unavoidable in real-world data collection systems because of factors such as sensor malfunctions, maintenance work and transmission errors. Filling in missing information is an important requirement for the many machine-learning algorithms that need a complete dataset as input. Data imputation algorithms aim to fill in this missing information. Many imputation techniques exist in the literature, with applications demonstrated on various types of datasets. However, their performance varies with the type of dataset and the pattern in which data is missing. As a result, analysts frequently face a challenge: how should they choose the best imputation algorithm for their dataset? This research aims to develop recommendations that will help data analysts select the most appropriate missing data imputation algorithm for their dataset.
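As a toy illustration of why the choice of algorithm matters, the sketch below fills the same gap in a short series of 15-minute traffic speeds using two simple imputers: replacing each missing value with the mean of the observed values, and linear interpolation between the nearest observed neighbours. The data values and function names are invented for illustration only and are not taken from the project.

```python
import math
from typing import List, Optional

Series = List[Optional[float]]  # None marks a missing reading

def mean_impute(xs: Series) -> List[float]:
    """Replace every missing value with the mean of the observed values."""
    observed = [x for x in xs if x is not None]
    mu = sum(observed) / len(observed)
    return [mu if x is None else x for x in xs]

def linear_impute(xs: Series) -> List[float]:
    """Interpolate along a straight line between the nearest observed
    neighbours; gaps at either end copy the nearest observed value."""
    known = [i for i, x in enumerate(xs) if x is not None]
    out: List[float] = []
    for i, x in enumerate(xs):
        if x is not None:
            out.append(x)
            continue
        left = max((j for j in known if j < i), default=None)
        right = min((j for j in known if j > i), default=None)
        if left is None:
            out.append(xs[right])
        elif right is None:
            out.append(xs[left])
        else:
            w = (i - left) / (right - left)
            out.append(xs[left] + w * (xs[right] - xs[left]))
    return out

def rmse(truth: List[float], filled: List[float], gaps: List[int]) -> float:
    """Root-mean-square error over the positions that were missing."""
    return math.sqrt(sum((filled[i] - truth[i]) ** 2 for i in gaps) / len(gaps))

# Invented example: speeds (km/h) falling into a peak period; readings 3 and 4 lost.
truth = [60.0, 58.0, 55.0, 50.0, 44.0, 40.0, 38.0, 37.0]
gaps = [3, 4]
observed = [None if i in gaps else v for i, v in enumerate(truth)]

print(round(rmse(truth, mean_impute(observed), gaps), 3))    # 3.162
print(round(rmse(truth, linear_impute(observed), gaps), 3))  # 0.707
```

On this smoothly declining series, interpolation recovers the hidden values far better than the mean; with other types of data or other missing patterns, the ranking may differ.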

Research activities

Stage 1: Developing a classification scheme for datasets based on the nature of the data and the patterns of missing information.

Stage 2: Designing a framework for data imputation performance assessment.

Stage 3: Testing the performance of data imputation algorithms reported in the literature on different types of datasets (defined in Stage 1) using the data imputation performance assessment framework (defined in Stage 2).

Stage 4: Designing recommendations based on the findings of Stages 1 to 3.
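The stages above can be illustrated with a toy mask-and-score loop. This is a minimal sketch under our own assumptions, not the project's framework: two hypothetical missingness patterns (scattered single points and one contiguous outage) stand in for the Stage 1 classes, the RMSE on artificially hidden values stands in for the Stage 2 assessment metric, and mean imputation and last-observation-carried-forward (LOCF) stand in for the Stage 3 algorithms. The function names and the synthetic sinusoidal speed series are invented for illustration.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Set

Series = List[Optional[float]]          # None marks a missing reading
Imputer = Callable[[Series], List[float]]
MaskMaker = Callable[[int, float, random.Random], Set[int]]

def mean_impute(xs: Series) -> List[float]:
    """Replace every missing value with the mean of the observed values."""
    obs = [x for x in xs if x is not None]
    mu = sum(obs) / len(obs)
    return [mu if x is None else x for x in xs]

def locf_impute(xs: Series) -> List[float]:
    """Last observation carried forward; a leading gap takes the first observed value."""
    last = next(x for x in xs if x is not None)
    out: List[float] = []
    for x in xs:
        if x is not None:
            last = x
        out.append(last)
    return out

def random_points_mask(n: int, rate: float, rng: random.Random) -> Set[int]:
    """Hypothetical 'scattered' pattern: int(rate * n) indices (at least one) drawn without replacement."""
    return set(rng.sample(range(n), max(1, int(rate * n))))

def block_mask(n: int, rate: float, rng: random.Random) -> Set[int]:
    """Hypothetical 'outage' pattern: int(rate * n) consecutive indices (at least one) at a random start."""
    length = max(1, int(rate * n))
    start = rng.randrange(n - length + 1)
    return set(range(start, start + length))

def assess(truth: List[float], imputers: Dict[str, Imputer],
           patterns: Dict[str, MaskMaker], rate: float = 0.2,
           trials: int = 20, seed: int = 0) -> Dict[str, Dict[str, float]]:
    """Hide known values, impute them, and average the RMSE on the hidden
    positions over several trials for every (pattern, imputer) pair."""
    rng = random.Random(seed)
    scores = {p: {m: 0.0 for m in imputers} for p in patterns}
    for pattern, make_mask in patterns.items():
        for _ in range(trials):
            hidden = make_mask(len(truth), rate, rng)
            observed = [None if i in hidden else v for i, v in enumerate(truth)]
            # Every imputer is scored on the same hidden positions (a paired comparison).
            for method, impute in imputers.items():
                filled = impute(observed)
                mse = sum((filled[i] - truth[i]) ** 2 for i in hidden) / len(hidden)
                scores[pattern][method] += math.sqrt(mse) / trials
    return scores

# Hypothetical ground truth: one day of 15-minute speeds with a smooth daily cycle.
truth = [60 - 15 * math.sin(2 * math.pi * t / 96) for t in range(96)]
results = assess(truth,
                 imputers={"mean": mean_impute, "locf": locf_impute},
                 patterns={"scattered": random_points_mask, "block": block_mask})
for pattern, row in results.items():
    print(pattern, {m: round(v, 2) for m, v in row.items()})
```

Scoring only the positions that were deliberately hidden from a complete series means the true values are known, so any imputer and any missingness pattern can be compared on the same footing.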

Outcomes

A framework that will help analysts choose the best data imputation algorithm for their incomplete datasets.

Skills and experience

Programming experience in languages such as Python or MATLAB.

Some knowledge of machine learning and basic statistics.

Scholarships

You may be eligible to apply for a research scholarship.

Explore our research scholarships

Contact

Contact the supervisor for more information.