Study level

  • Master of Philosophy
  • Honours
  • Vacation research experience scheme

Faculty/School

Faculty of Science

School of Mathematical Sciences

Topic status

We're looking for students to study this topic.

Supervisors

Professor James McGree
Position: Professor in Statistics
Division / Faculty: Faculty of Science

Dr Matt Sutton
Position: Lecturer in Statistical Inference for Complex Models
Division / Faculty: Faculty of Science

Overview

The goal of this project is to develop new Bayesian methods for large-scale data analysis using subsampling techniques. The project will focus on generalised linear models (GLMs), a class of models widely used in statistics and machine learning.

One of the main challenges in applying Bayesian statistics to big data is the high computational cost of processing a large dataset at every iteration of a sampling algorithm. This project aims to address this challenge by developing new subsampling techniques for Piecewise Deterministic Markov Process (PDMP) samplers. At each iteration, these samplers can use a random subsample of the dataset in place of the full data while still targeting the exact posterior distribution, which can significantly reduce the computational cost.
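To make the subsampling idea concrete, here is a minimal Python sketch of the Zig-Zag sampler, one well-known PDMP, applied to Bayesian logistic regression (a GLM). It assumes labels in {0, 1}, an independent N(0, prior_var) prior on each coefficient, and uniform selection of a single data point per proposed switch; `zigzag_subsampling`, `trajectory_mean` and `prior_var` are hypothetical names, and this is not the method the project will develop. The key step is that the full-data gradient is replaced by an unbiased one-point estimate, and switches are proposed from a constant upper bound on that estimate, so each likelihood step touches one observation rather than all n while the path still targets the exact posterior.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def zigzag_subsampling(A, y, T, prior_var=10.0, seed=0):
    """Hypothetical sketch: Zig-Zag with uniform subsampling for Bayesian logistic regression.

    A: (n, d) design matrix, y: (n,) labels in {0, 1}, T: trajectory length.
    Assumed prior: independent N(0, prior_var) on each coefficient.
    Returns the event skeleton (times, positions) of the piecewise-linear path.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    theta = np.ones(d)  # velocity, each component in {-1, +1}
    # |n * A[j, i] * (sigmoid - y)| <= n * max_j |A[j, i]|, since |sigmoid - y| < 1.
    # Assumes no column of A is all zeros, so every bound is positive.
    bound = n * np.max(np.abs(A), axis=0)
    t = 0.0
    times, positions = [t], [x.copy()]
    while t < T:
        # Likelihood part: proposal times from the constant-rate bound.
        tau_lik = rng.exponential(1.0 / bound)
        # Prior part: rate max(0, a + b*s) along the path, simulated exactly by inversion.
        a = theta * x / prior_var
        b = 1.0 / prior_var
        e = rng.exponential(1.0, size=d)
        tau_prior = (-a + np.sqrt(np.maximum(a, 0.0) ** 2 + 2.0 * b * e)) / b
        i_lik, i_prior = np.argmin(tau_lik), np.argmin(tau_prior)
        if tau_prior[i_prior] < tau_lik[i_lik]:
            tau = tau_prior[i_prior]
            x = x + tau * theta
            theta[i_prior] *= -1  # exact prior event: always switch
        else:
            i, tau = i_lik, tau_lik[i_lik]
            x = x + tau * theta
            j = rng.integers(n)  # one data point, chosen uniformly
            grad_est = n * A[j, i] * (sigmoid(A[j] @ x) - y[j])  # unbiased for dU_lik/dx_i
            # Thinning: accept the switch with probability rate / bound.
            if rng.uniform() < max(0.0, theta[i] * grad_est) / bound[i]:
                theta[i] *= -1
        t += tau
        times.append(t)
        positions.append(x.copy())
    return np.array(times), np.array(positions)


def trajectory_mean(times, positions, burn_in=0.0):
    """Time average of the piecewise-linear path over segments starting after burn_in."""
    keep = times[:-1] >= burn_in
    dt = np.diff(times)[keep]
    midpoints = 0.5 * (positions[:-1] + positions[1:])[keep]
    return (dt[:, None] * midpoints).sum(axis=0) / dt.sum()
```

For example, after `times, positions = zigzag_subsampling(A, y, T=60.0)`, the call `trajectory_mean(times, positions, burn_in=5.0)` estimates the posterior mean. Because the path is linear between events, the time average over each segment is computed exactly from its endpoints rather than by discretising the path.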

The main objectives of this project are to:

  • develop new subsampling methods for PDMP samplers that scale to big data and are suited to GLMs
  • investigate strategies for optimising the choice of subsamples, so that each subsample is representative of the full dataset and the methods perform well
  • implement and test the proposed methods on large datasets, comparing their performance with that of existing methods
  • explore potential applications of the proposed methods in fields such as economics, biostatistics, and machine learning.
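One simple strategy for choosing subsamples, shown here as a sketch under stated assumptions, is to select data point j with probability proportional to |A[j, i]| and reweight by 1/p[j]. For logistic regression with labels in {0, 1} (prior omitted), this keeps the one-point gradient estimate unbiased and gives a constant rate bound of sum_j |A[j, i]|, which is never larger than the uniform-selection bound n * max_j |A[j, i]|. In a subsampled Zig-Zag sampler, a tighter bound means fewer rejected switch proposals. The selection rule and the names `importance_probs`, `weighted_grad_estimate` and `rate_bounds` are hypothetical illustrations, not the strategies the project will investigate.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def importance_probs(A, i):
    """Hypothetical selection rule: P(choose j) proportional to |A[j, i]|."""
    w = np.abs(A[:, i])
    return w / w.sum()


def weighted_grad_estimate(A, y, x, i, j, p):
    """One-point estimate of the full-data dU/dx_i (logistic log-likelihood only).

    Unbiased when j is drawn with probability p[j]: sum_j p[j] * estimate_j = dU/dx_i.
    """
    return A[j, i] * (sigmoid(A[j] @ x) - y[j]) / p[j]


def rate_bounds(A, i):
    """Constant Zig-Zag rate bounds for coordinate i: (uniform selection, weighted selection)."""
    n = A.shape[0]
    uniform = n * np.max(np.abs(A[:, i]))
    # |A[j, i] * (sigmoid - y) / p[j]| < sum_k |A[k, i]| for every j with p[j] > 0
    weighted = np.sum(np.abs(A[:, i]))
    return uniform, weighted
```

To use the rule, draw `j = rng.choice(n, p=importance_probs(A, i))` with a NumPy `Generator` and pass that index to `weighted_grad_estimate`. The weighted bound helps most when a column of the design matrix has a few large entries, because the uniform bound is driven by the single largest one.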

Research activities

The student will have the opportunity to work with a team of researchers who are already working on PDMP samplers and GLMs, and this work may lead to future publication in academic journals.

Outcomes

New methods that help scale Bayesian analysis of GLMs up to big data, and software to assist in choosing which subsets of the full dataset to analyse.

Skills and experience

This project is suitable for students with a background in mathematics, statistics, or computer science and a general interest in the field of probability and statistics. Prior knowledge of GLMs or sampling methods is not required.

Contact

Contact the supervisor for more information.