Supervisors
- Position: Professor in Statistics
- Division / Faculty: Faculty of Science
Overview
The goal of this project is to develop new Bayesian methods for large-scale data analysis using subsampling techniques. The project will focus on generalised linear models (GLMs), which are widely used in statistics and machine learning.
One of the main challenges in applying Bayesian statistics to big data is the computational cost of repeatedly processing the full dataset. The proposed project aims to address this challenge by developing new subsampling techniques for Piecewise Deterministic Markov Process (PDMP) samplers. These samplers can target the exact posterior distribution while using only a random subsample of the dataset at each iteration, which can significantly reduce the computational cost.
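As a rough illustration of the subsampling idea, the sketch below computes an unbiased estimate of the log-likelihood gradient for logistic regression (one example of a GLM) from a small random subsample. It is not the project's method and not a full PDMP sampler: it shows only the subsampled-gradient ingredient, and the function names, toy data, and subsample size are all assumptions made for this example.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def full_gradient(beta, X, y):
    """Gradient of the negative log-likelihood of logistic regression,
    summed over every row passed in."""
    grad = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        resid = sigmoid(sum(b * x for b, x in zip(beta, xi))) - yi
        for j, x in enumerate(xi):
            grad[j] += resid * x
    return grad


def subsampled_gradient(beta, X, y, m, rng):
    """Estimate the full gradient from m rows drawn uniformly without
    replacement, rescaled by n / m so the estimate is unbiased."""
    n = len(X)
    idx = rng.sample(range(n), m)
    sub = full_gradient(beta, [X[i] for i in idx], [y[i] for i in idx])
    return [n / m * g for g in sub]


# Hypothetical toy dataset: an intercept plus two Gaussian covariates.
rng = random.Random(0)
n, d = 2000, 3
true_beta = [0.5, -1.0, 2.0]
X = [[1.0] + [rng.gauss(0.0, 1.0) for _ in range(d - 1)] for _ in range(n)]
y = [
    1 if rng.random() < sigmoid(sum(b * x for b, x in zip(true_beta, xi))) else 0
    for xi in X
]

beta = [0.0] * d
exact = full_gradient(beta, X, y)

# Each estimate touches only 50 of the 2000 rows; their average
# should approach the full-data gradient.
estimates = [subsampled_gradient(beta, X, y, 50, rng) for _ in range(2000)]
mean_est = [sum(e[j] for e in estimates) / len(estimates) for j in range(d)]

print("full-data gradient:       ", exact)
print("mean of subsampled values:", mean_est)
```

Each call to `subsampled_gradient` costs a fraction of a full pass over the data, yet the estimates are correct on average. Existing subsampling PDMP samplers, such as the Zig-Zag sampler, build on estimates of this kind while still targeting the exact posterior.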
The main objectives of this project are to:
- develop new subsampling methods for PDMPs that are suitable for scaling to big data and GLMs
- investigate strategies for optimising which subsamples are chosen, to ensure that the subsample is representative of the full dataset and that the methods perform well
- implement and test the proposed methods on large datasets, comparing their performance with existing methods
- explore the potential applications of the proposed methods in various fields, such as economics, biostatistics, and machine learning.
Research activities
The student will have the opportunity to work with a team of researchers who are already working on PDMP samplers and GLMs, and this work may lead to future publication in academic journals.
Outcomes
- New methods for scaling Bayesian analysis of GLMs to large datasets.
- Software to assist in choosing which subsets of the full dataset to analyse.
Skills and experience
This project is suitable for students with a background in mathematics, statistics, or computer science and a general interest in the field of probability and statistics. Prior knowledge of GLMs or sampling methods is not required.
Contact
Contact the supervisor for more information.