Supervisors
- Position
- Lecturer (TIEA)
- Division / Faculty
- Faculty of Science
- Position
- Adjunct Professor
- Division / Faculty
- Faculty of Science
- Position
- Associate Professor
- Division / Faculty
- Faculty of Science
- Position
- Professor
- Division / Faculty
- Faculty of Science
- Position
- Associate Professor
- Division / Faculty
- Faculty of Science
External supervisors
- Dr Andrew Trotman, University of Otago
Overview
Advances in sequencing technologies over the past two decades have led to an explosion in the availability of genomic sequence data and an increasingly urgent need for scalable clustering and search facilities. One approach is to encode sequences as binary vectors in a high-dimensional space, simplifying the comparison and allowing it to be computed very rapidly using bit-level operations.
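To illustrate the idea of binary encodings, here is a minimal sketch, assuming one simple encoding that the project may or may not use. Each of the 4**k possible DNA k-mers gets a fixed bit position, and a sequence becomes a bit vector with one bit set per distinct k-mer it contains. Two sequences can then be compared with a bitwise AND, a bitwise OR and a population count, giving the Jaccard similarity of their k-mer sets. The function names (`kmer_index_map`, `encode`, `jaccard`) are hypothetical. Because dense vectors grow as 4**k, practical systems usually hash or sketch k-mers rather than allocate one bit per possible k-mer.

```python
from itertools import product


def kmer_index_map(k):
    """Hypothetical helper: give each of the 4**k possible DNA k-mers a fixed bit position."""
    return {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}


def encode(seq, k, positions):
    """Encode a sequence as a bit vector (a Python int) with one bit per distinct k-mer present."""
    bits = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in positions:  # skip k-mers containing N or other ambiguity codes
            bits |= 1 << positions[kmer]
    return bits


def jaccard(a, b):
    """Jaccard similarity of two k-mer sets, computed with AND, OR and popcount."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 1.0


# Usage: the sequences share 4 of 5 distinct 3-mers.
pos = kmer_index_map(3)
a = encode("ACGTACGT", 3, pos)
b = encode("ACGTACGA", 3, pos)
print(jaccard(a, b))  # prints 0.8
```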
Coupled with these ideas is the need to provide clustering methods and efficient indexing and lookup in response to search queries. One way to do this is to borrow ideas from text-based information retrieval, optimised to work with the distribution of k-mers (words of length k) within the genomic collection.
Research activities
The work undertaken will depend on the level of the student, but the main activities will include:
- indexing large-scale sequence collections and experimenting with clustering and search
- developing new encodings and analysing the results
- developing and extending software tools to make these approaches usable for biologists
- benchmarking against other tools and approaches.
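As a starting point for experimenting with clustering, one common baseline is greedy incremental clustering, used for example by CD-HIT. Sequences are processed longest first, and each one joins the first cluster representative it is sufficiently similar to, or else founds a new cluster. The minimal sketch below assumes k-mer Jaccard similarity and a fixed threshold. The function `greedy_cluster` and its parameters are hypothetical and are not the project's clustering method.

```python
def kmers(seq, k):
    """Set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two k-mer sets."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def greedy_cluster(seqs, k=4, threshold=0.5):
    """Hypothetical greedy incremental clustering of a {id: sequence} dict.

    Returns {representative_id: [member ids]}.
    """
    # Longest sequences first (stable sort keeps input order among equal lengths).
    order = sorted(seqs, key=lambda sid: len(seqs[sid]), reverse=True)
    reps = []  # (representative id, its k-mer set)
    clusters = {}
    for sid in order:
        ks = kmers(seqs[sid], k)
        for rep_id, rep_ks in reps:
            if jaccard(ks, rep_ks) >= threshold:
                clusters[rep_id].append(sid)
                break
        else:  # no representative was similar enough: start a new cluster
            reps.append((sid, ks))
            clusters[sid] = [sid]
    return clusters


# Usage: "a" and "b" share 4 of 5 distinct 4-mers; "c" shares none with them.
seqs = {"a": "ACGTACGTAC", "b": "ACGTACGTAA", "c": "GGGGCCCCTT"}
print(greedy_cluster(seqs))  # prints {'a': ['a', 'b'], 'c': ['c']}
```

Comparing each sequence against every representative is quadratic in the worst case, which is one reason fast encodings and indexes matter for clustering at scale.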
Outcomes
We are looking to develop new algorithms and tools that will make precise search of large-scale sequence collections much faster than is currently possible. To that end, we aim to implement and publish new encodings and new approaches to clustering and search, and to demonstrate through benchmarking that they are faster than existing methods.
Skills and experience
For this project we are looking for students with good programming skills, an ability to work with complex datasets and to understand machine learning algorithms, and a willingness to learn the biology needed to understand the domain. Most of our students have studied or are studying computer science, but we welcome anyone with a mix of skills suited to the problem. Those with a joint degree in molecular biology and computer science are especially welcome, so please get in touch if this sounds like you.
You don't need to be an extraordinary software developer, but you do need to be comfortable in Python, C#, Java, F# or another modern language. This isn't a project where you can learn to program. We will teach you the biology and the machine learning as the project takes shape.
If you are undertaking this project as an Honours or PhD student then you may be eligible to apply for a scholarship.
Scholarships
You may be eligible to apply for a research scholarship.
Contact
Contact the supervisors for more information.