Faculty/School

Faculty of Science

School of Information Systems

Topic status

We're looking for students to study this topic.

Supervisors

Dr Sareh Sadeghianasl
Position: Lecturer in Information Systems
Division / Faculty: Faculty of Science

Professor Arthur ter Hofstede
Position: Principal Research Fellow
Division / Faculty: Faculty of Science

Overview

Praeclarus is an open-source software framework that aims to facilitate data pre-processing for process mining. Process mining is a specialised form of data mining that focuses on process data. It is of high interest to industry, with the market doubling roughly every two years (e.g., increasing from $550M in 2020 to $1.8B in 2023). This market growth has led big companies like Microsoft, SAP, and IBM to acquire process mining vendors such as Minit, Signavio, and myInvenio.

Recent process mining surveys show that more than 60% of the time and effort is spent on data transformation and pre-processing. These steps include uploading data sets in different formats, detecting data quality issues, and repairing them. This project aims to address different aspects of the development of the Praeclarus framework.

Research engagement

This project will involve one or more of the following activities:

  • reviewing the state of the art in software architecture, engaging visualisations, and traceable data cleaning
  • designing the software architecture for the Praeclarus framework
  • investigating the best way to present the results of the detection and the repair of data quality issues
  • developing plugins for the Praeclarus software framework in the form of algorithms to detect and repair data quality issues in process data
  • developing strategies for maintaining the provenance of data during the repair process
  • communicating the findings in publications.

Outcomes

The prospective outcomes depend on the scope of the project and may include the following:

  • literature review
  • the software architecture for the Praeclarus framework
  • design and development of prototypes for the detection and the repair of data quality issues in the form of plugins for the Praeclarus software framework
  • best-practice data provenance strategies
  • best-practice visualisations of the results of the detection and repair of data quality issues.

Skills and experience

This project needs one or more of the following skills:

  • preliminary knowledge of process mining and software development
  • programming standalone or web-based applications using, e.g., Java, JavaScript, HTML, CSS, PHP, Angular, or Python
  • time management skills to deliver outcomes within a specific time frame
  • excellent written and verbal communication skills.

Start date

15 December, 2024

End date

28 February, 2025

Location

GP Campus, Y Block

Additional information

You may be eligible to apply for a research scholarship.

Contact

Contact the supervisors for more information.