Research Team:
Matthew W. Ippolito
Department of Medicine
Proposal
Malaria remains a major public health threat, particularly in sub-Saharan Africa, where the genetic complexity of Plasmodium falciparum complicates disease control by obscuring our understanding of transmission dynamics and the emergence of drug resistance. When patients are simultaneously infected by multiple strains (or clones) of the malaria parasite, a common occurrence in moderate- and high-transmission areas, clarifying how within-host dynamics shape strain dominance becomes fundamental to interpreting genetic surveillance data and tracing the geographic spread of emerging variants, including drug-resistant ones. Early genotyping techniques, such as PCR assays targeting msp1 and msp2, remain in wide use but offer only limited insights into the parasite’s genomic landscape.
The overarching goal of this project is to develop and validate a machine learning (ML)-based pipeline to classify and reassemble short, error-prone malaria parasite sequences into their corresponding genomes, especially in the setting of mixed infections (Fig. 1). Our vision is a robust computational workflow that can be deployed in both research and clinical laboratories, enabling the precise characterization of P. falciparum genotypes and clarifying the genetic basis of drug resistance, persistent infections, and treatment failures.
To achieve this overarching goal, we propose two specific aims:
Aim 1. Develop, train, and optimize a deep neural network (NN) model that can accurately reconstruct malaria parasite genomes from error-prone short-read sequencing data of mixed infections in which strains are present at differing densities.
Aim 2. Deploy the trained model on large-scale, real-world P. falciparum genomic datasets to enable robust subclonal inference and deliver a user-friendly pipeline for broader dissemination.


