Student Fellowship Awardees

Recipients of the IDIES Summer Student Fellowship

This program offers awards of $6,000 to support a summer research project led by undergraduate students with the guidance of an IDIES faculty member mentor. These projects are meant to provide an opportunity for students to participate in a 10-week (June – August), full-time, data-science-focused research project in collaboration with an IDIES faculty member.

Summer 2023

Intelligent Microscopy Methods for Characterizing Intermediate States Using Electron Energy Loss Spectroscopy (EELS)

Jay Kim

First-year; Materials Science & Engineering, Computer Science (Double-Major)

Mentor: Mitra Taheri

Advances in Transmission Electron Microscope (TEM) resolution, precision, and collection rates have greatly improved our ability to probe complex structures and changes in materials. Machine learning frameworks, such as variational auto-encoders, are powerful tools that enhance the processing and characterization of the complex signals received from these instruments.

Our auto-encoder will first be trained on a set of handwritten digits from the MNIST database. We aim to train the auto-encoder to cluster and classify the different digits and to visualize the latent space. We will then transition to a SrFeO3 TEM Electron Energy Loss Spectroscopy (EELS) dataset. We hope to identify, cluster, and classify transient intermediate states of observed chemical reactions and to produce higher-quality denoised images.

Development and External Validation of a Machine Learning Model for Pulmonary Embolism Prediction in Intensive Care

Sampath Rapuri

First-year; Biomedical Engineering, Computer Science (Double-Major)

Mentor: Robert D. Stevens

“Pulmonary embolism (PE) is a severe and frequent complication in critically ill patients. Although several clinical risk scores have been proposed, there is still an unmet need for accurate methods to predict the likelihood of developing PE. Accurate prediction is crucial for timely diagnosis and treatment: untreated acute PE has a high mortality rate, whereas diagnosed and treated PE has a significantly lower one. A data-driven computational model that uses available physiological parameters therefore has the potential to improve accuracy and help clinicians identify and treat PE in ICU patients more effectively. The objective of this study is to develop and externally validate a data-driven computational model for PE prediction in critically ill patients and to compare its performance with currently available ICU PE scoring systems.”

AFLUX@JHU: Materials Search-API for the JHU aflow.org Data Repositories

William Shiber

Second-year; Applied Mathematics & Statistics

Mentor: Corey Oses

We propose to determine the physical properties of stars in open clusters by using the Isochrones Python package with MultiNest to perform a Bayesian fit of the MESA Isochrones and Stellar Tracks (MIST) isochrone grid to parallax and multiwavelength photometry data for open-cluster stars of known metallicities and ages. Stars in an open cluster share similar ages and metallicities, and their physical properties provide valuable insight into stellar evolution and exoplanets. We will apply this data science approach to open-cluster stars with photometric data and verify that it yields accurate and precise stellar properties. Once verified, the method will help researchers in galactic astronomy and exoplanet science determine the physical parameters of stars.

Summer 2022

Optimizing Routes in the Operating Room

Ryan Chou (Pre-medical Biomedical Engineering major)
Mentor: Gregory D. Hager

An ongoing effort in the healthcare community aims to understand how novice surgeons become experts. Knowing what expert surgeons do that novices do not could greatly influence surgical training, assessment methods, and technologies. This project examines septoplasty, a surgical procedure to remove asymmetrical parts of the nasal septum (the wall that separates the two nostrils). It focuses on the phase in which the surgeon separates the mucosal flap (tissue) from the nasal septum (cartilage and bone). The objective of this study is to determine whether expert septoplasty surgeons can be distinguished from novices by their ability to move their instrument (a Cottle elevator) efficiently, along an optimal path, while separating the mucosa from the septum.
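The project does not say how path efficiency would be scored. One simple way to make "efficient movement" measurable is a path-efficiency ratio. It divides the straight-line distance between the instrument tip's start and end positions by the length of the path actually traveled. A value of 1.0 means a perfectly straight path, and lower values indicate a more indirect one. This is offered only as an illustrative assumption, not as the project's method. The function names and sample trajectories below are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]  # hypothetical (x, y, z) instrument-tip position

def path_length(points: Sequence[Point]) -> float:
    """Total distance traveled between consecutive tracked positions."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def path_efficiency(points: Sequence[Point]) -> float:
    """Hypothetical metric: straight-line start-to-end distance / traveled length.

    1.0 means a perfectly straight path; smaller values mean a more indirect one.
    """
    if len(points) < 2:
        raise ValueError("need at least two tracked positions")
    traveled = path_length(points)
    if traveled == 0:
        return 1.0  # the instrument never moved; treat as trivially efficient
    return math.dist(points[0], points[-1]) / traveled

# Hypothetical trajectories (coordinates in mm) with the same start and end points.
direct = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
wandering = [(0, 0, 0), (1, 1, 0), (2, -1, 0), (3, 0, 0)]
print(path_efficiency(direct))             # 1.0
print(round(path_efficiency(wandering), 3))  # 0.592
```

Both trajectories start and end at the same points, so only the detour in the second one lowers its score. A real analysis would compute this over tracked instrument motion during the mucosal-flap phase.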

Laying the Foundation for Large-scale Precision Stellar Parameter Inference in the Field of Exoplanets

Keyi Ding (Computer Science and Physics major)
Mentor: Kevin C. Schlaufman

Summer 2021

Using Machine Learning to Predict Surgical Case Duration in Operating Room Scheduling Optimization

Shengwei Zhang (WSE – Applied Math & Stats)
Mentors: Tinglong Dai (Carey), Kimia Ghobadi (WSE)

Operating rooms (ORs) are the most expensive and financially productive resource in a hospital, and any disruption in their workflow can harm the rest of the hospital's operations. One of the main challenges in designing an efficient OR schedule is uncertainty in surgical case duration. The goal of this project is to develop at least three machine learning models that can effectively predict surgical case duration and to compare their predictive power in a real clinical setting. The models will be developed using a large retrospective data set covering a 12-month period at Johns Hopkins Hospital. Of the cases, 80% will be used for training and the remaining 20% for testing/validation. Candidate features are drawn from three categories: patient, personnel, and procedure.
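The 80/20 evaluation protocol described above can be sketched as follows. This is a minimal illustration on made-up data. The `cases` records, their `procedure` and `minutes` fields, the procedure names, and the two baseline predictors (overall mean and per-procedure mean) are hypothetical stand-ins, not the project's actual models or features. Real machine learning models would replace the baselines, but the split, train and evaluate structure would stay the same.

```python
import random
from collections import defaultdict
from statistics import mean

def train_test_split(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of the records and hold out `test_fraction` of them for testing."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

def fit_overall_mean(train):
    """Hypothetical baseline 1: predict the mean training duration for every case."""
    avg = mean(r["minutes"] for r in train)
    return lambda case: avg

def fit_per_procedure_mean(train):
    """Hypothetical baseline 2: predict the mean training duration of the case's procedure."""
    by_proc = defaultdict(list)
    for r in train:
        by_proc[r["procedure"]].append(r["minutes"])
    proc_avg = {p: mean(v) for p, v in by_proc.items()}
    fallback = mean(r["minutes"] for r in train)  # used for procedures unseen in training
    return lambda case: proc_avg.get(case["procedure"], fallback)

def mean_absolute_error(model, test):
    """Average absolute gap (in minutes) between predicted and actual durations."""
    return mean(abs(model(r) - r["minutes"]) for r in test)

# Hypothetical synthetic cases: duration depends mostly on procedure type.
rng = random.Random(42)
base_minutes = {"appendectomy": 60, "hip_replacement": 120, "cabg": 240}
cases = [
    {"procedure": p, "minutes": base_minutes[p] + rng.gauss(0, 10)}
    for p in rng.choices(list(base_minutes), k=500)
]

train, test = train_test_split(cases, test_fraction=0.2)  # 80% train, 20% test
models = {
    "overall mean": fit_overall_mean(train),
    "per-procedure mean": fit_per_procedure_mean(train),
}
for name, model in models.items():
    print(f"{name}: test MAE = {mean_absolute_error(model, test):.1f} min")
```

The split is fixed here through `seed`. That way every candidate model is scored on the same held-out 20% of cases, which keeps a side-by-side comparison of predictive power fair.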

Optimizing Resource Distribution Based on Sales Price Data Through Machine Learning

Chengkai (Tony) Tian (WSE – Applied Math & Stats)
Mentor: Jian Ni (Carey)

The COVID-19 outbreak has highlighted the importance of systematically understanding and optimizing the allocation of healthcare resources. This study integrates sales and price data for typical over-the-counter medicines (in this study, Tylenol, Aleve, and Advil) and other goods, such as gloves, wipes, tissues, and even food, from several geographically diverse cities. It combines these with COVID case and death counts from the Johns Hopkins COVID-panel. We supplement these data with other information, such as temperature, rainfall, city demographics, age groups, gender composition, income composition, and local online search indices for related symptoms. We construct a machine learning framework to investigate and identify correlations between price fluctuations and case counts. We also examine how these patterns change before and after special events such as local stock-outs, mask mandates, or stay-at-home orders. Finally, based on the results from the training set, we propose a potential re-optimized allocation of resources.
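As a minimal illustration of the first step of such an analysis, the sketch below computes the Pearson correlation between week-over-week price changes and weekly case counts for a single city. The prices and case counts are invented, and the helper names are hypothetical. A plain pairwise correlation is only a starting point; the project's framework also draws on weather, demographic and search-index covariates that this sketch leaves out.

```python
from math import sqrt
from typing import List, Sequence

def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)

def pct_changes(prices: Sequence[float]) -> List[float]:
    """Week-over-week percentage change in price (one fewer value than `prices`)."""
    return [100 * (b - a) / a for a, b in zip(prices, prices[1:])]

# Hypothetical weekly data for one city: average price of a pain reliever (USD)
# and new COVID cases reported in that week.
prices = [8.00, 8.10, 8.50, 9.20, 9.30, 9.10]
cases = [120, 300, 650, 900, 700, 400]

# Align each price change (week t-1 to week t) with the case count of week t.
r = pearson_r(pct_changes(prices), cases[1:])
print(f"price change vs. case count: r = {r:.2f}")
```

Shifting the case series by one or more weeks before correlating is one simple way to ask whether price movements lead or lag case counts. It can also be used to compare windows before and after an event such as a mask mandate.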

Humanizing Our Data: Proposal on Integrating Social and Behavioral Determinants of Health into Population Health Analytics

Jaxon Wu (KSAS – History of SC/MED/TECH)
Mentors: Jonathan Weiner (SPH), Chintan Pandya (SPH)

Socioeconomic inequalities are increasingly recognized as a significant contributor to health disparities in the United States. To improve health equity at the individual and population levels, the health care sector must play a pivotal role in identifying and addressing individuals' socioeconomic risk factors and needs. By identifying the most socioeconomically vulnerable individuals, we develop a social needs measure that enhances population risk stratification. This advances state-of-the-art predictive modeling tools for high-risk case detection and, in turn, benefits care management. Our project will directly support the inclusion of social and behavioral determinants of health data in the Johns Hopkins ACG software.

Summer 2020

Using Machine Learning to Design Highly Stable, Biologically Active Proteins

Gina El Nesr (WSE & KSAS)
Mentor: Doug Barrick (Biophysics, KSAS)

Researchers have sought methods to design proteins that are highly stable and retain their biological activity. The recent dramatic increase in genome sequencing data gives scientists sufficient data for sequence-based protein design. One design method that has successfully stabilized proteins uses consensus sequences. Although proteins designed from consensus sequences have been found to be more stable and biologically active, the approach implicitly assumes that residue frequencies are independent. In protein structures, however, residues are coupled to one another in a large, interconnected network of interactions. The goal of this project is to employ a robust design method that incorporates these residue interactions. By using restricted Boltzmann machines to learn residue sequence-structure encodings, we can potentially discover protein sequences with improved stability, solubility, and shelf life. Such a methodology has applications in the pharmaceutical, biotechnology, and chemical industries.

Who’s to blame: Insights from Atmospheric Data and High-Performance Computing on Greenhouse Gas Emissions from the US Natural Gas Industry

Olin Shipstead (WSE)
Mentor: Scot Miller (Environmental Health & Engineering, WSE)

We propose to use a national network of atmospheric satellite observations to estimate natural gas leakage from the oil and natural gas industry. We will trace the ethane observations back in time, using atmospheric dispersion models run in reverse, to determine ethane emission rates. With these estimated rates, we can then evaluate government emissions inventories (what the government estimates is being emitted) against our own emissions calculations. These analyses will show how large these emissions are, where they occur, and which oil and natural gas regions leak the most fugitive emissions. In addition, we will evaluate the strengths and weaknesses of using ethane as part of a long-term monitoring strategy.

Data-Driven Differential Diagnosis of Common Pulmonary Diseases in the ICU

Zherui Xuan
Mentor: Stuart Ray (Health Sciences Informatics, SOM)

The goal of this project is to create an algorithm that matches a physician's judgment in determining an ICU patient's pulmonary differential diagnoses. The project will focus on ICU pulmonary patients and on the very first stage of designing an AI-driven CP: identification of differential diagnoses. Over the summer, with the guidance of data scientists and physicians, I will research, create, and test such an algorithm. It will assign pulmonary disease patients in the ICU to one of five common pulmonary differential diagnoses (pneumothorax, bronchitis, COPD, pneumonia, and lung cancer) or to an "other" category. The algorithm will use vital signs (heart rate, blood pressure, respiratory rate, temperature, etc.), common diagnostic labs (blood chemistry, hematology, urinalysis, microbiology tests, etc.), and basic imaging reports (X-ray, CT scans, etc.).

Hunting for Metal-Poor Main Sequence Stars in Spectroscopic Surveys

Vedant Chandra
Mentor: Kevin Schlaufman (Physics and Astronomy, KSAS)

Metal-poor stars are 10-billion-year-old local relics of the early Universe. Their characteristics can therefore be used to infer the properties of the first stellar generation and the earliest evolution of the Milky Way. However, these metal-poor stars are rare and hard to find: only a small fraction of the Milky Way's metal-poor stellar population has been characterized. One significant challenge in this field is the spectroscopic similarity between rare metal-poor main sequence stars and common cool white dwarfs. Our project develops machine learning algorithms for large spectroscopic surveys that use Bayesian convolutional neural networks to break this degeneracy. We will publish our metal-poor star discoveries and distribute our software tools for use by the broader astronomical community.