Student Fellowship Awardees
Recipients of the IDIES Summer Student Fellowship
This program offers awards of $6,000 to support a summer research project led by an undergraduate student under the guidance of an IDIES faculty mentor. The projects give students the opportunity to carry out a 10-week (June – August), full-time, data-science-focused research project in collaboration with an IDIES faculty member.
Abstracts from previous Fellowship Award recipients’ projects are available below:
Mentor: Gregory D. Hager
Mentor: Kevin C. Schlaufman
We propose to determine the physical properties of stars in open clusters by using the Isochrones Python package with MultiNest to perform a Bayesian fit of the MESA Isochrones and Stellar Tracks (MIST) isochrone grid to parallax and multiwavelength photometry of open-cluster stars with known metallicities and ages. Stars in an open cluster share similar ages and metallicities, and their physical properties provide valuable insight into stellar evolution and exoplanets. We will apply this data science approach to photometric data of open-cluster stars and verify that it yields accurate and precise stellar properties. Once verified, the method will help researchers in galactic astronomy and exoplanet science determine the physical parameters of stars.
Shengwei Zhang (WSE – Applied Math & Stats)
Mentors: Tinglong Dai (Carey), Kimia Ghobadi (WSE)
Operating rooms (ORs) are the most expensive and financially productive resource in a hospital, and any disruption in their workflow can harm the rest of the hospital's operations. One of the main challenges in designing an efficient OR schedule is uncertainty in surgical case duration. The goal of this project is to develop at least three machine learning models that predict surgical case duration and to compare their predictive power in a real clinical setting. The models will be developed using a large retrospective data set covering a 12-month period at Johns Hopkins Hospital, with 80% of the cases used for training and the remaining 20% for testing and validation. Candidate features are drawn from three categories: patient, personnel, and procedure.
Chengkai (Tony) Tian (WSE – Applied Math & Stats)
Mentor: Jian Ni (Carey)
The COVID-19 outbreak has highlighted the importance of systematically understanding and optimizing the allocation of healthcare resources. This study integrates sales and price data for typical over-the-counter medicines (here, Tylenol, Aleve, and Advil) and for other goods such as gloves, wipes, tissue, and even food, across several geographically diverse cities, with COVID case and death counts from the Johns Hopkins COVID-panel. We supplement these data with other information such as temperature, rainfall, city demographics, age group, gender composition, income composition, and local online search indices for related symptoms. We build a machine learning framework to identify correlations between price fluctuations and case counts. We also examine how these patterns change before and after special events such as local stock-outs, mask mandates, or stay-at-home orders. Finally, based on the results from the training set, we propose a potential re-optimized allocation of resources.
Jaxon Wu (KSAS – History of SC/MED/TECH)
Mentors: Jonathan Weiner (SPH), Chintan Pandya (SPH)
Socioeconomic inequalities are increasingly recognized as a significant contributor to health disparities in the United States. To improve health equity at the individual and population levels, the health care sector must play a pivotal role in identifying and addressing individuals' socioeconomic risk factors and needs. By identifying the most socioeconomically vulnerable individuals, we develop a social needs measure that improves population risk stratification, advancing state-of-the-art predictive modeling tools for detecting high-risk cases and supporting care management. The results of our project will directly support the inclusion of social and behavioral determinants of health data in the Johns Hopkins ACG software.
Gina El Nesr (WSE & KSAS)
Mentor: Doug Barrick (Biophysics, KSAS)
Researchers have long sought methods to design proteins that are highly stable and retain their biological activity. The recent dramatic increase in genome sequencing data gives scientists sufficient data for sequence-based protein design. One method that has successfully stabilized proteins uses consensus sequences. Although consensus proteins have been found to be more stable and biologically active, the approach implicitly assumes that residue frequencies at different positions are independent. In a protein structure, however, residues are coupled to one another in a large, interconnected network of interactions. The goal of this project is to develop a robust protein design method that incorporates these residue interactions. By using Restricted Boltzmann Machines to learn sequence-structure encodings of residues, we can potentially discover protein sequences with improved stability, solubility, and shelf life. Such a methodology has applications in the pharmaceutical, biotechnology, and chemical industries.
Olin Shipstead (WSE)
Mentor: Scot Miller (Environmental Health & Engineering, WSE)
We propose to use a national network of atmospheric satellite observations to estimate natural gas leakage from the oil and natural gas industry. We will trace the ethane observations back in time (using atmospheric dispersion models run in reverse) to determine ethane emission rates. With these estimated emission rates, we can then compare government emission inventories (what the government estimates is being emitted) against our calculated emissions. These analyses will show how large the emissions are, where they occur, and which oil and natural gas regions produce the most fugitive emissions. In addition, we will evaluate the strengths and weaknesses of using ethane as part of a long-term monitoring strategy.
Mentor: Stuart Ray (Health Sciences Informatics, SOM)
The goal of this project is to create an algorithm that matches a physician’s judgment in determining an ICU patient’s pulmonary differential diagnoses. The project will focus on ICU pulmonary patients and on the very first stage of designing an AI-driven CP: identifying differential diagnoses. Over the summer, with the guidance of data scientists and physicians, I will research, build, and test an algorithm that assigns ICU pulmonary patients to one of five common pulmonary differential diagnoses (pneumothorax, bronchitis, COPD, pneumonia, and lung cancer) or to an "other" category, using vital signs (heart rate, blood pressure, respiratory rate, temperature, etc.), common diagnostic labs (blood chemistry, hematology, urinalysis, microbiology tests, etc.), and basic imaging reports (X-ray, CT scans, etc.).
Mentor: Kevin Schlaufman (Physics and Astronomy, KSAS)
Metal-poor stars are 10-billion-year-old local relics of the early Universe, so their characteristics can be used to infer the properties of the first stellar generation and the earliest evolution of the Milky Way. These stars are, however, rare and hard to find; only a small fraction of the Milky Way’s metal-poor stellar population has been characterized. One significant challenge in this field is the spectroscopic similarity between rare metal-poor main-sequence stars and common cool white dwarfs. Our project develops machine learning algorithms for large spectroscopic surveys that use Bayesian convolutional neural networks to break this degeneracy. We will publish our metal-poor star discoveries and distribute our software tools for use by the broader astronomical community.