IDIES RSE Luncheon Seminar Series


Image courtesy of Daniel Muthukrishna’s personal site: https://danielmuthukrishna.com/


With a PhD in Astrophysics from the University of Cambridge, Dr. Muthukrishna specializes in applying advanced machine learning techniques to analyze astronomical data. He is a vital member of the Transiting Exoplanet Survey Satellite (TESS) team, where he leverages neural networks to classify exoplanets.

His research covers a range of topics, from modeling supernovae to detecting anomalies in large datasets using cutting-edge methods like diffusion models and transformers. Dr. Muthukrishna has developed widely used software tools that enhance the accuracy and efficiency of astronomical data analysis.


Talk Abstract: Foundation models for scientific data must contend with a fundamental challenge: observations often conflate the true underlying physical phenomena with systematic distortions introduced by measurement instruments. This entanglement limits model generalization, especially in heterogeneous or multi-instrument settings.

In this talk, I present a causally motivated foundation model that explicitly disentangles physical and instrumental factors using a dual-encoder architecture trained with structured contrastive learning or a generative flow-matching model. Leveraging naturally occurring observational triplets (i.e., where the same target is measured under varying conditions, and distinct targets are measured under shared conditions), the model learns separate latent representations for the underlying physical signal and instrument effects. Evaluated on simulated astronomical time series designed to resemble the complexity of variable stars observed by missions like NASA’s Transiting Exoplanet Survey Satellite (TESS), the method outperforms traditional single-latent-space foundation models on downstream prediction tasks, particularly in low-data regimes. These results demonstrate that our model supports key capabilities of foundation models, including few-shot generalization and efficient adaptation, and highlight the importance of encoding causal structure into representation learning for structured data.
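To make the triplet idea in the abstract concrete, here is a minimal, illustrative PyTorch sketch of a dual encoder trained with a triplet-style contrastive objective. Everything in it is an assumption for exposition (the convolutional encoders, the cosine-similarity loss, and names such as DualEncoder and triplet_step); the speaker's actual model may differ substantially, for instance by using the generative flow-matching variant mentioned in the abstract.

```python
# Illustrative sketch only -- all names, shapes, and loss choices here are
# assumptions, not details taken from the talk.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualEncoder(nn.Module):
    """Two separate encoders over the same light curve: one intended to
    capture the physical signal, the other the instrument systematics."""
    def __init__(self, latent_dim: int = 32):
        super().__init__()
        def make_encoder():
            return nn.Sequential(
                nn.Conv1d(1, 16, kernel_size=5, padding=2),
                nn.ReLU(),
                nn.AdaptiveAvgPool1d(1),  # pool over the time axis
                nn.Flatten(),
                nn.Linear(16, latent_dim),
            )
        self.physics_enc = make_encoder()
        self.instrument_enc = make_encoder()

    def forward(self, x):  # x: (batch, 1, seq_len)
        return self.physics_enc(x), self.instrument_enc(x)

def pair_loss(z_a, z_b, positive: bool, temperature: float = 0.1):
    """Pull latents together for positive pairs, push them apart otherwise."""
    sim = F.cosine_similarity(z_a, z_b) / temperature
    return -sim.mean() if positive else sim.mean()

def triplet_step(model, anchor, same_target, same_instrument):
    """One step of the structured contrastive objective on a triplet:
    - same_target: the anchor's target observed under different conditions,
      so the physics latents should agree and the instrument latents differ;
    - same_instrument: a different target observed under the anchor's
      conditions, so the instrument latents should agree and the physics
      latents differ."""
    p_a, i_a = model(anchor)
    p_t, i_t = model(same_target)
    p_s, i_s = model(same_instrument)
    return (pair_loss(p_a, p_t, positive=True)
            + pair_loss(i_a, i_t, positive=False)
            + pair_loss(i_a, i_s, positive=True)
            + pair_loss(p_a, p_s, positive=False))

# Example usage with random tensors standing in for light-curve triplets.
model = DualEncoder(latent_dim=32)
anchor, same_target, same_instrument = (torch.randn(8, 1, 200) for _ in range(3))
loss = triplet_step(model, anchor, same_target, same_instrument)
loss.backward()
```

The design point the sketch tries to capture is the asymmetry of the supervision: same-target pairs are positives only for the physics encoder, while same-instrument pairs are positives only for the instrument encoder, which is what pushes the two latent spaces to disentangle.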

Registration is free, but required. A pizza lunch will be provided.

Institute for Data-Intensive Engineering and Science (IDIES)


IDIES operates with support from: JHU DSAI, NSF, NASA, NIH, the Alfred P. Sloan Foundation, the Gordon and Betty Moore Foundation, the John Templeton Foundation, the W. M. Keck Foundation, Intel, Microsoft, Nokia, and NVIDIA.