A Big-Data Engine for Large-Scale Splicing Screens

Ben Langmead (Department of Computer Science, Whiting School of Engineering)

Seth Blackshaw (Department of Neuroscience, School of Medicine)

Jonathan Ling (Neuroscience, School of Medicine)

RNA sequencing provides an inexpensive, high-resolution window on gene expression patterns. With the accumulation of sequencing data in public archives, researchers now have vast datasets in which to search for clinically relevant patterns. But the computational resources and skills needed to query these data are not widely available. We will create new software systems that enable large-scale splicing screens against hundreds of thousands of archived samples. The systems will (a) answer queries about splicing associations, e.g., between transcription factors and splicing in disease, and (b) perform bulk screens to find associations between metadata variables (e.g., knock-down or disease states) and splicing patterns. We will use these tools to find associations relevant to neurodegenerative disease and cancer.

