Data Science Coast To Coast
IDIES is a member of the Seminar Organizers panel for the Data Science Coast to Coast seminar series.
The DS C2C seminar series, hosted jointly by seven academic data science institutes, provides a unique opportunity to foster a broad-reaching data science community.
In the first half of 2021, we hosted five seminars, each featuring one faculty member and one postdoctoral fellow from two universities. Each speaker provided a 20-minute talk about ongoing projects and motivating issues, followed by 20 minutes of discussion with the audience. These seminars will serve as the launching point for follow-on research discussion meetings which will hopefully lead to fruitful collaborative research.
Rosemary Gillespie (Professor & Schlinger Chair in Systematic Entomology, University of California, Berkeley), Data Science to Measure the Natural World
Data Integration Across Space and Time to Infer Biodiversity Dynamics
The world’s ecosystems are under serious threat due to ongoing stressors of the Anthropocene, notably habitat destruction, climate change, loss of biodiversity, disease, and the spread of invasive species. Biodiversity in particular is suffering catastrophic decline, and tracking and understanding the factors affecting change is a major challenge that we are currently not meeting. Unless we develop new approaches, it will take centuries to document biodiversity and identify attributes that render ecological communities robust and resilient to change, and by then it will likely be too late. Here, we examine insights we can gain into biodiversity dynamics by looking at ways that we can first assess spatial patterns of diversity, abundance, and foodwebs, and then determine the response of the organisms within these communities to the changing environments that surround them. We have piloted an environmental DNA approach to generate estimates of abundance and interactions of macroorganisms in terrestrial systems across different spatial scales. By applying various theoretical and modeling approaches to the vast amounts of genetic data, we can encapsulate the “status” of a biological community in terms of its integrity and potential resilience to change. Moreover, by analyzing these data through slices in time (months, years, decades, or longer), we can assess how the community might accommodate, adapt, or collapse in response to change. These changes include habitat transformation, climate modification, fire, or disease. The critical data challenge is to integrate data that characterize the biological community and genomic data that reveal the response of any given taxon to that change with past, present, and modeled climate change data. We highlight the role of historic collections from museums, and the physical record they provide of past environments.
Shelly Trigg (Data Science Postdoctoral Fellow, University of Washington), Diversity in Animal Response to Environmental Change
Diversity in Animal Response to Environmental Change
How will ecosystems tolerate the climate and ocean change occurring now and predicted for the future? To begin addressing this question, we can subject different animals to different anthropogenic pressures and evaluate their responses. We can more sensitively and comprehensively assess responses by performing molecular surveys using omics technologies (e.g. genomics, proteomics, metabolomics), which allow us to more clearly see the cellular processes that underlie environmental tolerance and intolerance. These data can also help us compare across species, since all species have these general molecules (DNA, proteins, metabolites) in common. I’m going to present data from different studies on marine invertebrates exposed to different environmental conditions, and describe how I used multiple data science approaches to distill large omics datasets into dominant biological pathways associated with environmental tolerance and intolerance. After summarizing responses across species and conditions, I will propose future directions and data science applications for the wealth of environmental omics data being generated.
16 June 2021
Miguel Jimenez-Urias (Postdoctoral Fellow, Earth and Planetary Sciences, Johns Hopkins University), Oceanic Stirring and Mixing of Passive Scalars: A Novel Closure
Scale-dependent Shear Dispersion: Stirring and Mixing of Passive Tracers in the Ocean
Tracers that help regulate biogeochemical cycles in the ocean and atmosphere have complex spatial distributions. These arise from the combined effect of stirring by the multi-scale shearing motions that are ubiquitous and persistent in the ocean and small-scale diffusive mixing, which together result in spatially inhomogeneous, enhanced mixing rates. Computer models need to parameterize the effect of shear dispersion because of restrictions on computer power and numerical stability when running climate-scale ocean simulations. Such parameterizations, however, fail to represent scale dependency, and the scale independence they imply is not strictly applicable to the ocean. In this talk, we present new results describing the scale dependency of shear dispersion by idealized oceanic flows, which can lead to a better understanding and representation of tracer distribution in the oceans.
Urban Informatics with Dr. Arya Farahi and Dr. Kate Starbird
QUANTIFYING AND MITIGATING SOURCES OF BIAS IN A DECISION-SUPPORT SYSTEM
Applications of AI decision-support systems are increasingly shaping the fabric of our society. These systems can exhibit and exacerbate undesired biases that might hurt under-represented populations. Therefore, it is critical to evaluate these systems not only through the lens of predictive power and error rate but also through the lens of trustworthiness and fairness. In this talk, I will focus on two specific sources of bias in a decision-support system and propose mitigation strategies. In the first part, I will discuss biases that originate from historical decisions and are reflected in data. I propose a metric for quantifying disparity in data and illustrate how we can alleviate these historical biases by applying a simple modification to a decision-making system. In the second part, I will shed light on biases that originate from predictive models. Predictive models are a central part of any decision-making system. End users act based on the information provided by these models. Biased or untrustworthy information misleads the end user or leads the public to mistrust the system. I will present our mitigation method, KiTE. KiTE is a hypothesis-testing framework with provable guarantees that enables practitioners to (i) test whether a model provides trustworthy information with respect to each sub-group of a population and (ii) estimate and correct for prediction bias at the individual and group levels.
REVEALING THE “BIG LIE”: COLLABORATIVE DATA SCIENCE FOR RAPID RESPONSE TO ONLINE DISINFORMATION
In this talk, I’ll present preliminary research results from ongoing efforts to understand the spread of disinformation about the 2020 Election. First, I’ll describe the mission, structure, and everyday work practices of the Election Integrity Partnership — a multi-stakeholder collaboration that addressed mis- and disinformation about the 2020 U.S. election in (near) real-time through rapid response data science. Next, I’ll take you through some of our analyses to show how the “Big Lie” — the sustained effort to sow doubt in the results of the 2020 election — took shape on social media platforms throughout the latter half of 2020. I’ll highlight the participatory nature of this disinformation campaign and reveal some of the “super spreader” accounts that helped produce and sustain it. Finally, I’ll note how some of the social media platforms have evolved their strategies to address this kind of disinformation and wrap up by talking about what might come next, both in terms of platform policies and future collaborations for rapid response to disinformation.
Robotics and Human-Computer Interaction with Dr. Lydia Kavraki and Dr. Angela Radulescu
ROBOTICS IN THE ERA OF DATA SCIENCE
Advances in mechanisms, control theory, and algorithms are delivering robots that explore the deep seas and distant planets, robots that work tirelessly in fulfillment centers, and robots that increasingly interact with people. This talk will touch upon recent developments in robotics with emphasis on our own work in motion planning. It will then discuss the tremendous impact that the integration of research in robotics, AI, and data science will have on our lives and on society as a whole.
TOWARDS NATURALISTIC REPRESENTATION LEARNING IN HEALTH AND DISEASE
Humans learn more from their experiences than just how to behave in different situations. They also learn to organize experiences into internal representations that facilitate flexible behavior, in domains ranging from simple decision-making to goal-directed action in naturalistic, richly structured environments. In the first part of the talk, I will show that such representation learning relies on selective attention to constrain the dimensionality of environments that humans learn from; and that attention is in turn guided by inference over what features of the environment are relevant for the task at hand. In the second part of the talk, I will present ongoing work leveraging virtual reality (VR) in combination with eye-tracking to study representation learning in naturalistic settings. I will conclude with a discussion of how predictive modeling of behavior in VR may yield insights into cognitive factors that affect mental health.
Dr. Jeanne Holm
USING DATA TO IMPROVE EQUITY
In the midst of a pandemic and economic stress, governments have to make real-time decisions on maximizing safety and minimizing economic and personal impact. How can we use data, behavioral science, and our shared need for safety to create a more connected ecosystem where government, residents, and businesses share information in more intertwined ways? Getting access to that information, equitably, is a challenge throughout the world. In the City of Los Angeles, we use data-driven decisions to pave the way to connect all 4,000,000 residents with the information and services they need to thrive. Learn how Los Angeles is using data science and leading-edge technology to connect all of our communities, residents, and businesses.
Dr. Alex Szalay
FROM SKY SURVEYS TO CANCER: SPATIAL DATA EVERYWHERE
The talk describes a 25-year journey leading from the Sloan Digital Sky Survey to a wide range of projects in data science. There are many common threads: the need for extreme interactivity, the need for flexible data aggregation, and the commonality of spatial data. The size of data sets has grown almost a millionfold, but user expectations for almost instant results have not changed. The talk will describe the gradual evolution of the SciServer, and how new metaphors for interacting with hundreds of terabytes of turbulence simulations emerged. We will discuss how machine learning and AI tools are transforming science, from simulations to how large experiments are designed and executed. We will also emphasize that many of these new developments still rely on having unique, high-value data sets at our fingertips, and that the long-term survival of these data sets is entering a critical, endangered phase.
Dr. Talitha Washington
‘WHY WE CAN’T WAIT’: USING SOCIAL JUSTICE TO TRANSFORM DATA SCIENCE
Located in the “Cradle of the Civil Rights Movement”, the Atlanta University Center (AUC) Data Science Initiative has a keen focus on advancing social justice through data science. The AUC is a consortium of four historically black colleges and universities (HBCUs) in Atlanta, Georgia: Clark Atlanta University, Morehouse College, Morehouse School of Medicine, and Spelman College. The inaugural director of the AUC Data Science Initiative, Dr. Talitha Washington, hopes to move data science towards ethics and fairness for Black America because “whatever affects one directly, affects all indirectly.”