
Working with Public Datasets Workshop

Friday, April 30, 2021

Moderated by:
GERARD LEMSON, IDIES Director of Science
SAYEED CHOUDHURY, Associate Dean for Research Data Management
TINGLONG DAI, Associate Professor, Carey Business School 

In Partnership with: Sheridan Libraries


Welcome & Introductions:
1:00 – 1:25 pm
Tinglong Dai, Associate Professor, Carey Business School
  Welcome & Event Background
Gerard Lemson, IDIES Director of Science, SciServer Lead
  IDIES & SciServer, what we can offer
Mara Blake, Data Services Manager, Sheridan Libraries
  Overview of the Sheridan Data Services Offerings

Deep Dive Research Presentations (Q&A to follow each presentation):
1:25 – 2:45 pm
Andrew Ching, Professor, Carey Business School
  Consumption Responses to an Unpopular Policy: Evidence from a Short-lived Soda Tax
Curt Cronister, Senior Data Manager, Baltimore Education Research Consortium (BERC)
  What (When) (Where) is a School? Using Public Data to Contextualize Schools in Educational Research

2:45 – 3:00 pm BREAK

3:00 – 4:00 pm Panel Discussion
Sayeed Choudhury, Associate Dean for Research Data Management; Hodson Director of the Digital Research and Curation Center
Dr. Jim Kyung-Soo Liew, Associate Professor of Finance, Carey Business School
Jose Arrieta, former Chief Information Officer of the United States Department of Health and Human Services; Adjunct Professor, Carey Business School
Charles Meneveau, the Louis M. Sardella Professor of Mechanical Engineering, Professor, Department of Physics and Astronomy (joint appointment), Professor, Department of Environmental Health and Engineering (joint appointment), Associate Director, Institute for Data Intensive Engineering and Science
Marc Stein, Associate Professor, School of Education; Managing Director of the Baltimore Education Research Consortium (BERC) 

Have you ever wanted to use “public” datasets in your research but weren’t sure how or where to start?

Is the work involved in handling these datasets overwhelming your local computing resources? Does it require technical expertise that you do not have available? Do you want to collaborate with colleagues but have trouble sharing access to the data and processing pipelines efficiently? Do you want to connect your data with other, similar datasets, only to find that doing so multiplies these problems?

IDIES has created an afternoon workshop that will walk you through the possibilities of working with public datasets and discuss how IDIES and the Sheridan Libraries can assist in those efforts.

A dataset is considered public if it is made available to the general public, most often online; access does not have to be free and may require registration. Importantly, and in contrast to most datasets IDIES has published so far, we will focus on datasets that were not created for an explicit scientific project but were instead collected for governmental, public service, or commercial reasons. Because these datasets were not created with your specific research in mind, considerable work is generally required to put them in a form suitable for your analysis. This is even more true when different public datasets must be combined.

During this workshop, real-world examples will show how public datasets have been applied in past and ongoing research. A panel will explore how IDIES could help researchers obtain and analyze such datasets. For example, could IDIES collect a variety of the most interesting public datasets to be accessed and analyzed in a single place, using advanced storage capabilities, providing simple interfaces for access, and supporting analysis with the compute resources IDIES offers through SciServer?

Visual example of GIS data from the California Public School system.
Map showing how where a patient lives affects kidney transplant wait times. Wait time and transplant data are from 2019 and provided by the United Network for Organ Sharing (UNOS).

Possible types of datasets:

  • Census
  • Weather
  • Social Media
  • Health
  • Transportation and mobility
  • Finance
  • Education
  • Audio, images, and videos
  • Disease transmission

Possible types of research that can utilize these datasets:

  • Climate Change
  • Diversity, equity, and inclusiveness
  • Sustainability
  • Global Supply Chains
  • How AI will shape the future
  • Social Networks
  • Access to Education
  • Health policy

Who can benefit from learning more about using these datasets?

Anyone doing research at JHU.