Jonathan Weiner, Hadi Kharrazi, Elham Hatef (Center for Population Health Information Technology, Health Policy and Management, Bloomberg School of Public Health)
Mark Dredze (Center for Language and Speech Processing & Malone Center for Engineering in Healthcare, Whiting School of Engineering)
Christopher Chute (School of Medicine & Chief Research Information Officer, Johns Hopkins Health System)
Almost all healthcare interactions are now documented in electronic health records (EHRs), and the majority of EHR content is captured as free text. These unstructured data are currently the most complete source of digital information on social determinants of health (SDH), factors that are critical for targeting medical and public health interventions. This pilot project will analyze EHR data from cohorts of patients at Atrius Health HMO in Massachusetts and the Johns Hopkins (JH) Health System. The project will focus on three research questions: (1) Can SDH information in text be accurately categorized? (2) What is the prevalence of SDH risk factors expressed in these records? (3) Can natural language processing (NLP) methods effectively derive SDH information from large EHR free-text databases?