CTSA Profile: Putting Big Data Ideas Into Action

Erich Huang, MD, PhD, Assistant Professor in Biostatistics and Bioinformatics

March 8, 2016

In 2014, Erich Huang, MD, PhD, was the first faculty recruit to the Division of Translational Biomedical Informatics within the Department of Biostatistics and Bioinformatics. Duke was familiar territory – he had completed the MD/PhD program in 2003 and was an assistant professor in Surgery for several years before heading to Seattle to become the director for cancer research at Sage Bionetworks. “Being at Sage was tremendous in exposing me to open science, technology, and other ideas that support data-driven predictive modelling,” he said. “But what attracted me to return to Duke was the opportunity to actually implement some of these ideas in an academic health system.”

Here, in his own words, are some of Huang’s reflections on what has shaped his career in bioinformatics.


How did you initially get involved in biostatistics and bioinformatics?

I did my PhD in the early days of genomics, and I saw that we were generating what seemed at the time to be vast amounts of data. I knew that it would be good to have a facile grasp of how to manage such data and do predictive modelling. I learned scientific programming in the Statistical Science Department with former chair Mike West, which exposed me to a whole new world of how to use data. I have come to see that the techniques we used to analyze genomics and create predictive models can also be used to analyze the ever-increasing amounts of data created by electronic health records.


How has your training as a surgeon affected your work as a big data scientist?

Surgical training is definitely not about predictive modelling – it is very hands on. But it helped me learn what the critical clinical problems are and what information we need to solve those problems. Simply solving a problem theoretically isn’t enough. The delivery of healthcare comes through people, so you have to solve a problem in a way that makes it easy for people to understand and use the information. Big data won’t replace doctors, but it can provide tools to enhance physicians’ awareness of what is happening, and thereby improve care.


How do you explain to an 8th grader what you do at Duke?

I usually tell people I focus on big data in health care at Duke. We have a long history at Duke of using electronic health records (EHRs), but my generation is the first to be working in a situation where electronic health records are ubiquitous. We need to learn how to assimilate all of this data to make it useful. This could mean predicting a clinical outcome, or classifying a patient as having a particular sort of phenotype of a disease so we can guide care, or just thinking about how we integrate different types of data, such as integrating genomic data into electronic health records.


What are a couple of projects you are working on?

One is the Duke Data Service, funded by the School of Medicine, the NIH Big Data to Knowledge Consortium, and the Burroughs Wellcome Fund. We are creating an infrastructure that makes it easy for investigators to gather and store the data they collect and generate in a transparent manner so that others can reproduce their scientific findings. For example, if a scientist creates a predictive model of medication adherence based on electronic health records and geospatial data, someone who wants to scale it up to use in a health system needs to understand how the model works. That is hard to do in a published paper in JAMA. People need to be able to access the data.

Another project is CALYPSO – the Clinical and Analytic Learning Platform for Surgical Outcomes. It is a project where we are using machine learning and electronic health records to predict whether a surgical patient is going to have one of a variety of complications. The aim is to provide predictions in real time so that rounding teams can have the information at hand during their rounds. We think CALYPSO can provide a system that synthesizes a huge amount of data that might not be readily apparent to busy healthcare workers. If we can provide insights in an easy-to-understand manner, it can enhance the healthcare teams’ situational awareness.

Others include the Baseline Study in conjunction with Verily (formerly Google Life Sciences), the Chronic Kidney Disease project (previously reported on in the DTMI newsletter), the MURDOCK Study, and the Surgical Critical Care Initiative.


What is the biggest challenge you face in dealing with big data?

A huge challenge these days is interoperability – finding easy ways for information to be securely exchanged between providers and platforms. We gather so much data, but often we can’t share it. It is frustrating for a patient when we have to reorder tests because the patient’s information from a previous health system is in a format that our health system can’t read. The same is true for research. We need to find tools that allow us to move and share data. It is absolutely essential for improving healthcare nationally.


What are your passions in life in addition to big data?

My family is my biggest passion. I have a wonderful wife and three children. Life-long learning is also a passion. I’d be bored if I wasn’t constantly learning. And the arts are very important to me. I grew up expecting to be a violin soloist, and although I’m not playing anymore, music is still very important to me.


What book are you currently reading?

I’m an avid reader, but I’m only just now getting around to tackling War and Peace. I watched the series on PBS and it inspired me to read the book. On my Kindle.


What do you do that is just for you?

Last year I took my daughter to the Mid-South Fencers’ Club in Durham for an introductory fencing course. I decided it looked like so much fun that I wanted to do it. So now I practice sabre fencing once a week and have just participated in my first tournament. I enjoy it because when I am fencing, I am focusing only on my opponent. It removes me completely from my work life.