With over 31.1 million coronavirus (COVID-19) cases in the world and 6.8 million in the U.S., there is a need for a faster, better way to understand the symptoms patients are facing and how to deal with long-term health complications.
A new collaboration led by Bloomberg Distinguished Professor Dr. Christopher Chute aims to provide one. His team is building a platform called the National COVID Cohort Collaborative (N3C) Data Enclave to make electronic health data of COVID-19 patients accessible to health-care providers and researchers around the country and in select parts of the world.
Such access may help providers diagnose and treat the long-term complications of the virus more quickly and accurately.
Chute kicked off the effort in partnership with the National Institutes of Health. In a Looking Forward @ Johns Hopkins webinar held on Sept. 17, Chute highlighted how the N3C is the result of selfless collaboration between organizations and professionals from across fields.
“We recognize that this is a national crisis. It’s enormously gratifying to see all of these communities, and the individuals within them, contributing hundreds of personal hours to the N3C,” he said.
What are the implications of the N3C Data Enclave? Say a COVID-19 patient comes into the hospital with an acute kidney complication. The N3C acts as a giant database that matches this patient's complications to those of other patients who faced similar issues. Based on those earlier records, providers can better anticipate what other complications the patient may develop and how best to treat them.
There are three phases to how the N3C works.
First, there is the so-called regulatory phase, in which the organizations that contribute data to the N3C clarify what kinds of health records they are sending in and what protections apply to that data.
Second, there is harmonization of the data. Different institutions store health data in different formats and use different tools. This phase standardizes the electronic records into a single format for the N3C database.
The last step is the integration phase, where the data are made accessible to other institutions and hospitals to use.
“The data processing isn’t rocket science. It’s just tedious work,” Chute said in an interview with The News-Letter.
According to Chute, the technological component of the N3C Data Enclave is relatively straightforward. The difficulty is assuaging concerns over patient privacy.
The N3C could be compromised. The database holds thousands of electronic health records containing sensitive patient information, and, as Chute acknowledged, no existing database of COVID-19 cases is this comprehensive.
“Nobody has done this before. What are you gonna do, put millions of patient records in one place, with a big sign on it, saying ‘Hack Me?’ I mean, that’s what we’re doing. And we know it,” he said.
However, Chute emphasized that great care has been taken to protect patients’ data.
The Data Enclave has four tiers, each with a different level of regulatory control: Synthetic Data (not yet finalized), Aggregate Data, HIPAA Safe Harbor and HIPAA Limited Dataset.
The team working on the N3C is extremely careful about the agreements it makes with the health-care systems that provide electronic health records.
Researchers seeking access to data with more identifiers, such as zip codes, must provide a scientific justification and obtain Institutional Review Board approval, among other requirements. Ultimately, electronic health records residing in the N3C Data Enclave are protected by military-grade security.
According to Chute, the N3C must strike a critical balance between two extremes: erasing all of the data to maximize patient privacy, and publishing all of it online for anyone to access, which would maximize research. He describes the two goals as being in "intrinsic tension."
“We have an obligation to society to analyze this data and share the results of it, but we also have an obligation to patients to protect their privacy,” he said. “It’s a very fine needle to thread.”
According to Chute, as the N3C works to balance patient privacy with access to critical health data, it has the potential to become one of the largest COVID-19 databases in the United States.