Mining Electronic Medical Records

At the intersection of nursing informatics and comparative effectiveness research, Kenrick Cato, PhD, associate research scientist, is panning for gold.

For Cato, the mother lode is a treasure trove of data hidden in electronic medical records (EMRs) that may reveal better ways to fight infections in hospitals. Cato recently received funding from the National Institute of Nursing Research to sift through a database of more than 319,000 patient discharge records from four New York City hospitals and search for new ways of identifying patients with infections, knowledge that can inform prevention and treatment efforts.

Nursing informatics is the science and practice that integrates nursing, its information and knowledge, with management of information and communication technologies to promote the health of people, families, and communities. Like much of the work in the field, Cato’s project focuses on what’s known as Big Data: sets of records so vast that they’re difficult to analyze without customized software or tools. Adding another layer of complexity, EMRs can contain a crazy quilt of different types of information, including codes for billing and diagnosis, text notes from physicians and nurses, as well as images and lab reports.

To create order out of these chaotic data sets, Cato is doing a type of detective work known as phenotyping, which involves looking for the digital footprints that healthcare-associated infections (HAIs) can leave behind in the medical records. Cato is looking specifically at urinary tract infections (UTIs) and surgical site infections, two of the most common preventable complications in hospitals. EMRs often mask infections because clinicians don’t note infections in patients’ charts in a structured format that’s easy for computers to read, Cato says. With UTIs, for example, evidence of the infection might be found using a combination of billing codes, procedure codes, and medications noted in the chart. There may also be certain phrases written in physician or nursing notes. “Right now, if you want to identify which patients had an infection, the gold standard is a clinical review of the chart,” Cato says. But the chart is organized to facilitate medical billing, not infection surveillance. “There are no automated standards for getting this information, and that’s what I’m setting out to develop.”
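To make the idea of combining signals concrete, here is a minimal sketch in Python of how billing codes, medications, and note phrases might be checked together for a possible UTI. It is not the study's method: the record layout, the code and drug lists, the phrase pattern, and the two-signal threshold are all hypothetical, chosen only for illustration.

```python
import re
from dataclasses import dataclass, field

# Illustrative placeholders only; not the criteria used in the study.
UTI_BILLING_CODES = {"N39.0"}  # ICD-10-CM: urinary tract infection, site not specified
UTI_ANTIBIOTICS = {"nitrofurantoin", "ciprofloxacin"}
# Word boundaries keep "uti" from matching inside words such as "utilization".
UTI_PHRASE = re.compile(r"\b(uti|urinary tract infection)\b", re.IGNORECASE)


@dataclass
class Record:
    """Hypothetical, heavily simplified view of one discharge record."""
    billing_codes: set = field(default_factory=set)
    medications: set = field(default_factory=set)
    notes: str = ""


def uti_signals(record):
    """Return the names of the evidence types found in this record."""
    signals = []
    if record.billing_codes & UTI_BILLING_CODES:
        signals.append("billing_code")
    if {m.lower() for m in record.medications} & UTI_ANTIBIOTICS:
        signals.append("medication")
    if UTI_PHRASE.search(record.notes):
        signals.append("note_phrase")
    return signals


def flag_possible_uti(record, min_signals=2):
    """Flag a record when at least `min_signals` evidence types agree."""
    return len(uti_signals(record)) >= min_signals
```

Requiring more than one kind of evidence is a design choice in this sketch, not a finding: a single antibiotic order or a passing mention in a note is weak evidence on its own, and any real rule would have to be validated against clinical chart review.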

Cato’s work will unfold in two stages. First comes what’s known as a curated analysis, where Cato tells the computer what specific billing codes, procedures, medications, or phrases to search for in the records. Then the machines take over. The second stage involves an automated analysis, where a computer looks at patterns in the EMRs and determines what variables might be associated with an infection. With a UTI, for example, a pattern of certain antibiotics prescribed in a certain dose on a certain schedule might turn out to be a digital footprint of the infection. The automated analysis deploys advanced mathematical techniques to establish the relationship between different pieces of information in the EMR. “The beauty of this is the algorithms can find relationships that might be overlooked when you curate specific searches yourself,” Cato says. “This is the same kind of mathematics Amazon and Google use; it’s very sophisticated.”
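The article does not say which algorithms the automated stage uses, so the following Python sketch is only a simple stand-in for the general idea: given records already labeled by chart review, rank every feature by how much more often it appears in infection cases than in the rest. The feature strings, the function name, and the smoothing constant are assumptions made for illustration.

```python
from collections import Counter


def rank_features(records, labels, smoothing=0.5):
    """Rank features by a smoothed ratio of how often they occur in
    infection cases versus non-infection cases.

    records: list of sets of feature strings (hypothetical encoding)
    labels:  list of bools, True where chart review confirmed an infection
    """
    pos, neg = Counter(), Counter()
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    for features, infected in zip(records, labels):
        (pos if infected else neg).update(set(features))

    scores = {}
    for feature in set(pos) | set(neg):
        # Smoothing avoids division by zero for features seen in one group only.
        rate_pos = (pos[feature] + smoothing) / (n_pos + 2 * smoothing)
        rate_neg = (neg[feature] + smoothing) / (n_neg + 2 * smoothing)
        scores[feature] = rate_pos / rate_neg
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data, invented for this example.
records = [
    {"abx:nitrofurantoin", "lab:urine_culture"},
    {"abx:nitrofurantoin"},
    {"lab:cbc"},
    {"lab:cbc", "lab:urine_culture"},
]
labels = [True, True, False, False]
ranked = rank_features(records, labels)
```

Unlike the curated stage, nothing here tells the computer which codes or drugs matter; the ranking comes from the data. A real analysis would use far richer models and many more records, but the contrast between the two stages is the same.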

Part of his work will focus on nursing notes, an often overlooked part of the patient chart. While both physicians and nurses record notes, the information from nurses is typically kept separately from the main patient chart. Sometimes other nurses will read the nursing notes, but they often don’t because nurses do a verbal handoff, Cato says. Residents or physicians may refer to nursing notes if there’s a complication, but this also isn’t routine. Cato worked on another research project at Columbia Nursing that found a direct correlation between the frequency of nursing notes and the condition of the patient, with more notes occurring as the patient gets sicker. The automated analysis he’s doing now may uncover new information about HAIs in the nursing notes.

Elaine Larson, PhD, RN, FAAN, associate dean for research, is principal investigator of a newly funded Research Supplement to Promote Diversity in Health-Related Research awarded to her R01, “Health Information Technology to Reduce Healthcare-Associated Infections: HIT-HAI,” which will support Cato’s work. Under this Diversity Research Supplement and with mentorship by his PhD research adviser, Suzanne Bakken, PhD, RN, FAAN, FACMI, Cato will take the lead on the scholarly work needed to advance the data science in HIT-HAI in two areas: data governance and phenotype algorithms. This $220,000 supplement is funded by the National Institute of Nursing Research of the National Institutes of Health for 18 months.