In-house data model
Introduction
Stanford has been supplying clinical data for research and other secondary uses to the Stanford Medicine community since 2008, long before standard data models became popular. Stanford’s original research repository, STRIDE, was designed around the simple forms of clinical data available at the time: clinical notes and reports, lab orders and results, medication orders and administration records, and patient identifiers and demographics.
The in-house data model is derived from Epic’s reporting database, Clarity, using the process diagrammed below. First, the dataset is filtered to remove person and encounter data that may not be used for secondary purposes because of legal or contractual obligations. Then, selected clinical data elements are extracted from each of the two source systems, transformed, and merged into a shared data model. The two systems share a single service for issuing and managing medical record numbers, so data for patients seen at both hospitals is aggregated into a single unified patient record. This data is then made available to researchers through the STARR Tools applications.
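The filter, transform, and merge steps described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the record fields (mrn, kind, value, restricted), the function names, and the source labels system_a and system_b are hypothetical and do not reflect the actual Clarity schema or the production pipeline. It only shows how a shared medical record number lets records from two systems collapse into one patient record.

```python
from collections import defaultdict


def filter_permitted(records):
    """Step 1: drop records flagged as unusable for secondary purposes.

    The 'restricted' flag is a hypothetical stand-in for the real
    legal/contractual filtering rules.
    """
    return [r for r in records if not r.get("restricted", False)]


def transform(records, source):
    """Step 2: keep selected data elements and tag each with its source system."""
    return [
        {"mrn": r["mrn"], "kind": r["kind"], "value": r["value"], "source": source}
        for r in records
    ]


def merge_by_mrn(*batches):
    """Step 3: group records from all sources under the shared MRN."""
    patients = defaultdict(list)
    for batch in batches:
        for rec in batch:
            patients[rec["mrn"]].append(rec)
    return dict(patients)


def build_unified_records(sources):
    """Run all three steps over a mapping of source name -> raw records."""
    batches = [
        transform(filter_permitted(recs), name) for name, recs in sources.items()
    ]
    return merge_by_mrn(*batches)


# Hypothetical sample data from two source systems.
system_a = [
    {"mrn": "1001", "kind": "lab_result", "value": "HbA1c 6.1%"},
    {"mrn": "1002", "kind": "note", "value": "follow-up", "restricted": True},
]
system_b = [
    {"mrn": "1001", "kind": "medication_order", "value": "metformin 500 mg"},
]

unified = build_unified_records({"system_a": system_a, "system_b": system_b})
```

In this sketch, patient 1001 ends up with one record list containing entries from both systems, while the restricted record for patient 1002 is removed before any merging happens. Filtering first mirrors the order given in the text, so restricted data never reaches the shared model.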
The data model has evolved over the years in response to research needs. Today, the in-house data model contains minimally modified clinical data: it is filtered for compliance and de-identified as needed, but otherwise represents the information as it was captured in the original clinical system.