Electronic Health Record
EHR Overview
STARR has access to electronic health record (EHR) data from the two hospitals, the Adult Hospital (aka Stanford Health Care, SHC) and Children’s Hospital (aka Lucile Packard Children’s Hospital, LPCH). The two hospitals use Epic for patient care and share Epic Clarity data with the School of Medicine (SoM). Epic Clarity also contains data from the clinics that are part of the University Healthcare Alliance and Lucile Packard Healthcare Alliance.
Clinicians at the hospitals interact with the product Epic Hyperspace. Epic Hyperspace is not a clinical module in itself, but rather the application client that is presented to users of most areas of Epic. When a nurse, doctor, therapist, or administrative staff member launches Epic, the front-end software that is presented to them is called Hyperspace. Epic Hyperspace is configured to display different menus, tasks, and options to users depending on their specific roles. Chronicles is the main database that runs much of the Epic software. Chronicles is a non-relational database, and much of the data stored in it is copied over to Clarity, a relational database used for advanced reporting that allows analysts to create more detailed and complex reports.
The two hospitals have different Epic ecosystems. For example, LPCH has an MSSQL Clarity and SHC has an Oracle Clarity. LPCH has an advanced implementation of Caboodle and SHC has an early implementation of Caboodle. Caboodle has a subset of Clarity data. Stanford researchers do not have access to Caboodle for research purposes at this time. SHC takes the operational Clarity data and converts it to an operational Enterprise Data Warehouse (EDW) with a number of different subject marts. The two Epic Clarity databases have the most usable research data.
Differences between Epic Hyperspace and Clarity
STARR gets its EHR data from the Epic Clarity databases at the two hospitals. Some data that are visible in Epic Hyperspace are not present in Clarity.
- Clinical text formatting is not preserved in operational Epic Clarity. This poses two problems: a) during chart review, the loss of formatting makes the chart hard to read; b) for Natural Language Processing, pre-trained models do not perform as well on notes that have lost their formatting.
- Not all data that is visible in Hyperspace exists in Epic Chronicles. Data from disparate sources can be viewed in the Hyperspace application, but the raw data exists in other systems.
- Researchers need to be aware of the limitations of Clarity if they are interested in specific cohorts. For example, researchers interested in pediatric research should note that the pediatric Clarity contains data for adults as well as the pediatric population. LPCH patients are often tracked well beyond 18 years, and mothers are also tracked in pediatric Clarity. Conversely, for a number of pediatric patients, parts of the data may be present in SHC Clarity only. For example, Emergency Department (ED) encounters for LPCH patients are captured in SHC Clarity, pediatric patients in Stanford-owned rural clinics submit data to SHC, and labs done at SHC, including radiology and pathology, are present only in SHC Clarity.
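The pediatric cohort caveat can be illustrated with a minimal Python sketch. It assumes encounter rows have already been exported from both Clarity copies and that patients have been linked across the two sources under a shared `patient_key`. That key, the other field names, and the age cutoff are hypothetical and chosen for illustration only; they are not actual Clarity column names.

```python
from datetime import date


def age_in_years(birth_date, on_date):
    """Completed years between birth_date and on_date."""
    years = on_date.year - birth_date.year
    if (on_date.month, on_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def pediatric_encounters(lpch_encounters, shc_encounters, max_age=17):
    """Combine encounters from both sources, keeping only pediatric ones.

    Each row is a dict with hypothetical keys: patient_key, birth_date,
    encounter_date. Filtering on age at encounter drops adults found in
    the pediatric source (mothers, patients followed past 18) while
    keeping pediatric encounters recorded only in the adult source.
    """
    combined = []
    for source, rows in (("LPCH", lpch_encounters), ("SHC", shc_encounters)):
        for row in rows:
            age = age_in_years(row["birth_date"], row["encounter_date"])
            if age <= max_age:
                combined.append({**row, "source": source, "age_at_encounter": age})
    return sorted(combined, key=lambda r: (r["patient_key"], r["encounter_date"]))
```

Filtering on age at encounter rather than on the source hospital is the design choice here: neither source alone defines the pediatric population.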
STARR Clarity
SoM gets a copy of both hospital Clarity databases. The two hospitals use different database technologies (LPCH has an MSSQL Clarity and SHC has an Oracle Clarity) and different mechanisms for sending data to SoM. The following figure shows the SHC Clarity transfer process.
The above figure shows our workflow, in which the data from SHC Epic makes its way to STARR. Every night, a primary Clarity database (aka operational Clarity) is generated from a shadow copy of Epic Chronicles. The primary Clarity is replicated to a disaster recovery (DR) node in near real time. We use an Oracle Active Data Guard license to make the DR node readable. We extract the tables to a compressed format called AVRO; see "Moving large databases" in our methods section for details. The AVRO files are uploaded to our cloud data center, where the STARR raw Clarity is generated as a BigQuery dataset. The pediatric Clarity (MSSQL database) workflow is similar in concept but different in technical implementation.
For the SHC Clarity, it takes less than 12 hours to extract and push all Clarity data (~3 million patients with clinical text, flowsheets and more) to STARR on a 32 vCPU server with 64 GB RAM. Overall, the STARR raw Clarity is no more than 12 hours behind the SHC operational Clarity.
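As a rough illustration of the 12-hour bound, the following Python sketch checks the lag between a refresh of the operational Clarity and the completion of the corresponding load into the raw copy. The function names and both timestamps are hypothetical; how these times are actually recorded in the pipeline is not described here.

```python
from datetime import datetime, timedelta

# Upper bound stated in the text: the raw copy trails by at most 12 hours.
MAX_LAG = timedelta(hours=12)


def replication_lag(source_refreshed_at, copy_loaded_at):
    """Time elapsed between the source refresh and the copy being loaded."""
    if copy_loaded_at < source_refreshed_at:
        raise ValueError("copy cannot be loaded before the source is refreshed")
    return copy_loaded_at - source_refreshed_at


def is_within_lag(source_refreshed_at, copy_loaded_at, max_lag=MAX_LAG):
    """True if the copy was loaded within max_lag of the source refresh."""
    return replication_lag(source_refreshed_at, copy_loaded_at) <= max_lag
```

For example, a source refreshed at 02:00 and a copy loaded at 13:30 the same day gives a lag of 11.5 hours, which is within the bound.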
Hospital operational vs STARR Clarity
In the figure, you will note that some tables are redacted from the STARR copy of Clarity. These are related to financial data, i.e., cost of service data. Access to the financial data requires additional hospital compliance approval. Since 2011, SHC has submitted average charges for 25 common outpatient procedures to the California Health and Human Services open data portal. The federal price transparency guidelines have resulted in a complete price list of procedures and services on the SHC website; it includes the price of all procedures and the negotiated rates for the top 10% of payor contracts. However, it is difficult to connect these costs with individual patient data. Specifically, going from the list price of procedures (100% data available), to the negotiated rate (10% data available), to the adjudicated rate (0% data available), it is hard to establish what the healthcare system is getting paid and what the patient is paying out of pocket. SHC also provides a cost calculator, which uses historical data to estimate the cost a patient will pay out of pocket for common exams, procedures, tests, and services.
STARR raw vs filtered Clarity
Once the raw Clarity is delivered to STARR, a number of derivative datasets are created. As a first step, the STARR raw Clarity data (which lacks financial data) is filtered to create a filtered Clarity, in which some patients and encounters are redacted. The redacted records do not represent significant patient numbers and do not impact research outcomes. For example,
- Certain high profile individuals are redacted. The number of such patients is small. They are redacted because they are identifiable due to the public and popular nature of their profile.
- Not all encounters are usable for research. As per adult hospital contractual obligations, certain encounters are excluded from research. These encounters do not have any medically relevant information and result from SHC acting as an insurance provider. These individuals have an MRN but are not provided any hospital services.
- Occupational health service data is also redacted from research use. These are data for our hospital staff who are not our patients but are required to undergo certain tests and procedures as part of doing their job, e.g. COVID vaccination. These are treated as employee health data. Under no circumstance are employee health data eligible for research. If a staff member is also a hospital patient, then their data from routine healthcare is eligible for research.
Finally, participant recruitment for clinical trials has further restrictions on accessible patient populations. Certain patients and their encounters are not eligible for recruitment engagement. These details are accessible to the Stanford Research Participation team via the Stanford Participant Engagement Platform (PEP). Please request a participant engagement consultation for further details.
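The raw-to-filtered step described in this section (redacted individuals, contractually excluded encounters, and occupational health data) can be sketched in Python as a simple exclusion filter. This is a minimal sketch under stated assumptions, not the actual STARR implementation: the field names `pat_id` and `enc_type`, the encounter type labels, and the way exclusion lists are supplied are all hypothetical.

```python
def filter_clarity(patients, encounters, redacted_patient_ids, excluded_encounter_types):
    """Return (patients, encounters) with redacted records removed.

    patients: list of dicts with a hypothetical "pat_id" key.
    encounters: list of dicts with hypothetical "pat_id" and "enc_type" keys.
    redacted_patient_ids: set of patient ids to remove entirely.
    excluded_encounter_types: set of encounter types to drop for every patient.
    """
    kept_patients = [p for p in patients if p["pat_id"] not in redacted_patient_ids]
    kept_ids = {p["pat_id"] for p in kept_patients}
    kept_encounters = [
        e
        for e in encounters
        if e["pat_id"] in kept_ids and e["enc_type"] not in excluded_encounter_types
    ]
    return kept_patients, kept_encounters
```

Note that excluding by encounter type, rather than by patient, keeps the routine care encounters of a staff member who is also a patient while dropping their occupational health encounters.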
Generating r-CDWs
The data is transformed to r-CDWs via a series of ETLs. We support a number of different r-CDWs that use different data models. Understanding the ETLs requires an understanding of the source data model (Epic Clarity), the clinical workflows, and the sink data model (r-CDWs like OMOP). Research IT has developed these ETLs in close collaboration with our hospital partners, Stanford labs and consortiums. Researchers can request access to the ETLs and other documentation, which can be provided on a case-by-case basis. Some of the r-CDWs are accessible via self-service and others are accessible via consultation request.
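To make the source-to-sink idea concrete, here is a minimal Python sketch of one mapping step that turns a patient row into a record shaped like the OMOP `person` table. It is an illustration only and not the Research IT ETL: the source field names (`pat_id`, `birth_date`, `sex`) and the source sex codes are hypothetical. The target column names and the gender concept ids (8507 for male, 8532 for female, 0 when no concept matches) follow the OMOP Common Data Model conventions.

```python
from datetime import date

# Hypothetical source codes mapped to OMOP standard gender concept ids.
GENDER_CONCEPTS = {"F": 8532, "M": 8507}


def to_omop_person(source_row, person_id):
    """Map one hypothetical source patient row to an OMOP person record."""
    birth = source_row["birth_date"]
    sex = source_row.get("sex")
    return {
        "person_id": person_id,
        # 0 is the OMOP convention for "no matching concept".
        "gender_concept_id": GENDER_CONCEPTS.get(sex, 0),
        "year_of_birth": birth.year,
        "month_of_birth": birth.month,
        "day_of_birth": birth.day,
        "person_source_value": source_row["pat_id"],
        "gender_source_value": sex,
    }
```

A real ETL also has to encode clinical workflow knowledge (for example, which source tables and codes are authoritative), which is why the source model, the workflows, and the sink model all need to be understood together.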
Note that confidential notes are not easily accessible for research. These may be mental health or similarly sensitive notes. Researchers must request these via their IRB procedure and access them via a consultation request.
Care Everywhere data is available in Clarity but is not part of the r-CDWs at this time. Care Everywhere data is available when the patient visits us from another Epic site. Where available, the data is brought into Chronicles as part of routine clinical care and eventually makes its way to specific Clarity tables. These tables are currently not processed by the r-CDW ETLs. We estimate that ~25% of our patients have Care Everywhere data and that the bulk of this data is from a single provider, Sutter Health.