
STARR-OMOP is Stanford electronic health record (EHR) data from its two hospitals, represented in the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM). Use OMOP for observational science, population health science, collaborative network studies, and reproducible data science.




There are a number of popular CDMs to choose from including i2b2, Pediatric Learning Healthcare System PEDSNet, Patient-Centered Clinical Research Network PCORNet, Health Care Systems Research Network, and the US Food and Drug Administration Sentinel. Choosing a particular CDM over another is a matter of meeting specific research objectives. It is not uncommon for an academic medical center to support more than one.

Our second-generation research clinical data warehouse (r-CDW) needs to support a large number of use cases. For this r-CDW, we chose the OMOP CDM. The OMOP CDM has demonstrated applicability to many different use cases, including a) claims and EHR data (link), b) EHR-based longitudinal registries (link), and c) hospital transactional databases (link). The OMOP CDM demonstrates strong results in comparative effectiveness research (link) with minimal information loss during data transformation (link), speeds up implementation of clinical phenotypes across networks (link), and promotes research reproducibility (link). There is demonstrated interoperability between different CDMs (link), so choosing OMOP does not exclude support for other CDMs in the future. Furthermore, the OHDSI community places a strong focus on data quality and provides broad support for the analytical toolkits (also known as the methods library), which together strive to deliver consistency in cohort definition, analysis design, and reporting of results. Perhaps the most appealing aspect is that OHDSI is an open-source public-private partnership that welcomes community participation. There is a robust community of end users, developers, and thought leaders who are actively engaged in various shared repositories, discussion forums, training sessions, and workshops. The collection of learning resources is vast (link) and includes FAQs, code snippets, and video lectures. Finally, OMOP has been adopted at other CTSA sites, for example Albert Einstein College of Medicine – Montefiore Health, Columbia University, and Icahn School of Medicine at Mt. Sinai.

OMOP artifacts

The OMOP database in the STARR portfolio is derived from the two Epic Clarity EHRs.


OMOP pipeline

Research IT receives raw Clarity from each of the two hospitals and builds a filtered Clarity (learn more) for each. These filtered Clarity databases then become the source for the OMOP ETLs. Patients at the two hospitals are linked via their MRN. We first build an OMOP database with all PHI present. Then, we build two de-identified databases that are accessible to Stanford researchers without an IRB: STARR-OMOP-deid and STARR-OMOP-deid-lite. The former contains de-identified clinical text (NOTES) and text-mining-derived annotations (NOTE_NLP). The OHDSI cohort analysis tool, ATLAS, runs on STARR-OMOP-deid-lite. We have recently launched ACE, a new cohort tool; stay tuned for more information about ACE.

STARR-OMOP-deid is refreshed monthly and STARR-OMOP-deid-lite is refreshed weekly. The patient identifiers stay stable between refreshes. Both STARR-OMOP-deid and STARR-OMOP-deid-lite are accessible as self-service. For the OMOP PHI database or any other variation, such as a Limited Data Set, linkage with Clarity, or sharing with non-Stanford researchers, please request a consultation service.
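The linkage and stable-identifier ideas described above can be illustrated with a small sketch. This is not the STARR pipeline: the page does not describe how identifiers are generated, so the keyed-hash approach, the `SECRET_KEY` value, the record layout, and the function names `stable_person_key` and `link_patients` are all assumptions made for illustration. It shows one common way to merge records that share an MRN and to derive a pseudonymous key that comes out the same on every refresh.

```python
import hashlib
import hmac

# Hypothetical secret for illustration only; a real pipeline would keep
# any such key in a secure key store, never in source code.
SECRET_KEY = b"example-only-secret"


def stable_person_key(mrn: str) -> str:
    """Derive a deterministic pseudonymous key from an MRN.

    The same MRN and secret always give the same key, which is what
    keeps the identifier stable across refreshes in this sketch.
    """
    digest = hmac.new(SECRET_KEY, mrn.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def link_patients(hospital_a, hospital_b):
    """Merge per-hospital records that share an MRN into one entry per patient."""
    linked = {}
    for source, records in (("A", hospital_a), ("B", hospital_b)):
        for rec in records:
            key = stable_person_key(rec["mrn"])
            entry = linked.setdefault(key, {"sources": set(), "encounters": []})
            entry["sources"].add(source)
            entry["encounters"].extend(rec["encounters"])
    return linked


# Toy records (made up): MRN "100" appears at both hospitals.
hospital_a = [
    {"mrn": "100", "encounters": ["a1", "a2"]},
    {"mrn": "200", "encounters": ["a3"]},
]
hospital_b = [{"mrn": "100", "encounters": ["b1"]}]

linked = link_patients(hospital_a, hospital_b)
```

In this toy run, the patient with MRN "100" ends up as a single linked entry that carries encounters from both hospitals, and rerunning the code produces the same keys.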

Here are some useful OMOP resources by Research IT:

  • STARR-OMOP data dictionary: In addition to tables required for OMOP CDM 5.3.1, the dataset contains some extra columns which are not strictly part of the CDM definition, but have been added for increased source/patient traceability. The STARR OMOP identified dataset is created using Clarity tables which only include the patient and encounter data that is permissible for research. This deid dataset does not contain psychiatric notes, or other confidential notes. This g-sheet is publicly accessible.
  • STARR-OMOP Technical Specifications document: This document provides details regarding the underlying STARR-OMOP data, transformations, quality metrics and techniques such as de-identification. This g-doc is accessible with SUNetID.
  • Four tutorials on our Stanford Starr YouTube channel that provide hands-on help with querying and using our OMOP data.
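As a rough idea of what querying an OMOP-shaped dataset looks like, here is a minimal sketch that counts distinct patients with a given condition. It runs against a toy in-memory SQLite database, not the STARR-OMOP environment. The table and column names follow the standard OMOP CDM `person` and `condition_occurrence` tables, but only a few columns are included, the rows are made up, and the helper `cohort_size` is hypothetical.

```python
import sqlite3

# Toy in-memory database with a small subset of OMOP CDM columns.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        gender_concept_id INTEGER,
        year_of_birth INTEGER
    );
    CREATE TABLE condition_occurrence (
        condition_occurrence_id INTEGER PRIMARY KEY,
        person_id INTEGER,
        condition_concept_id INTEGER,
        condition_start_date TEXT
    );
    -- Made-up rows; the concept IDs are illustrative values only.
    INSERT INTO person VALUES (1, 8532, 1950), (2, 8507, 1980), (3, 8532, 1975);
    INSERT INTO condition_occurrence VALUES
        (10, 1, 201826, '2019-03-01'),
        (11, 1, 201826, '2020-06-15'),
        (12, 2, 201826, '2021-01-10'),
        (13, 3, 320128, '2018-11-05');
    """
)


def cohort_size(connection, condition_concept_id):
    """Count distinct persons with at least one occurrence of the condition."""
    row = connection.execute(
        """
        SELECT COUNT(DISTINCT p.person_id)
        FROM person AS p
        JOIN condition_occurrence AS co ON co.person_id = p.person_id
        WHERE co.condition_concept_id = ?
        """,
        (condition_concept_id,),
    ).fetchone()
    return row[0]


# Person 1 has two occurrences but is counted once.
n_patients = cohort_size(conn, 201826)
```

Counting with `DISTINCT` matters because one patient can have many condition records; the same pattern extends to other OMOP clinical tables that carry a `person_id`.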

Learn more about Stanford OMOP