STARR User Group Meeting (UGM) 2023

May 2023: Research Technology organized the 3rd virtual STARR Users Group Meeting (UGM) on May 10-11, 2023. The two half days featured success stories from STARR users, both faculty and students. Over 70 unique participants attended the two sessions. We are grateful to all attendees, speakers, and organizers.

  • May 10, 2023, 9:30-10:00 am: Opening remarks
  • May 10, 2023, 10:15 am-12:15 pm: Using STARR to Enhance Research
    • Monica Granucci (Cancer Clinical Trials office): Distinguishing Seed From Soil: A MetNet Group's Use of STARR
    • Ethan Steinberg (Computer Science Graduate Program): Self-Supervised Time-to-Event Modeling with Structured Medical Records
    • David Wu (Department of Computer Science): Discovering Monogenic Patients with a Confirmed Molecular Diagnosis in Millions of Clinical Notes
    • Dr. Alex Tarlochan Singh Sandhu, MD, MS (Cardiovascular Medicine): Patient-Reported Outcomes In Heart Failure: Efficient Recruitment in a Low-Cost Pragmatic Trial
    • Dr. Anoop Rao, MD, MS (Pediatrics - Neonatology): STARR-ing into the Future of Neonatal Research
    • Dr. Anne Taylor, MD and Dr. Jeffrey Yang (Pediatric Cardiology Fellows): The multiple uses of bedside telemetry and vital monitoring systems to improve patient care and safety including detection of ventricular arrhythmias after transcatheter pulmonary valve placement, improvement of alarm fatigue, and monitoring of heart rate variability
    • Dr. Jennifer Bollyky, MD (Primary Care and Population Health, Infectious Diseases): Using real-time alerts for COVID study recruitment
    • Dr. Vafi Salmasi, MD (Anesthesia, Adult Pain): Leveraging Learning Healthcare Systems to Integrate Clinical Care and Research
  • May 11, 2023, 1:30-1:50 pm: Overall Accomplishments
  • May 11, 2023, 1:50-3:00 pm: Stanford Participation in the Larger Consortium
    • Dr. Alison Callahan, PhD and Dr. Stephanie Leonard, PhD (Center for Biomedical Informatics Research and Obstetrics and Gynecology): Leading the OHDSI Perinatal & Reproductive Health Group: Experiences and Accomplishments
    • Dr. Behzad Naderalvojoud, PhD (Center for Biomedical Informatics Research): LauNCHeR: Launching Network studies for Collaborative Healthcare Research
    • Dr. Keith Morse, MD (Peds/Hospital Medicine): PEDSnet: Multi-Site EHR Research at Stanford
    • Dr. Sophia Wang, MD, MS (Ophthalmology): Enabling Bigger Data in Ophthalmology: Standardizing Eye Data Through the OHDSI Eye Care and Vision Research Working Group
  • May 11, 2023, 3:00-5:00 pm: Enabling Innovation Using STARR
    • Dr. Shreya Shah, MD (Stanford Healthcare Applied Research Team) and Codex Health: Risk and Care Gaps: An AI Approach for Primary Care
    • Katherine Connors, MPH (Participant Engagement Platform): Supporting Research Participant Engagement using STARR
    • Dr. Jason Fries, PhD (Center for Biomedical Informatics Research): LUMIA: A Generative Language Model for EHR Text and Codes
    • Mike Van Ness (Management Science and Engineering Graduate Program): Using STARR OMOP and Atlas for Interpretable Heart Failure Prediction
    • Dr. Daniel Tawfik, MD (Peds/Critical Care): Using audit logs and STARR OMOP to predict physician burnout
    • Dr. Nima Aghaeepour, PhD (Anesthesia): A New Taxonomy for Prematurity
    • Dr. Christian Rose, MD (Emergency Medicine): Context is Key: Unlocking new measures from EHR Audit Log Data
    • Louis Blankemeier (The EE Graduate Program): Simple Framework for Extracting Diagnoses from Clinical Notes

STARR Workshops, Spring 2023

April/May 2023: Research Technology is offering the following workshops this spring. Click the links to sign up! We will link the videos in due time.

  • STARR-Tools: April 18th, 10 am-12 pm: Learn more about our self-service tool with Cohort and Chart review capabilities. The workshop is led by Research Technology team members Dr. Susan Weber, Dr. Joe Pallas, and Yelena Nazarenko.
  • STARR-ACE: April 26th, 3-5pm: Learn more about our Advanced Cohort Engine. This workshop is led by Dr. Alison Callahan (BMIR).
  • STARR-BQ: May 4th, 1-3 pm: Learn more about programmatic access to our pre-IRB OMOP datasets. This workshop is led by Research Technology team member Natasha Flowers.
  • STARR-ATLAS: May 16th, 10 am-2 pm: Learn more about using the Stanford instance of OHDSI ATLAS on the STARR-OMOP dataset. This workshop is led by Dr. Behzad Naderalvojoud (BMIR), Kristin Kostka (Northeastern University), and Adam Black (Odysseus Data Services Inc).

Learn more about these products

Insights to Inspire 2022, CLIC

Jun 2022: STARR-OMOP participates in Stanford CTSA's reporting. The Center for Leading Innovation and Collaboration (CLIC) showcased STARR-OMOP in its Data Quality forum. Priya Desai presented a talk at the CLIC Data Quality Webinar. You can watch the webinar on our StanfordSTARR YouTube channel.

STARR Workshops, Fall 2022

Nov 2022: Research IT organized the following workshops in Fall 2022. Click the links to access the video recording.

  • STARR-Tools (Link): Learn to use the self-service STARR Cohort and Chart review tools
  • STARR-ACE (Link): Learn to use the Advanced Cohort Engine
  • STARR-BQ (Link): Learn to use the pre-IRB OMOP BigQuery databases using programmatic approaches
  • STARR-ATLAS (Link): Learn to use the Stanford OHDSI ATLAS Cohort Analysis Tool for cohort analysis and network studies

Learn more about these products

STARR Summit 2022

May 12, 2022: The second STARR summit was organized by Priya Desai, R&D Manager Biomedical Informatics, Research IT. We were delighted that over 175 unique participants joined us. We thank all the participants, speakers, and organizers for making this event a tremendous success.

  • 8:30-9:15 am: Overall accomplishments
  • 9:15-10:00 am: Enabling Innovation Using STARR (video)
  • 10:15-11:50 am: AI in Healthcare (video)
    • Jonathan Lu (MedScholars Program): Auditing Clinical AI Models for Advance Care Planning using STARR-OMOP
    • Louis Blankemeier (The EE Graduate Program): Progress and Opportunities in Opportunistic Computed Tomography
    • Nima Aghaeepour, PhD: An AI-driven taxonomy for prematurity
    • Conor Corbin (The BMI Graduate Program): Personalized Antibiograms for Machine Learning Driven Antibiotic Selection
    • Dr. Elsie Gyang Ross, MD (Vascular Surgery): Part II: The Saga of Ross Lab, STARR OMOP and Genetic Analyses for PAD detection
    • Minh Nguyen (The BMI Graduate Program): Machine learning models for triage using EHR data
    • Dr. Evan Minty, MD (Center for Biomedical Informatics Research): OMOP, OHDSI, and the arc of scientific progress
    • Juan Manuel Zambrano Chaves (The BMI Graduate Program): Automated Body Computed Tomography Protocoling
    • Behzad Naderalvojoud, PhD (Center for Biomedical Informatics Research): Using Machine Learning to Predict Postsurgical Pain Outcomes on STARR-OMOP
  • 1:00-5:00 pm: Deep Dive Workshops
    • ATLAS and Reproducible Clinical Data Science, led by Kristin Kostka (Roux Institute), Adam Black (Odysseus Inc), and Dr. Asieh Golazar (Odysseus Inc): A key barrier to achieving reproducibility in observational studies is that often neither the data nor the analysis is made publicly available, so different researchers cannot ask the identical question and verify that they produce identical results. Currently, the prevailing evidence dissemination strategy for observational research is peer-reviewed publications, which provide a free-text description of methods, perhaps with some supplemental materials. However, they are often constrained by word counts and inconsistent in the reporting of key analytic details. In this session, we will discuss the science of reproducible, repeatable research and lead you through a hands-on exercise using the OHDSI Reproducibility Service. The workshop will aim to reproduce a selected study, including the populations (exposures and outcomes), as well as quantify the heterogeneity of interpretations that qualified researchers may produce when attempting to reproduce research. We will use ATLAS, a web-based application running on the STARR-OMOP database, to support the design and execution of observational analyses including vocabulary search and navigation, cohort creation and characterization, calculation of incidence rates, patient-level prediction, and population-level estimation.
    • STARR Cohort and Chart Review Tools, led by Dr. Susan Weber (RIT-TDS) and Mina Liu (RIC): The STARR Cohort Discovery Tool is a web application running on the STARR database that lets you approximately count the number of patients at Stanford matching your research study’s inclusion criteria. The STARR Chart Review Tool is a web application on STARR that lets you review each patient’s chart. When your study data needs exceed the capacity of the self-service tools, the Research Informatics Center offers custom data services in exchange for salary support. This workshop will intersperse short presentations with workshop sessions covering the following topics: the clinical research data ecosystem at Stanford, cohort building using the Cohort Discovery Tool, saving your cohort for review and the associated compliance requirements, rapid and effective chart review, data limitations of the self-service tools, and current and future options for custom data delivery services. (video)
    • Using the Advanced Cohort Engine, led by Dr. Alison Callahan (BMIR) and Vlad Polony (Atropos Health): Advanced Cohort Engine (ACE) is a tool that combines a unique query language for time-aware search of longitudinal patient records with a patient object datastore for rapid electronic phenotyping, cohort extraction, and exploratory data analyses. ACE accepts data in the OMOP Common Data Model, and is configurable to balance performance with compute cost. ACE’s temporal query language supports automatic query expansion using clinical knowledge graphs. This session will lead you through a hands-on exercise integrating electronic phenotype development with cohort-building to enable a variety of high-value uses for a learning health system. (video)

Linking multi-modal data in STARR

Oct 30, 2021: Priya Desai, MS, presented a poster to showcase STARR linking of multi-modal data at the OHDSI 2021 symposium. 

Abstract: The STAnford medicine Research data Repository, or STARR, is a research ecosystem that contains a collection of linked, research-ready data warehouses from disparate clinical ancillary systems and a secure data science facility. The ecosystem is designed on the principles of Data Commons and contains reusable data processing pipelines, cohort and analysis tools, training, user support, and much more. STARR data currently includes electronic medical records data, clinical images (radiology, cardiology) and text, bedside monitoring data, and near-real-time HL7 messages. Processed, “analysis ready” linked data is available to all Stanford researchers in a “self-service” mode and currently consists of:

  • De-identified Electronic Health Records (EHR) from the two Stanford hospitals and clinics in the OMOP Common Data Model (CDM). 
  • De-identified bedside Monitoring (Waveform) data from Stanford Children’s Hospital

Linked patient data in the ecosystem are primarily anchored using person_id, the auto-generated patient identifier in the CDM from the OHDSI community. When the data is refreshed, the person_id stays stable. Other data, such as imaging metadata from radiology (including MRIs, X-rays, ultrasounds, and CT scans) and cardiology, are coming soon. These analysis-ready datasets reside in BigQuery, a cloud-based data warehouse that leverages the infrastructure of the Google Cloud Platform and offers rapid SQL queries and interactive analysis of massive datasets.
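As a rough illustration of what anchoring on person_id means in practice, the sketch below joins an OMOP-style person table to a second dataset on that key. It is a minimal example under stated assumptions, not the STARR schema: it runs on an in-memory SQLite database rather than BigQuery, the waveform_record table and its columns are hypothetical, and the rows are made-up toy data. Only person_id and year_of_birth follow the OMOP person table.

```python
import sqlite3


def build_demo_db():
    """Create a toy in-memory database (SQLite stands in for BigQuery here)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- OMOP-style person table (only two of its columns are shown)
        CREATE TABLE person (person_id INTEGER PRIMARY KEY, year_of_birth INTEGER);
        -- Hypothetical table standing in for a second, linked data modality
        CREATE TABLE waveform_record (
            record_id INTEGER PRIMARY KEY,
            person_id INTEGER,
            signal TEXT
        );
        INSERT INTO person VALUES (1, 2015), (2, 2018), (3, 2020);
        INSERT INTO waveform_record VALUES
            (10, 1, 'ECG'), (11, 1, 'SpO2'), (12, 3, 'ECG');
        """
    )
    return conn


def link_by_person_id(conn):
    """Return (person_id, year_of_birth, record count) for patients present in both tables."""
    query = """
        SELECT p.person_id, p.year_of_birth, COUNT(w.record_id) AS n_records
        FROM person AS p
        JOIN waveform_record AS w ON w.person_id = p.person_id
        GROUP BY p.person_id, p.year_of_birth
        ORDER BY p.person_id
    """
    return conn.execute(query).fetchall()


linked = link_by_person_id(build_demo_db())
print(linked)
```

Because the identifier stays stable across refreshes, a join of this shape should keep resolving to the same patients after the underlying data is updated.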

Link to summary and poster

ATLAS with a BigQuery backend running Execution Engine – a Software demo

Oct 30, 2021: Jose Posada, PhD, presented a demo on Stanford ATLAS at the OHDSI 2021 symposium.

Abstract: Stanford has adopted an ecosystem view of modern clinical research tools. Built on the foundation of STRIDE, the ecosystem has since expanded into the STAnford medicine Research data Repository (STARR) ecosystem. The overall design principles of the ecosystem are based on Data Commons and include compute and storage infrastructure, a data lake, data warehouses, data processing pipelines, APIs, tools, user training, and support. Our overarching goal is to streamline science for researchers.

The backbone of the STARR ecosystem is STARR-OMOP, an analytical clinical data warehouse that uses the OMOP Common Data Model. One of the reasons Stanford chose OMOP was OHDSI in its entirety, not just the data model: we wanted the tools, the network, and the community. Another critical part of our ecosystem is our data center. The compute and storage infrastructure has grown from an on-premises data center to embrace the cloud, not just for its larger storage and compute capacity, but also for specialized solutions. One such specialized solution is Google BigQuery, a managed distributed data warehousing solution. Stanford had previously implemented Google Cloud BigQuery for a Big Data genetics initiative, so it was natural to try BigQuery for STARR-OMOP. BigQuery brings two very significant features. The first is that it is a managed service and, unlike traditional databases, does not require DBA tuning for performance; it performs well out of the box. The data engineering team can focus on data standardization, completeness, and quality instead of indexing, sharding, and scaling. The second is its data-science-friendly APIs. Researchers can run their Jupyter Notebooks from their laptops or HPC environments and never leave the tools they do data science with.

In a previously published manuscript, we show that the ATLAS benchmarking suite using SynPUF runs 3 to 10x faster on BigQuery than on PostgreSQL (Manuscript, Supplementary Table S9.3). We also show that Achilles queries run in ATLAS using STARR-OMOP data deliver a near-real-time user experience. Of the 725 total queries available in Achilles, 660 took less than 17 seconds, and the median execution time was 3 seconds (Manuscript, Supplementary Table S9.1). While direct or API-based SQL queries using BigQuery are highly performant, the OHDSI toolkits do not directly use BigQuery. Instead, the tools use shared libraries such as DatabaseConnector and SqlRender that translate the query to the BigQuery SQL dialect. Optimization of the OHDSI toolkits to run on BigQuery is a journey we embarked on nearly two years ago. This journey has since led to successful deployment and utilization of ATLAS at Stanford. We have also embraced the execution of ATLAS PLE and PLP analyses through the ARACHNE Execution Engine. The engine allows us to fully execute estimation and prediction studies right inside ATLAS. This presentation will demonstrate Stanford ATLAS running on top of STARR-OMOP, including the ARACHNE Execution Engine.

Link to summary and demo

STARR Summit 2021

April 22, 2021: The first STARR summit was organized by Priya Desai, R&D Manager Biomedical Informatics, Research IT, and Nigam Shah, Professor of Biomedical Informatics and Data Science. We were delighted that over 175 unique participants joined us to listen to the keynote, STARR wins, and the live panel, meet with Stanford service providers, participate in workshops, and much more. We thank all the participants, speakers, and organizers for making this event a tremendous success. The following recordings are available from the summit.

  1. Keynote, George Hripcsak, Chair and Vivian Beaumont Allen Professor of Biomedical Informatics, Columbia University (video)
  2. Celebrating Wins, Priya Desai, Manager of Biomedical Informatics R&D (video)
  3. Panel, Birju Patel, Stanford Fellow and panel moderator - Successes, Challenges, and Barriers (video)
  4. Workshops (video)

Here is the summary of all the sessions:

  • Morning session (10-12 noon) 
    • 10:00-10:15 am - Welcome and Introduction - Michael Halaas & Dr. Nigam Shah
    • 10:15-10:45 am - Keynote - Dr. George Hripcsak
    • 10:45-11:15 am - Overall accomplishments 
    • 11:30-12:00 noon - Lightning talks from various labs
      • Dr. Amelia Sattler, MD (Primary Care and Population Health): From Code to Bedside: Using Quality Improvement Methodology to Implement Big Data Solutions to Solve Clinical Problems
      • Stephanie Leonard, PhD, MS (Maternal-Fetal Medicine): Improving understanding of pregnancy complications with STARR-OMOP data
      • Stephen Pfohl (BMIR): Building and evaluating Predictive models with STARR-OMOP
      • Alison Callahan, PhD & Jose Posada, PhD (BMIR): Supporting COVID-19 patient management with data: Standing up a clinical data science team and getting answers in two weeks!
      • Sutanay Choudhury (Stanford Center for Population Health Science): Discovering Higher-Order relationships from Multi-Modal EHR Data
  • Afternoon session (1-3 pm)
    • 1:00-2:00 pm - Panel with Stanford SoM faculty - "Linking multimodal clinical data - Successes, Challenges, and Barriers" moderated by Dr. Birju Patel. The panel consisted of Dr. Elsie Gyang Ross, Dr. Matt Lungren, Dr. David Scheinker, Dr. Tom Montine, and Dr. Tina Boussard
    • 2:00-2:45 pm - Research Infrastructure and Services (Parallel Break out Rooms)
      • Research Informatics Center (RIC)
      • Stanford Research Computing Center (SRCC)
      • Stanford REDCap (REDCap)
      • CHOIR Learning Health Platform (CHOIR) 
      • SCH Clinical Research Informatics at Maternal and Child Health Research Institute (MCHRI)
      • Stanford Center for Population Health Sciences (PHS)
  • Workshops (3:00-4:30 pm)
    • Participating in Network Studies - Jose Posada (Research IT, BMIR): In this workshop we will cover what an OHDSI network study is, how to get involved, how you could lead a study, how to participate using STARR-OMOP, and an example of our prior participation.
    • ATLAS - Adam Black (Odysseus): This workshop will showcase some of the analytic functions made available by the Atlas tool. We will use Atlas to explore a realistic question about the Stanford STARR database that highlights cohort building, characterization, and clinical prediction modeling.
    • Imaging - Stephanie Bogdan (Stanford AIMI): This workshop presents an overview of the AIMI Center and available resources and systems for researchers at Stanford.

TiDE de-identification at OHDSI 2020 Symposium

Oct 30, 2020: Jose Posada, PhD, Sr. Clinical Data Scientist, presented Research IT's clinical text de-identification method, TiDE, at the symposium's collaborator showcase. De-identified clinical text data is an essential need in modern clinical informatics research. The cloud-based TiDE produces high-quality and cost-efficient de-identified clinical text. His talk was one of the 12 lightning talks on data standards and methods research. There were more than 100 presentations of OHDSI research and collaboration at this year’s collaborator showcase. The TiDE pipeline is part of Stanford's STARR-OMOP portfolio.

Abstract: TiDE combines a mix of pattern-matching techniques and machine-learning-based named entity recognition to find protected health information, and uses techniques such as Hiding in Plain Sight as an additional privacy enhancement strategy. TiDE is built from easily accessible best-in-class methods deployed in a cloud architecture, and is computationally resource-intensive yet cost-efficient. TiDE can process approximately 100 million clinical notes in roughly 7 hours by deploying 800 Dataflow workers in parallel, at a total cost of $440 USD. The total processing time translates to 0.00025 s/note, which is about 3 orders of magnitude less than the recently reported fastest process (0.24 s/note) by Heider et al.
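The per-note figure in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above and treats them as exact, which is an assumption since the abstract gives them as approximations. The variable names are illustrative, and the cost per million notes is derived here rather than stated in the abstract.

```python
import math

# Figures quoted in the abstract, treated as exact for this check
notes = 100_000_000          # clinical notes processed
wall_clock_hours = 7         # approximate end-to-end run time
total_cost_usd = 440         # total reported cost
baseline_s_per_note = 0.24   # fastest previously reported process (Heider et al.)

# Wall-clock seconds per note across the whole parallel run
seconds_per_note = wall_clock_hours * 3600 / notes

# How many times faster than the baseline, and the same on a log10 scale
speedup = baseline_s_per_note / seconds_per_note
orders_of_magnitude = math.log10(speedup)

# Derived figure: cost per one million notes
cost_per_million_notes = total_cost_usd / (notes / 1_000_000)

print(f"{seconds_per_note:.6f} s/note")
print(f"{speedup:.0f}x faster, about {orders_of_magnitude:.2f} orders of magnitude")
print(f"${cost_per_million_notes:.2f} per million notes")
```

Note that this is wall-clock throughput for the whole parallel job, not the time a single worker spends on one note.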

Watch it on YouTube