Precision Oncology Core Data Model v1.1.0
Precision-DM v1.1.0 contains 16 profiles (shown on the navigation pane) and 19 subprofiles. Several profiles can be represented as repeatable in a database such as REDCap, or as the "many" side of a one-to-many relationship with parent tables in a relational database.
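As a rough illustration of the one-to-many option, the sketch below stores a repeatable profile as a child table keyed to a parent table, using SQLite. The table names, column names, and sample values are hypothetical and are not part of Precision-DM; treating Treatment Metastatic as the repeatable profile is also an assumption made only for the example.

```python
import sqlite3

# Hypothetical schema: names and values are illustrative, not defined by Precision-DM.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE patient (
        patient_id TEXT PRIMARY KEY
    );
    -- Child table: one patient row can have many treatment rows.
    CREATE TABLE treatment_metastatic (
        treatment_id INTEGER PRIMARY KEY,
        patient_id   TEXT NOT NULL REFERENCES patient(patient_id),
        line_number  INTEGER NOT NULL,
        medication   TEXT
    );
    """
)

conn.execute("INSERT INTO patient (patient_id) VALUES (?)", ("P001",))
conn.executemany(
    "INSERT INTO treatment_metastatic (patient_id, line_number, medication) "
    "VALUES (?, ?, ?)",
    [("P001", 1, "drug A"), ("P001", 2, "drug B")],
)

# Retrieve every repeated instance of the profile for one patient.
rows = conn.execute(
    "SELECT line_number, medication FROM treatment_metastatic "
    "WHERE patient_id = ? ORDER BY line_number",
    ("P001",),
).fetchall()
print(rows)
```

In REDCap, the same idea would typically be handled by marking the corresponding form as repeating rather than by creating a separate table.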
Precision-DM profiles contain 355 elements of the Boolean, CodeableConcept, Date, Decimal, Attachment, Integer, Percentage, Period, and String data types. The Attachment type is assigned to elements that link to molecular testing reports (e.g., germline, tumor profiling, next-generation sequencing) or to the molecular tumor board (MTB) report.
Approximately 61% of the data elements are mapped to FHIR, and 39% receive values from selected terminologies or from code sets defined by the JHU Team. Fifteen terminologies and coding nomenclatures are included in Precision-DM and are shown in the table below. Each data element is also marked as “Required” or “Required if known”, based on the minimum information the JHU MTB needs for a thorough review.
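One way to read the “Required” versus “Required if known” distinction is sketched below: a missing “Required” value is reported as a problem, while a missing “Required if known” value is accepted. The element names, the `ElementSpec` class, and the `validate` function are hypothetical and serve only to illustrate the rule; they are not taken from Precision-DM.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class ElementSpec:
    name: str
    data_type: type
    requirement: str  # "required" or "required_if_known"


# Hypothetical element definitions, for illustration only.
SPECS = [
    ElementSpec("date_of_birth", str, "required"),
    ElementSpec("ecog_score", int, "required_if_known"),
]


def validate(record: Dict[str, Any], specs: List[ElementSpec]) -> List[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for spec in specs:
        value = record.get(spec.name)
        if value is None:
            # Only a missing "required" element is a problem.
            if spec.requirement == "required":
                problems.append(f"{spec.name}: missing required value")
            continue
        if not isinstance(value, spec.data_type):
            problems.append(f"{spec.name}: expected {spec.data_type.__name__}")
    return problems


print(validate({"date_of_birth": "1970-01-01"}, SPECS))
print(validate({"ecog_score": 1}, SPECS))
```

The first call passes because the only absent element is “Required if known”; the second reports the missing required date of birth.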
Terminologies Used
Terminology | Description | Profile(s) Using It |
---|---|---|
COSMIC | The Catalogue Of Somatic Mutations In Cancer (COSMIC) comprises the COSMIC database and the Cell Lines Project. Source: Sanger Institute | Somatic Genomics |
ECOG Performance Status Score and Scale | The ECOG Performance Status Scale was developed by the Eastern Cooperative Oncology Group (ECOG), now the ECOG-ACRIN Cancer Research Group, and published in 1982. Sources: Oken et al., Am J Clin Oncol 1982; ECOG-ACRIN Cancer Research Group | Patient Information |
Ensembl | Ensembl is a genome browser for vertebrate genomes that supports research in comparative genomics, evolution, sequence variation, and transcriptional regulation. Ensembl annotates genes, computes multiple alignments, predicts regulatory function, and collects disease data. Ensembl tools include BLAST, BLAT, BioMart, and the Variant Effect Predictor (VEP) for all supported species. Sources: Howe et al., Nucleic Acids Res. 2021; Ensembl Project | Germline Profiling Data, Germline Genomics, Somatic Genomics |
FHIR (not a terminology; value sets listed in certain Resources are used) | The Fast Healthcare Interoperability Resources (FHIR) is a standards framework created by HL7. FHIR solutions are built from a set of modular components called “Resources”. These Resources can easily be assembled into working systems that solve real-world clinical and administrative problems at a fraction of the price of existing alternatives. Resources contain multiple elements, which are often populated from specific value sets. Two of these value sets are proposed for inclusion in the Precision-DM. Source: HL7 FHIR | Patient Information, Disease Diagnosis |
HGNC | The HUGO Gene Nomenclature Committee (HGNC) is responsible for approving unique symbols and names for human loci, including protein-coding genes, ncRNA genes, and pseudogenes, to allow unambiguous scientific communication. Source: HGNC | Germline Profiling Data, Germline Genomics, Somatic Genomics |
HGVS | The Human Genome Variation Society (HGVS) nomenclature describes sequence variants. Source: HGVS Society | Germline Profiling Data, Germline Genomics, Somatic Genomics |
Karnofsky Performance Status Score and Scale | The Karnofsky Performance Scale Index classifies patients according to their functional impairment. It can be used to compare the effectiveness of different therapies and to assess the prognosis of individual patients. The lower the Karnofsky score, the worse the survival for most serious illnesses. Sources: Karnofsky et al., Cancer 1948; Schag et al., JCO 1984; National Palliative Care Research Center | Patient Information |
LOINC | The Logical Observation Identifiers Names and Codes (LOINC) is an international standard for identifying health measurements, observations, and documents. A limited set of codes is used in three of the Precision-DM data elements. Source: Regenstrief Institute | Germline Profiling Data, Germline Genomics, Somatic Genomics |
NCBI ClinVar | Value set of human genetic variants, drawn from ClinVar. The codes in this value set refer to the ClinVar Variation ID, i.e., the identifier for the variant or set of variants that was interpreted. Source: NCBI ClinVar Data Dictionary | Germline Profiling Data, Germline Genomics, Somatic Genomics |
NCBI dbSNP | The Single Nucleotide Polymorphism Database (dbSNP) is a free public archive for genetic variation within and across different species, developed and hosted by the National Center for Biotechnology Information in collaboration with the National Human Genome Research Institute. Source: Wheeler et al., Nucleic Acids Research 2007 | Germline Profiling Data, Germline Genomics, Somatic Genomics |
NCBI RefSeq | The Reference Sequence (RefSeq) database provides curated references for transcripts, proteins, and genomic regions, plus computationally derived nucleotide sequences and proteins. Source: Wheeler et al., Nucleic Acids Research 2007 | Germline Profiling Data, Germline Genomics, Somatic Genomics |
NCIt | NCI Thesaurus (NCIt) provides reference terminology for many NCI and other systems. It covers vocabulary for clinical care, translational and basic research, and public information and administrative activities. NCIt is recommended for multiple fields across four profiles in the Precision-DM. Source: National Cancer Institute | Patient Information, Personal and Family History, Disease Diagnosis, Tumor Sample Information |
RxNorm | RxNorm provides normalized names for clinical drugs and links its names to many of the drug vocabularies commonly used in pharmacy management and drug interaction software, including those of First Databank, Micromedex, and Gold Standard Drug Database. By providing links between these vocabularies, RxNorm can mediate messages between systems not using the same software and vocabulary. RxNorm is proposed for the coding of medication names in the Precision-DM. Source: National Library of Medicine | Treatment Adjuvant, Treatment Neoadjuvant, Treatment Metastatic |
SNOMED CT | The Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) supports the development of comprehensive, high-quality clinical content in electronic health records. It provides a standardized way to represent clinical phrases captured by the clinician and enables their automatic interpretation. SNOMED CT is one of a suite of designated standards for use in U.S. Federal Government systems for the electronic exchange of clinical health information and is also a required standard in interoperability specifications of the U.S. Healthcare Information Technology Standards Panel. SNOMED CT is the proposed terminology for multiple data elements in the Precision-DM. Sources: SNOMED; National Library of Medicine | Patient Information, Treatment Adjuvant, Treatment Neoadjuvant, Treatment Metastatic, Treatment Surgery |
TNM | The TNM Classification of Malignant Tumours (TNM) is a globally recognized standard for classifying the extent of the spread of cancer. The classification of cancer by anatomic disease extent, i.e., stage, is the major determinant of appropriate treatment and prognosis. TNM is used in the Precision-DM for cancer staging purposes. Source: The Union for International Cancer Control | Disease Diagnosis |
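
To show how a coded value drawn from one of these terminologies might populate a CodeableConcept element, the sketch below builds a minimal FHIR-style CodeableConcept as a plain dictionary. The `codeable_concept` helper and the `SYSTEM_URIS` mapping are hypothetical. The three system URIs are the ones commonly used in FHIR for LOINC, SNOMED CT, and RxNorm, but they should be confirmed against the FHIR terminology documentation, and the LOINC code is used only as an illustration, not as a statement of which codes Precision-DM selects.

```python
from typing import Any, Dict

# Assumed mapping from terminology name to FHIR code system URI.
# Only a subset of the table is covered; verify each URI before relying on it.
SYSTEM_URIS: Dict[str, str] = {
    "LOINC": "http://loinc.org",
    "SNOMED CT": "http://snomed.info/sct",
    "RxNorm": "http://www.nlm.nih.gov/research/umls/rxnorm",
}


def codeable_concept(terminology: str, code: str, display: str) -> Dict[str, Any]:
    """Build a minimal CodeableConcept with a single coding."""
    if terminology not in SYSTEM_URIS:
        raise KeyError(f"No system URI configured for {terminology!r}")
    return {
        "coding": [
            {
                "system": SYSTEM_URIS[terminology],
                "code": code,
                "display": display,
            }
        ],
        "text": display,
    }


# Illustrative only: LOINC 48018-6 ("Gene studied [ID]").
concept = codeable_concept("LOINC", "48018-6", "Gene studied [ID]")
print(concept["coding"][0]["system"])
```

Terminologies without an entry in the mapping raise a `KeyError`, which keeps an unrecognized code system from being written silently.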