Help

Reference for every feature in VitalMatch. New here? Try the Quick start first.

Jump to: Glossary · Projects · Subjects & scans · Custom metadata · Members & roles · Audit log · De-identification profiles · STAR registry · Analytics · API tokens · Python SDK

Glossary

Terms and abbreviations used throughout the platform — clinical, analytical, and technical.

Transplant clinical terms

ABO: Blood group system. Four types: A, B, AB, O. STAR also tracks subtypes (A1, A2, A1B, A2B). Used for donor–recipient compatibility matching.
DBD — Donation after Brain Death: Donor was declared dead by neurologic criteria; circulation is maintained mechanically until organ procurement. Most U.S. transplants. In STAR: NON_HRT_DON = N.
DCD — Donation after Circulatory Death: Donor was declared dead by cardiopulmonary criteria after withdrawal of life-sustaining treatment. Smaller share of donations historically; growing rapidly. Lungs from DCD donors require careful preservation (warm ischemic time matters). In STAR: NON_HRT_DON = Y (formerly "non-heart-beating donor").
HLA — Human Leukocyte Antigen: Cell-surface proteins that drive immune recognition of "self" vs "non-self". Donor and recipient HLA mismatch is a primary driver of acute rejection. STAR tracks HLA-A, -B, -DR loci.
PRA — Panel Reactive Antibody: Percentage of a standard donor panel against which the recipient already has antibodies. High PRA = sensitized recipient = harder to find a compatible donor. In STAR table: THORACIC_PRA_CROSSMATCH_DATA.
Crossmatch: Pre-transplant test mixing recipient serum with donor cells. A positive crossmatch typically aborts the transplant.
Acute rejection: Immune attack on the transplanted organ within weeks-to-months post-transplant. Often biopsy-graded. Treated with anti-rejection agents (steroid pulse, ATG, etc.). In STAR follow-up: ACUTE_REJ_EPI.
Chronic lung allograft dysfunction (CLAD): Late, progressive decline in lung function — the dominant long-term cause of graft loss in lung transplant. Subtypes: BOS (bronchiolitis obliterans syndrome), RAS (restrictive allograft syndrome).
Primary graft dysfunction (PGD): Acute lung injury within 72 h of transplant — early-period oxygenation impairment + chest-X-ray infiltrate. Major driver of 30-day mortality and predictor of CLAD.
Ischemic time: Cold + warm interval the organ spends without circulation between procurement and reperfusion. Long ischemic times correlate with worse early function.
Lobar lung transplant: Living-donor variant — typically two healthy donors each give one lower lobe to a single recipient. Rare in the U.S. now.

STAR variable cheat sheet

Suffix conventions: _DON = donor, _TRR = transplant recipient registration; a variable with no suffix on a recipient row usually describes the recipient.

DONOR_ID · TRR_ID_CODE: Donor's unique STAR identifier · recipient registration code (one per (recipient, organ) pair).
WL_ORG: Waitlisted organ. Lung transplant rows: LU (lung) or HL (heart-lung).
TX_DATE · TXED: Date of transplant · whether the recipient was actually transplanted (1 = yes).
PX_STAT · PX_STAT_DATE: Patient status at follow-up (LIVING, DEAD, RETRANSPLANTED, LOST TO FOLLOW UP, NOT SEEN) · date of that status check.
AGE_DON · HGT_CM_DON_CALC · WGT_KG_DON_CALC: Donor age (years) · height (cm) · weight (kg).
GENDER_DON · ETHNICITY_DON: Donor gender · ethnicity (decoded labels in the UI; raw codes in the parquet).
ABO_DON · ABO: Donor blood group · recipient blood group.
COD_CAD_DON: Cause of death of cadaveric donor (anoxia, head trauma, cerebrovascular/stroke, CNS tumor, other).
NON_HRT_DON: Non-heart-beating donor flag. Y = DCD, N = DBD.
HIST_CIG_DON · HIST_ALCOHOL_DON: Donor smoking history · alcohol history (Y/N/U).
HEP_C_ANTI_DON · HBV_CORE_DON · HIV_NAT: Hepatitis C antibody · Hepatitis B core antibody · HIV nucleic acid test.
ACUTE_REJ_EPI: Acute rejection episode (per follow-up): "No", "Yes, at least one episode treated…", "Yes, none treated…".
TRT_REJ · HOSP_REJ · TRT_REJ_NUM: Treated for rejection · hospitalized for rejection · number of rejection treatments.
END_DATE · INIT_DATE · DEATH_DATE: End-of-record date · initial waitlist date · death date.

Imaging & CT terms

DICOM: Digital Imaging and Communications in Medicine — the standard file format for medical images. Each donor scan in our archive is one or more DICOM files (or a ZIP of them).
StudyInstanceUID: Globally-unique identifier for one CT acquisition, written into the DICOM headers at scan time. The bridge between an imaging file and STAR — the crosswalk maps StudyInstanceUID → DONOR_ID.
SeriesInstanceUID: Identifier for one image series within a study (e.g., a thin-slice axial reconstruction). One Study can contain several Series.
HU — Hounsfield Unit: CT density scale. Air = −1000, water = 0, soft tissue ≈ +30, bone > +400. Lung parenchyma is mostly between −950 and −500.
Lung window: Display window centered around lung-parenchyma HU values (typically C = −600, W = 1500). All slice viewer renders use this window.
Aeration (in our metrics): Fraction of segmented lung voxels with HU between −950 and −500 (the "well-aerated" band). Lower aeration → more consolidation, atelectasis, edema, or infiltrate.
Atelectasis · Consolidation · Edema · Emphysema: Atelectasis: collapsed lung (often positional or post-intubation; raises HU). Consolidation: alveoli filled with fluid/pus/blood. Edema: extravascular fluid accumulation. Emphysema: alveolar destruction (lowers HU).
Density symmetry: Mean HU of left vs right hemithorax. A difference under 50 HU is treated as symmetric in our screening pipeline.
Suspicious region: Soft-tissue-density (−100 to +150 HU) lesion ≥ ~6 mm (113 mm³) inside the lung mask. Coarse nodule/mass count, not a diagnostic biomarker.
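
The HU-based definitions above reduce to a few lines of arithmetic. The sketch below is illustrative only: the function names and the flat list-of-HU input are assumptions, not the platform's pipeline code, and treating the band edges as inclusive is also an assumption. The thresholds themselves are the ones stated in this glossary.

```python
import math

AERATED_LO, AERATED_HI = -950, -500   # "well-aerated" HU band from the glossary
SYMMETRY_TOLERANCE_HU = 50            # left/right mean-HU difference treated as symmetric


def aeration_fraction(lung_hu):
    """Fraction of segmented lung voxels whose HU falls in the well-aerated band."""
    if not lung_hu:
        raise ValueError("empty lung mask")
    # Assumption: band edges are inclusive.
    aerated = sum(1 for hu in lung_hu if AERATED_LO <= hu <= AERATED_HI)
    return aerated / len(lung_hu)


def is_density_symmetric(left_hu, right_hu):
    """True when the mean HU of the two hemithoraces differs by less than the tolerance."""
    mean_left = sum(left_hu) / len(left_hu)
    mean_right = sum(right_hu) / len(right_hu)
    return abs(mean_left - mean_right) < SYMMETRY_TOLERANCE_HU


def sphere_volume_mm3(diameter_mm):
    """Volume of a sphere; converts the ~6 mm diameter cutoff into mm^3."""
    return (4.0 / 3.0) * math.pi * (diameter_mm / 2.0) ** 3


toy_voxels = [-900, -850, -700, -300, 20]    # made-up values, not real data
print(aeration_fraction(toy_voxels))         # 3 of the 5 toy voxels are in band
print(round(sphere_volume_mm3(6.0)))         # 113, matching the 113 mm^3 quoted above
```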

Statistical & analytical terms

KM — Kaplan–Meier: Non-parametric estimator of the survival function. Step function that drops at each death and accounts for censoring. Used by our Survival, Rejection, and CT-outcomes pages.
HR — Hazard Ratio: Multiplicative effect on the instantaneous risk of an event. HR > 1: the covariate increases risk. HR < 1: protective. HR = 1: no effect.
95% CI — 95% Confidence Interval: Range computed so that, under repeated sampling, 95% of such intervals would contain the true HR. If the CI crosses 1, the effect is not statistically significant at α=0.05.
Cox PH — Cox Proportional Hazards: Multivariate survival regression. Estimates the HR for each covariate adjusted for all others. Assumes the HR is constant over follow-up time (the "proportional-hazards" assumption — diagnosed by the PH-violation panel).
Log-rank test: Non-parametric test for whether two or more KM curves are drawn from the same distribution. Outputs a chi-square statistic and a p-value. Used on every stratified curve in the platform.
Concordance index (Harrell's C): Probability that, for two random recipients, the model correctly ranks who survives longer. 0.5 = chance. 1.0 = perfect ranking. 0.55–0.65 = modest discrimination, typical for purely tabular Cox in our data.
Censoring (right-censoring): When the event of interest hasn't been observed by the end of follow-up. KM uses these patients up to the censoring date — they contribute to the at-risk denominator without ever counting as an event.
Tertile · Quartile: Splits a continuous variable into 3 / 4 equal-sized groups. The CT-outcomes page uses tertiles by default.
p-value: Probability of observing a test statistic at least as extreme as the one we got, under the null hypothesis. p ≤ 0.05 = conventional significance threshold (green in our pages); 0.05 < p ≤ 0.20 = suggestive (amber); p > 0.20 = no detectable signal at this sample size (gray).
Forest plot: Visual display of effect sizes (HR or similar) with their 95% CIs as horizontal bars, one per covariate. Used by the Cox PH page.
Schoenfeld residuals: Diagnostic statistic for testing the proportional-hazards assumption. If they correlate with time for any covariate, the HR for that covariate isn't really constant over follow-up.
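
To make the concordance index concrete, here is a minimal pure-Python sketch of Harrell's C. It is not the platform's implementation: the function name is hypothetical, pairs with tied survival times are simply skipped, and tied risk scores count as half-concordant (a common convention).

```python
def harrell_c(times, events, risk_scores):
    """Harrell's C: share of comparable pairs the risk score orders correctly.

    times       -- follow-up time per recipient
    events      -- 1 if the event (death) was observed, 0 if censored
    risk_scores -- model output, higher = higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when i has the strictly shorter time
            # and i's event was actually observed (not censored).
            if times[i] < times[j] and events[i]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy example: the recipient who died first has the highest risk score.
print(harrell_c([5, 10, 15], [1, 1, 0], [3.0, 2.0, 1.0]))  # 1.0
```

A constant risk score gives 0.5 (chance), which is the reference point for the 0.55–0.65 range quoted above.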

Datasets & organizations

UNOS — United Network for Organ Sharing: Non-profit organization that manages the U.S. transplant network under contract with HRSA.
OPTN — Organ Procurement and Transplantation Network: The federally-mandated network UNOS administers. The OPTN/UNOS data infrastructure is what produces STAR.
STAR — Standard Transplant Analysis and Research: The annual research data release from UNOS. Roughly 25 GB compressed, ~19 lung-relevant tables (THORACIC_DATA, DECEASED_DONOR_DATA, THORACIC_FOLLOWUP_DATA, …).
NLST — National Lung Screening Trial: NIH/NCI screening trial, ~53,000 participants aged 55–74 with 30+ pack-year smoking history. We have ~13,500 NLST chest CTs locally; used as a (caveated) imaging baseline.
UK Biobank: ~500,000-participant UK population cohort with imaging, EHR, genomics. The cleanest publicly-accessible source of "true normal" chest CTs (population-based, not enriched for smokers).

Compliance & technical terms

PHI — Protected Health Information: Identifying information about a patient's health, governed by HIPAA. DICOM headers contain dozens of PHI fields (patient name, MRN, DOB, address, physician names, UIDs, …) — the de-identification pipeline strips or hashes them.
HIPAA Safe Harbor: The 18-element list of PHI identifiers that must be removed for data to be considered de-identified under HIPAA. Default de-id profile in VitalMatch.
RBAC — Role-Based Access Control: Per-project membership controls in v2: reader / editor / admin. See the Members & roles section below.
SDK — Software Development Kit: The pip-installable vitalmatch-sdk Python package. See the Python SDK section below.
Parquet: Columnar file format used to cache STAR tables on disk. Much faster than re-parsing the raw .DAT tab-separated files on every query.
Crosswalk: The DATA0014109_Crosswalk.csv file that maps DICOM StudyInstanceUID values to STAR DONOR_ID. Roughly 9,300 deceased-donor CTs in our archive resolve through this file.

Projects

A project is the top-level container for a research question. Every subject, scan, annotation, custom metadata field, saved cohort, and audit-log entry lives inside exactly one project.

Creating a project

Visit Projects → enter a name + optional description → + New project. The slug (URL identifier) is auto-derived; collisions auto-suffix to -2, -3, etc. The creator becomes the project's first admin.
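
The slug behavior can be pictured roughly as follows. This is a sketch under assumptions: the exact normalization rules (lowercasing, replacing non-alphanumeric runs with a hyphen) and the function names are illustrative, and only the -2, -3 suffixing on collision comes from the description above.

```python
import re


def slugify(name):
    """Lowercase the name and collapse anything non-alphanumeric into single hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    return slug or "project"   # assumption: fallback for names with no usable characters


def unique_slug(name, taken):
    """Return a slug not present in `taken`, suffixing -2, -3, ... on collision."""
    base = slugify(name)
    if base not in taken:
        return base
    suffix = 2
    while f"{base}-{suffix}" in taken:
        suffix += 1
    return f"{base}-{suffix}"


print(unique_slug("DCD Lungs 2024", set()))                  # dcd-lungs-2024
print(unique_slug("DCD Lungs 2024", {"dcd-lungs-2024"}))     # dcd-lungs-2024-2
```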

Project visibility

You only see projects you're a member of (global admins see everything). Other users won't know your project exists unless you add them.

Subjects & scans

Two-tier hierarchy below a project (Subject, then Scan); each Scan points to DICOM files on disk:

Project → Subject → Scan → DICOM file

One Subject typically corresponds to one UNOS donor (or one NLST control). The external_id field holds whatever the caller wants — for v1-imported data it's the original patient_id.

One Scan is one CT acquisition — a directory of DICOM files or, more commonly, an anonymized ZIP archive on disk. The Scan row stores metadata (image_path, study_uid, donor_id, deid_status); the actual DICOM bytes stay on the NAS.

Registering a scan with a subject_external_id auto-creates the Subject if it doesn't exist — useful for the v1 importer and for the SDK.

Custom metadata

Generic key-value store on every object (project, subject, scan, screening, STAR donor reference). Values are typed: string, number, bool, date, json.

Use it for whatever your research project needs — a "reviewed" flag, a risk score, a free-text note, a nested JSON blob from an external system. Edits are immediate; the same key written twice overwrites.

API:

PUT  /api/v2/objects/<type>/<id>/metadata/<key>    {value, value_type}
GET  /api/v2/objects/<type>/<id>/metadata           full bag for one object
DELETE /api/v2/objects/<type>/<id>/metadata/<key>
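
As an illustration of the PUT route, the sketch below builds the request without sending it. The helper name, the client-side type check, and the ISO 8601 serialization of dates are assumptions; only the path shape and the {value, value_type} body come from the route list above. Authentication headers are omitted because their format is not documented here.

```python
import json
from datetime import date
from urllib.parse import quote

# Python types accepted for each documented value_type (client-side check, an assumption).
ALLOWED_TYPES = {
    "string": (str,),
    "number": (int, float),
    "bool": (bool,),
    "date": (date,),
    "json": (dict, list),
}


def build_metadata_put(base_url, object_type, object_id, key, value, value_type):
    """Return (method, url, json_body) for writing one metadata key."""
    if value_type not in ALLOWED_TYPES:
        raise ValueError(f"unknown value_type: {value_type}")
    wrong_type = not isinstance(value, ALLOWED_TYPES[value_type])
    # bool is a subclass of int in Python, so reject it explicitly for "number".
    if wrong_type or (value_type == "number" and isinstance(value, bool)):
        raise TypeError(f"value does not match value_type {value_type!r}")
    path = "/api/v2/objects/{}/{}/metadata/{}".format(
        quote(str(object_type), safe=""),
        quote(str(object_id), safe=""),
        quote(key, safe=""),
    )
    # Assumption: dates travel as ISO 8601 strings.
    wire_value = value.isoformat() if value_type == "date" else value
    body = json.dumps({"value": wire_value, "value_type": value_type})
    return "PUT", base_url.rstrip("/") + path, body


method, url, body = build_metadata_put(
    "https://vitalmatch.ai", "subject", 42, "reviewed", True, "bool")
print(method, url)
print(body)
```

Because the same key written twice overwrites, repeating this call with a new value is an update rather than an error.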

Members & roles

Per-project membership controls who can do what. Three roles:

Role	Can
reader	View everything in the project — read-only.
editor	Reader + create/edit subjects, scans, metadata, annotations, saved cohorts.
admin	Editor + manage members, change project settings, view the audit log, manage de-id profile, redo de-identification.

Global admins (configured outside the UI) bypass project membership for emergency access — but they still appear in the audit log.

Safety rails: you can't remove or demote the last admin of a project (would orphan it).
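
The role ladder and the last-admin rail can be sketched as below. This is an in-memory illustration with hypothetical names, not the server's code; it assumes roles are strictly cumulative, as the table above describes.

```python
ROLE_RANK = {"reader": 1, "editor": 2, "admin": 3}


def has_role(member_role, required_role):
    """Roles are cumulative: an admin can do anything an editor or reader can."""
    return ROLE_RANK[member_role] >= ROLE_RANK[required_role]


def set_role(members, user, new_role):
    """Return an updated copy of {user: role}; new_role=None removes the member.

    Refuses to remove or demote the last remaining admin.
    """
    if user not in members:
        raise KeyError(user)
    if new_role is not None and new_role not in ROLE_RANK:
        raise ValueError(f"unknown role: {new_role}")
    admins = [u for u, role in members.items() if role == "admin"]
    losing_admin = members[user] == "admin" and new_role != "admin"
    if losing_admin and admins == [user]:
        raise ValueError("cannot remove or demote the last admin of a project")
    updated = dict(members)
    if new_role is None:
        del updated[user]
    else:
        updated[user] = new_role
    return updated


members = {"ana": "admin", "bo": "editor"}
members = set_role(members, "bo", "admin")    # promote a second admin first
members = set_role(members, "ana", "reader")  # now demoting ana is allowed
print(members)
```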

UI: Project header → Members tab.

Audit log

Append-only record of every state-changing action. Failed access attempts are recorded too, with the rejection reason in detail_json.

Rows persist forever (tamper-evidence). The user_label column is denormalized at write time so log entries survive later renames or deletions of the user.

Filters available: user, action (e.g. scan.create), action prefix (star.*), object type, success / failure, time range.

Export: the same filter set drives a CSV download for compliance review.

UI: Project header → Audit tab (admin only).

De-identification profiles

A profile is a JSON document mapping a DICOM tag (keyword like PatientName or hex like 0010,0010) to a rule:

Rule	Effect
`remove`	Delete the tag
`blank`	Set to empty string
`replace:VALUE`	Set to literal VALUE
`date_shift:N`	Shift date by N days (N may be negative)
`hash`	Replace with deterministic hash (UIDs get a synthetic 1.2.840.99988.<digits> prefix)
`keep`	Explicit no-op
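
A rough sketch of how such a profile might be applied to a flat dictionary of tags. Everything beyond the rule names is an assumption: the helper names, SHA-256 as the hash, the truncation lengths, leaving unlisted tags untouched, and handling dates only in the DICOM DA format (YYYYMMDD).

```python
import hashlib
from datetime import datetime, timedelta

UID_PREFIX = "1.2.840.99988."   # synthetic prefix quoted in the rule table


def apply_rule(rule, value, is_uid=False):
    """Return the new tag value, or None when the tag should be deleted."""
    if rule == "keep":
        return value
    if rule == "remove":
        return None
    if rule == "blank":
        return ""
    if rule.startswith("replace:"):
        return rule[len("replace:"):]
    if rule.startswith("date_shift:"):
        days = int(rule[len("date_shift:"):])          # may be negative
        shifted = datetime.strptime(value, "%Y%m%d") + timedelta(days=days)
        return shifted.strftime("%Y%m%d")
    if rule == "hash":
        digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
        if is_uid:
            # UIDs may contain only digits and dots, so use the digest as a decimal number.
            return UID_PREFIX + str(int(digest, 16))[:30]
        return digest[:16]
    raise ValueError(f"unknown rule: {rule}")


def deidentify(tags, profile, uid_tags=("StudyInstanceUID", "SeriesInstanceUID")):
    """Apply a {keyword: rule} profile; tags without a rule are kept (assumption)."""
    out = {}
    for keyword, value in tags.items():
        new_value = apply_rule(profile.get(keyword, "keep"), value, keyword in uid_tags)
        if new_value is not None:
            out[keyword] = new_value
    return out


profile = {"PatientName": "remove", "StudyDate": "date_shift:-30",
           "StudyInstanceUID": "hash", "InstitutionName": "replace:ANON"}
tags = {"PatientName": "DOE^JANE", "StudyDate": "20240131",
        "StudyInstanceUID": "1.2.3.4", "InstitutionName": "Example Hospital",
        "Modality": "CT"}
print(deidentify(tags, profile))
```

Because the hash is deterministic, the same input UID always maps to the same synthetic UID, which keeps series from one study linked after de-identification.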

System profiles (immutable)

HIPAA Safe Harbor (default). Mirrors the v1 platform behavior — strips the PHI tag set from DICOM PS3.15 Annex E.
Strict Research. Hashes every UID, date-shifts every date, blanks free-text. Suitable for cohorts shared with external collaborators.
Minimal. Strips only direct identifiers. Internal use only.

Custom profiles

Anyone can clone a system profile to make their own. Project admins assign a profile per project. Existing scans can be re-de-identified (deid-redo) on demand.

UI: Project header → De-id.

STAR registry

UNOS STAR data — donor demographics, recipient outcomes, post-transplant follow-up — is loaded once and shared across all projects (read-only). Every query goes through the active project so it's audit-logged with the right scope.

Donor lookup

By DONOR_ID or by DICOM StudyInstanceUID (uses the crosswalk). Returns the donor record, any lung/heart-lung recipients for that donor, and per-recipient followup timelines.

Cohort browse

Sidebar facets, free-text search, sortable grid, paged. Filter by ABO, COD, donor type (DBD vs DCD), gender, ethnicity, has-CT, etc.

Saved cohorts

Click Save as cohort on the browse page. Freezes the resolved donor list at save time so you can re-fetch the same set later. Per-cohort export to CSV / XLSX / Parquet, plus a Scan overlap button that cross-references the cohort against scans in this project.

Annotations

Tag a donor with a label and optional note ("review again", "marginal — DCD"). Project-scoped — different projects keep independent annotation sets.

UI: Project header → STAR.

Analytics

The built-in pages below turn the STAR + imaging data into the kinds of summaries clinicians and reviewers expect to see. All of them are project-scoped, audit-logged, and run on the cached parquet so a refresh is fast even on large cohorts.

Analyze

Project header → Analyze. Pick one or more scans from the project, run the rule-based screening pipeline (lung volume, aeration %, density symmetry, suspicious-region count, overall impression), and persist results to v2_screening_results. The scan list shows a ★ donor_id badge for crosswalked CTs and the most recent screening impression as a colored chip.

Survival

Project header → Survival. Kaplan–Meier survival curves over the lung-recipient cohort, defaulting to "only donors with a CT scan" so the analysis matches the cohort the multimodal model trains on. Features:

KM-corrected 1y / 3y / 5y survival rates and median survival
Stratification by donor ABO, DBD vs DCD, donor age bucket, listed organ (LU vs HL)
Quick-filter chips (DCD, ABO O, lung-only, age buckets)
Status-at-last-follow-up distribution chart
One-click CSV export of the per-recipient outcome frame for downstream survival analysis (lifelines, R, etc.)

Reference numbers in the live cohort: 1y ≈ 92%, 3y ≈ 76%, 5y ≈ 61% — consistent with published lung-transplant outcomes.

CT outcomes (imaging-branch proof of concept)

Project header → CT outcomes. Tests whether the rule-based imaging metrics (lung volume, aeration %, suspicious-region count) actually predict post-transplant survival. For each metric:

Donors are split into tertiles (configurable: median split / tertiles / quartiles)
One Kaplan–Meier curve per stratum, color-coded
K-sample log-rank chi-square + p-value, color-coded green (p ≤ 0.05) / amber (≤ 0.20) / muted (> 0.20)
Scatter plot of survival_days vs metric, dots colored by event vs censored
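
The stratum split and the p-value color coding described in this list can be sketched as follows. The function names are hypothetical and the rank-based split is one simple way to get equal-sized groups; the page's actual binning of tied values may differ.

```python
def significance_tier(p_value):
    """Map a log-rank p-value to the color tier used on the page."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must be in [0, 1]")
    if p_value <= 0.05:
        return "green"
    if p_value <= 0.20:
        return "amber"
    return "muted"


def split_into_groups(values, n_groups=3):
    """Assign each value a group index 0..n_groups-1 by rank (3 = tertiles).

    Groups are as equal-sized as possible; tied values may land in
    different groups because the split is purely rank-based.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [0] * len(values)
    for rank, idx in enumerate(order):
        groups[idx] = rank * n_groups // len(values)
    return groups


aeration_pct = [10, 50, 20, 60, 30, 40]   # toy metric values, not real data
print(split_into_groups(aeration_pct))    # [0, 2, 0, 2, 1, 1]
print(significance_tier(0.12))            # amber
```

Passing n_groups=2 or n_groups=4 gives the median split and quartile options.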

Coverage tile shows what fraction of CT-linked donors have screening data; both v1 screening_results.db and v2 v2_screening_results are unioned automatically.

Rejection (time to acute rejection)

Project header → Rejection. KM analysis where the event is "first follow-up reporting an acute rejection episode" (ACUTE_REJ_EPI = "Yes, …" — treated or untreated). Recipients with no observed rejection are censored at their last follow-up date. Default 1-year horizon since most acute rejection events cluster in the first 12 months.

Y-axis = "Rejection-free probability" (descends as rejections accumulate)
Stratify by donor ABO, DBD vs DCD, donor age bucket, listed organ, donor gender
Summary tiles include 1y / 3y / 5y rejection-free rates + median time to rejection (events only)

Reference numbers in the live cohort: 1y rejection-free ≈ 92%, median time to rejection (events only) ≈ 378 days. Donor-side stratifications (DBD/DCD, age) are typically not significant in our data — recipient-side and immunosuppression factors dominate.

Cox PH (multivariate hazard regression)

Project header → Cox PH. Multivariate Cox proportional-hazards model on the lung-recipient cohort (lifelines). Covariates: continuous donor age, height, weight, recipient age; one-hot ABO_DON, COD_CAD_DON, NON_HRT_DON (DBD vs DCD), GENDER_DON, recipient ABO. Output:

Forest plot of adjusted hazard ratios with 95% CI — red = harm, green = protective, gray = not significant. HR=1 reference line in dashed amber.
Model fit summary: n, events, Harrell's C concordance, log-likelihood, feature counts (incl. how many were dropped for low-event subgroups).
Per-covariate detail table: HR, CI, β, SE, p-value, sorted by significance.
PH-violation diagnostic (Schoenfeld residuals): p ≤ 0.05 suggests the covariate's effect changes over follow-up time, so the reported HR is an average rather than a constant.

Reference numbers in the live cohort: concordance ≈ 0.557; donor age and recipient age are the strongest individually-significant predictors (HR ≈ 1.01 per year each). DCD vs DBD does not show a significant adjusted effect after age + COD adjustment.
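
For readers relating the β, SE, HR, and CI columns to one another, the standard relationships are HR = exp(β) and a Wald 95% CI of exp(β ± 1.96·SE). The sketch below uses hypothetical helper names and also shows how a per-year HR compounds under the model's log-linear assumption.

```python
import math


def hazard_ratio(beta, se, z=1.96):
    """Return (HR, CI low, CI high) from a Cox coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def is_significant(ci_low, ci_high):
    """Significant at alpha = 0.05 when the 95% CI does not cross HR = 1."""
    return not (ci_low <= 1.0 <= ci_high)


def scale_hr(hr_per_unit, units):
    """HR for a difference of `units` in a continuous covariate (log-linear effect)."""
    return hr_per_unit ** units


# An HR of 1.01 per year of age compounds to about 1.105 over ten years.
print(round(scale_hr(1.01, 10), 3))  # 1.105
```

This is why a per-year HR that looks negligible can still matter across the age range of a cohort.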

Trends

Project header → Trends. Yearly aggregations on the entire U.S. lung-transplant pool from STAR (not just CT-linked donors). Configurable start/end year. Charts:

Lung transplants per year, split by listed organ (LU vs HL).
DCD share of donations over time (the adoption curve).
Donor age — median + IQR band.
Cause-of-death distribution per year (top 5 categories + OTHER).
ABO distribution per year (A / B / AB / O).

Reference numbers in the live cohort: ~39,500 lung transplants 2010–2025; DCD share rose from <1% in 2010 to ≈18% in 2025; annual volume grew from ~2,500/yr to ~3,500/yr.

Heatmap (empirical risk grid)

Project header → Heatmap. KM-corrected survival probability per cell, with rows × cols × optional sub-grids by donor age bucket, ABO, DBD/DCD, listed organ, gender. Color: green = better survival, red = worse, normalized across non-empty cells. Each cell shows the survival % plus the cell's n and observed deaths d. Cells below the configurable minimum N threshold show "—". The non-parametric baseline that any ML model has to beat.

DICOM metadata mining

Project header → DICOM meta. Distributions of acquisition technical metadata (vendor, scanner model, reconstruction kernel, study description, slice thickness, kVp, tube current, pixel spacing, in-plane rows) across a random sample of donor ZIPs. Surfaces domain-shift challenges before training the imaging branch.

The dashboard reads from a Parquet cache populated by a CLI sampler:

python sample_dicom_metadata.py \
    --pool /mnt/nas_unos/Downloads --n 500 \
    --out /mnt/nas_unos/.dicom_metadata_sample.parquet

Re-run with a larger N (e.g. 2000) to tighten the distributions; the page shows a friendly "no cache yet" message until the sampler runs.

ML pilot (XGBoost on STAR features)

Project header → ML pilot. XGBoost binary classifier predicting 1-year graft survival from donor + recipient features (the same set the Cox PH model uses). Temporal hold-out validation — the most recent year of transplants with at least 50 labeled rows is the test set, everything earlier trains. Output:

AUC-ROC + Brier score on the test set.
ROC curve (inline SVG) with diagonal reference.
Calibration curve in 10 bins (deciles of predicted probability): predicted probability vs observed survival rate, with dot size = bin n.
Top-20 feature importance (gain).
Confusion matrix at threshold 0.5 with sensitivity / specificity / precision.

Tabular-only Cox/XGBoost typically lands AUC ≈ 0.55–0.70 on this kind of cohort. The case for the multimodal model is to beat this baseline by adding imaging features.
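
The temporal hold-out rule described for this page can be sketched in a few lines. The row layout (dicts with tx_year and label keys) and the function name are hypothetical; the 50-row threshold and the "most recent eligible year is the test set, everything earlier trains" rule come from the description above.

```python
from collections import Counter


def temporal_holdout(rows, min_test_rows=50):
    """Split labeled rows into (train, test, test_year).

    rows: list of dicts with 'tx_year' (int) and 'label' (None when unlabeled).
    The test year is the most recent year with at least `min_test_rows`
    labeled rows; all labeled rows from earlier years form the training set.
    Rows from years after the test year are left out of both sets.
    """
    labeled = [r for r in rows if r["label"] is not None]
    per_year = Counter(r["tx_year"] for r in labeled)
    eligible = [year for year, n in per_year.items() if n >= min_test_rows]
    if not eligible:
        raise ValueError("no year has enough labeled rows for a test set")
    test_year = max(eligible)
    train = [r for r in labeled if r["tx_year"] < test_year]
    test = [r for r in labeled if r["tx_year"] == test_year]
    return train, test, test_year
```

Recent years usually have few labeled rows because 1-year outcomes are not yet known, which is why the threshold pushes the test set back to an earlier year.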

Multi-organ donor outcomes

Project header → Multi-organ. For each deceased donor whose lungs were transplanted, looks up that donor's kidney + liver recipients and asks whether their 1-year survival outcomes correlate. Tests the hypothesis that there is a donor-level "quality" signal that affects multiple organs.

Cohort overlap tiles: donors with lungs only, lung+kidney, lung+liver, lung+kidney+liver.
2×2 contingency of donor-level lung-1y × kidney-1y outcomes.
Odds ratio (Haldane–Anscombe corrected) + Yates' chi-square + p-value.
Conditional rates: P(lung alive | kidney alive) vs P(lung alive | kidney died).
Survival-days correlation scatter (Spearman ρ + Pearson r) for donors with both organ recipients.

If the lung outcome correlates with the kidney outcome from the same donor, the kidney recipient's survival could be a useful proxy/feature for the lung recipient's prognosis. Caveat: at the donor level, organ-specific outcomes also depend on the recipient's pre-existing condition.
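
The 2×2 statistics listed above follow textbook formulas, sketched here with the standard library only. The function names are hypothetical, and adding 0.5 to every cell is one common form of the Haldane–Anscombe correction; whether the page applies it always or only when a cell is zero is not stated here.

```python
import math


def corrected_odds_ratio(a, b, c, d):
    """Odds ratio for the table [[a, b], [c, d]] with 0.5 added to every cell."""
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


def yates_chi_square(a, b, c, d):
    """Yates-corrected chi-square (1 degree of freedom) and its p-value."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the table is empty")
    corrected = max(0.0, abs(a * d - b * c) - n / 2.0)
    chi2 = n * corrected ** 2 / denom
    # Survival function of chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Toy table: rows = kidney alive / died at 1y, columns = lung alive / died at 1y.
a, b, c, d = 40, 10, 20, 30
print(round(corrected_odds_ratio(a, b, c, d), 2))
chi2, p = yates_chi_square(a, b, c, d)
print(round(chi2, 2), p)
```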

3D Visualize

Project header → 3D viewer. Marching-cubes mesh of a CT volume, rendered with Plotly.js. Pick a scan from the project, choose a mode (lungs −300 HU, tissue −100 HU, bone +400 HU, or a custom-threshold slider), set the marching-cubes step size (5 = fast / 1 = full detail), click Render. The mesh appears in an interactive 3-D canvas you can rotate/zoom.

Implementation: server picks the largest series + most-common dimensions (drops scouts and localizers), stacks to HU, resamples to 2 mm isotropic spacing, runs scikit-image marching_cubes, caps the mesh at 150 k faces for browser responsiveness, returns the Plotly Mesh3d payload (vertices + face indices). Render time scales with volume size and step; a typical donor scan with step=3 takes 3–10 s server-side.

Methodology notes that apply to all KM pages

The KM estimator is implemented inline (~30 lines of vectorized pandas) — no lifelines dependency.
Censoring is right-censoring at the last known follow-up date.
The k-sample log-rank uses the standard Mantel–Haenszel chi-square with k−1 degrees of freedom; p-values come from scipy.stats.chi2.sf.
Strata smaller than 30 recipients are dropped from stratified curves to avoid noise.
"KM median" reads as days at which survival drops to 50%; "not reached" means >50% of the cohort survives past the chart horizon.
Outcome variables (PX_STAT, ACUTE_REJ_EPI) are decoded into human-readable labels before analysis; the underlying SAS codes are never displayed.
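
For reference, the product-limit calculation behind these notes looks roughly like the sketch below. It is a pure-Python illustration with hypothetical function names, not the platform's vectorized pandas code, but it follows the same rules: right-censoring at last follow-up, and a median read off as the first time survival reaches 50% or below.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival)] at each distinct time with at least one event.

    durations -- days from transplant to event or last follow-up
    events    -- 1 if the event was observed, 0 if right-censored
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if deaths:
            # Censored patients still count in the at-risk denominator up to t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def km_median(curve):
    """First time survival is at or below 50%, or None for "not reached"."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


curve = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 0, 1])
print(curve)
print(km_median(curve))  # 5
```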

API tokens

Tokens authenticate the Python SDK against the same RBAC + audit pipeline the browser uses. Each token is scoped to one project and one user.

Format: vmt_ + 48 hex characters. The raw token is shown once on creation; the server stores only its SHA-256 hash. Lost a token? Mint another.
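
The token scheme described above (a random secret shown once, with only its SHA-256 hash stored) can be sketched like this. The function names are hypothetical and lowercase hex is an assumption; the vmt_ prefix, the 48 hex characters, and the SHA-256 storage come from the description.

```python
import hashlib
import re
import secrets

TOKEN_PATTERN = re.compile(r"vmt_[0-9a-f]{48}")   # assumption: lowercase hex


def mint_token():
    """Return (raw_token, stored_hash). Only the hash would be persisted."""
    raw = "vmt_" + secrets.token_hex(24)           # 24 random bytes -> 48 hex chars
    return raw, hashlib.sha256(raw.encode("ascii")).hexdigest()


def verify_token(presented, stored_hash):
    """Check format, then compare hashes in constant time."""
    if not TOKEN_PATTERN.fullmatch(presented):
        return False
    candidate = hashlib.sha256(presented.encode("ascii")).hexdigest()
    return secrets.compare_digest(candidate, stored_hash)


raw, stored = mint_token()
print(len(raw))                   # 52: the 4-character prefix plus 48 hex characters
print(verify_token(raw, stored))  # True
```

Since the server keeps only the hash, a lost token cannot be recovered, which is why the remedy is to mint a new one.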

UI: Project header → Tokens.

Python SDK

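pip install vitalmatch-sdk              # shell command; everything below is Python

from vitalmatch import Client

# Connect with a project-scoped API token (see API tokens above).
c = Client(base_url='https://vitalmatch.ai',
           token='vmt_…',
           project='legacy')

c.me()                                  # identity
c.star.summary()                        # cohort numbers

# Search the STAR cohort: ABO-O donors that have a CT scan.
c.star.cohort.search(filters={'ABO_DON': ['O'], 'has_ct': True}).as_dataframe()

# Save a cohort of DCD, ABO-O donors.
c.star.cohort.save(name='DCD ABO=O',
                   filters={'ABO_DON': ['O'], 'NON_HRT_DON': ['Y']})

# Create a subject, then attach typed custom metadata to a subject.
c.subjects.create(external_id='CT_999', label='from notebook')
# subject_id must hold the id of an existing subject in this project.
c.metadata.set('subject', subject_id, 'reviewed', True, 'bool')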

Two example notebooks ship with the repo at notebooks/:

01_load_a_cohort.ipynb — connect, summary, filter, plot, save, export
02_train_on_multimodal_cohort.ipynb — train a small model on STAR features, write predictions back as annotations

Bug, missing feature, or unclear documentation? Email [email protected].