Help
Reference for every feature in VitalMatch. New here? Try the Quick start first.
Glossary
Terms and abbreviations used throughout the platform — clinical, analytical, and technical.
Transplant clinical terms
- ABO
- Blood group system. Four types: A, B, AB, O. STAR also tracks subtypes (A1, A2, A1B, A2B). Used for donor–recipient compatibility matching.
- DBD — Donation after Brain Death
- Donor was declared dead by neurologic criteria; circulation is maintained mechanically until organ procurement. Most U.S. transplants. In STAR: NON_HRT_DON = N.
- DCD — Donation after Circulatory Death
- Donor was declared dead by cardiopulmonary criteria after withdrawal of life-sustaining treatment. Smaller share of donations historically; growing rapidly. Lungs from DCD donors require careful preservation (warm ischemic time matters). In STAR: NON_HRT_DON = Y (formerly "non-heart-beating donor").
- HLA — Human Leukocyte Antigen
- Cell-surface proteins that drive immune recognition of "self" vs "non-self". Donor and recipient HLA mismatch is a primary driver of acute rejection. STAR tracks HLA-A, -B, -DR loci.
- PRA — Panel Reactive Antibody
- Percentage of a standard donor panel against which the recipient already has antibodies. High PRA = sensitized recipient = harder to find a compatible donor. In STAR table: THORACIC_PRA_CROSSMATCH_DATA.
- Crossmatch
- Pre-transplant test mixing recipient serum with donor cells. A positive crossmatch typically aborts the transplant.
- Acute rejection
- Immune attack on the transplanted organ within weeks-to-months post-transplant. Often biopsy-graded. Treated with anti-rejection agents (steroid pulse, ATG, etc.). In STAR follow-up: ACUTE_REJ_EPI.
- Chronic lung allograft dysfunction (CLAD)
- Late, progressive decline in lung function — the dominant long-term cause of graft loss in lung transplant. Subtypes: BOS (bronchiolitis obliterans syndrome), RAS (restrictive allograft syndrome).
- Primary graft dysfunction (PGD)
- Acute lung injury within 72 h of transplant — early-period oxygenation impairment + chest-X-ray infiltrate. Major driver of 30-day mortality and predictor of CLAD.
- Ischemic time
- Cold + warm interval the organ spends without circulation between procurement and reperfusion. Long ischemic times correlate with worse early function.
- Lobar lung transplant
- Living-donor variant — typically two healthy donors each give one lower lobe to a single recipient. Rare in the U.S. now.
STAR variable cheat sheet
Suffix conventions: _DON = donor, _TRR = transplant recipient registration, blank suffix on a recipient row usually means recipient.
- DONOR_ID · TRR_ID_CODE
- Donor's unique STAR identifier · recipient registration code (one per (recipient, organ) pair).
- WL_ORG
- Waitlisted organ. Lung transplant rows: LU (lung) or HL (heart-lung).
- TX_DATE · TXED
- Date of transplant · whether the recipient was actually transplanted (1 = yes).
- PX_STAT · PX_STAT_DATE
- Patient status at follow-up (LIVING, DEAD, RETRANSPLANTED, LOST TO FOLLOW UP, NOT SEEN) · date of that status check.
- AGE_DON · HGT_CM_DON_CALC · WGT_KG_DON_CALC
- Donor age (years) · height (cm) · weight (kg).
- GENDER_DON · ETHNICITY_DON
- Donor gender · ethnicity (decoded labels in the UI; raw codes in the parquet).
- ABO_DON · ABO
- Donor blood group · recipient blood group.
- COD_CAD_DON
- Cause of death of cadaveric donor (anoxia, head trauma, cerebrovascular/stroke, CNS tumor, other).
- NON_HRT_DON
- Non-heart-beating donor flag. Y = DCD, N = DBD.
- HIST_CIG_DON · HIST_ALCOHOL_DON
- Donor smoking history · alcohol history (Y/N/U).
- HEP_C_ANTI_DON · HBV_CORE_DON · HIV_NAT
- Hepatitis C antibody · Hepatitis B core antibody · HIV nucleic acid test.
- ACUTE_REJ_EPI
- Acute rejection episode (per follow-up): "No", "Yes, at least one episode treated…", "Yes, none treated…".
- TRT_REJ · HOSP_REJ · TRT_REJ_NUM
- Treated for rejection · hospitalized for rejection · number of rejection treatments.
- END_DATE · INIT_DATE · DEATH_DATE
- End-of-record date · initial waitlist date · death date.
Imaging & CT terms
- DICOM
- Digital Imaging and Communications in Medicine — the standard file format for medical images. Each donor scan in our archive is one or more DICOM files (or a ZIP of them).
- StudyInstanceUID
- Globally-unique identifier for one CT acquisition, written into the DICOM headers at scan time. The bridge between an imaging file and STAR — the crosswalk maps StudyInstanceUID → DONOR_ID.
- SeriesInstanceUID
- Identifier for one image series within a study (e.g., a thin-slice axial reconstruction). One Study can contain several Series.
- HU — Hounsfield Unit
- CT density scale. Air = −1000, water = 0, soft tissue ≈ +30, bone > +400. Lung parenchyma is mostly between −950 and −500.
- Lung window
- Display window centered around lung-parenchyma HU values (typically C = −600, W = 1500). All slice viewer renders use this window.
- Aeration (in our metrics)
- Fraction of segmented lung voxels with HU between −950 and −500 (the "well-aerated" band). Lower aeration → more consolidation, atelectasis, edema, or infiltrate.
- Atelectasis · Consolidation · Edema · Emphysema
- Atelectasis: collapsed lung (often positional or post-intubation; raises HU). Consolidation: alveoli filled with fluid/pus/blood. Edema: extravascular fluid accumulation. Emphysema: alveolar destruction (lowers HU).
- Density symmetry
- Mean HU of left vs right hemithorax. A difference under 50 HU is treated as symmetric in our screening pipeline.
- Suspicious region
- Soft-tissue-density (−100 to +150 HU) lesion ≥ ~6 mm (113 mm³) inside the lung mask. Coarse nodule/mass count, not a diagnostic biomarker.
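The aeration, density-symmetry, and suspicious-region definitions above reduce to simple arithmetic on HU values. The following is a minimal sketch, assuming the lung voxels have already been segmented into flat lists of HU values and that the band limits are inclusive. The function names and sample voxels are illustrative and are not the platform's pipeline code.

```python
import math

AERATED_LO, AERATED_HI = -950, -500   # "well-aerated" HU band from the glossary
SYMMETRY_TOL_HU = 50                  # left/right mean-HU gap treated as symmetric


def aeration_fraction(lung_hu):
    """Fraction of lung voxels whose HU falls inside the well-aerated band."""
    if not lung_hu:
        return 0.0
    inside = sum(1 for hu in lung_hu if AERATED_LO <= hu <= AERATED_HI)
    return inside / len(lung_hu)


def is_symmetric(left_hu, right_hu):
    """True when the mean HU of the two hemithoraces differs by less than 50 HU."""
    mean_left = sum(left_hu) / len(left_hu)
    mean_right = sum(right_hu) / len(right_hu)
    return abs(mean_left - mean_right) < SYMMETRY_TOL_HU


def sphere_volume_mm3(diameter_mm):
    """Volume of a sphere; converts a lesion diameter into a volume cutoff."""
    return (4.0 / 3.0) * math.pi * (diameter_mm / 2.0) ** 3


# Hypothetical voxels: one left-side voxel (-100 HU) lies outside the aerated band.
left = [-850, -800, -700, -100]
right = [-900, -820, -760, -640]

print(aeration_fraction(left + right))   # 7 of the 8 voxels are in the band
print(is_symmetric(left, right))         # means are -612.5 and -780.0
print(round(sphere_volume_mm3(6)))       # the 6 mm cutoff expressed in mm^3
```

The last line shows where the 113 mm³ figure comes from: it is the volume of a 6 mm diameter sphere.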
Statistical & analytical terms
- KM — Kaplan–Meier
- Non-parametric estimator of the survival function. Step function that drops at each death and accounts for censoring. Used by our Survival, Rejection, and CT-outcomes pages.
- HR — Hazard Ratio
- Multiplicative effect on the instantaneous risk of an event. HR > 1: the covariate increases risk. HR < 1: protective. HR = 1: no effect.
- 95% CI — 95% Confidence Interval
- Range constructed so that, under repeated sampling, 95% of such intervals would contain the true HR. If the CI crosses 1, the effect is not statistically significant at α=0.05.
- Cox PH — Cox Proportional Hazards
- Multivariate survival regression. Estimates the HR for each covariate adjusted for all others. Assumes the HR is constant over follow-up time (the "proportional-hazards" assumption — diagnosed by the PH-violation panel).
- Log-rank test
- Non-parametric test for whether two or more KM curves are drawn from the same distribution. Outputs a chi-square statistic and a p-value. Used on every stratified curve in the platform.
- Concordance index (Harrell's C)
- Probability that, for two random recipients, the model correctly ranks who survives longer. 0.5 = chance. 1.0 = perfect ranking. 0.55–0.65 = modest discrimination, typical for purely tabular Cox in our data.
- Censoring (right-censoring)
- When the event of interest hasn't been observed by the end of follow-up. KM uses these patients up to the censoring date — they contribute to the at-risk denominator without ever counting as an event.
- Tertile · Quartile
- Splits a continuous variable into 3 / 4 equal-sized groups. The CT-outcomes page uses tertiles by default.
- p-value
- Probability of observing a test statistic at least as extreme as the one we got, under the null hypothesis. p ≤ 0.05 = conventional significance threshold (green in our pages); 0.05 < p ≤ 0.20 = suggestive (amber); p > 0.20 = no detectable signal at this sample size (gray).
- Forest plot
- Visual display of effect sizes (HR or similar) with their 95% CIs as horizontal bars, one per covariate. Used by the Cox PH page.
- Schoenfeld residuals
- Diagnostic statistic for testing the proportional-hazards assumption. If they correlate with time for any covariate, the HR for that covariate isn't really constant over follow-up.
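To make the concordance-index definition above concrete, here is a small plain-Python sketch of Harrell's C. It assumes higher risk scores mean shorter expected survival, counts tied scores as half-concordant, and ignores tied survival times. The function name and the sample data are hypothetical; the platform's Cox page reports the value computed by its own model fit.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's concordance index.

    times:       follow-up time per recipient
    events:      1 if death was observed, 0 if right-censored
    risk_scores: model output, higher = predicted to die sooner
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable only when i has an observed event
            # strictly before j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        return float("nan")
    return concordant / comparable


# Hypothetical recipients: the third one is censored at day 300.
times = [100, 200, 300, 400]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.5, 0.1]

print(harrell_c(times, events, risk))   # 4 of 5 comparable pairs ranked correctly
```

Censored recipients still take part as the longer-surviving member of a pair, which is why the censored third recipient contributes to two comparisons here.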
Datasets & organizations
- UNOS — United Network for Organ Sharing
- Non-profit organization that manages the U.S. transplant network under contract with HRSA.
- OPTN — Organ Procurement and Transplantation Network
- The federally-mandated network UNOS administers. The OPTN/UNOS data infrastructure is what produces STAR.
- STAR — Standard Transplant Analysis and Research
- The annual research data release from UNOS. Roughly 25 GB compressed, ~19 lung-relevant tables (THORACIC_DATA, DECEASED_DONOR_DATA, THORACIC_FOLLOWUP_DATA, …).
- NLST — National Lung Screening Trial
- NIH/NCI screening trial, ~53,000 participants aged 55–74 with 30+ pack-year smoking history. We have ~13,500 NLST chest CTs locally; used as a (caveated) imaging baseline.
- UK Biobank
- ~500,000-participant UK population cohort with imaging, EHR, genomics. The cleanest publicly-accessible source of "true normal" chest CTs (population-based, not enriched for smokers).
Compliance & technical terms
- PHI — Protected Health Information
- Identifying information about a patient's health, governed by HIPAA. DICOM headers contain dozens of PHI fields (patient name, MRN, DOB, address, physician names, UIDs, …) — the de-identification pipeline strips or hashes them.
- HIPAA Safe Harbor
- The 18-element list of PHI identifiers that must be removed for data to be considered de-identified under HIPAA. Default de-id profile in VitalMatch.
- RBAC — Role-Based Access Control
- Per-project membership controls in v2: reader / editor / admin. See the Members & roles section below.
- SDK — Software Development Kit
- The pip-installable vitalmatch-sdk Python package. See the Python SDK section below.
- Parquet
- Columnar file format used to cache STAR tables on disk. Much faster than re-parsing the raw .DAT tab-separated files on every query.
- Crosswalk
- The DATA0014109_Crosswalk.csv file that maps DICOM StudyInstanceUID values to STAR DONOR_ID. Roughly 9,300 deceased-donor CTs in our archive resolve through this file.
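As an illustration of the crosswalk lookup described above, the sketch below loads a two-column CSV into a dictionary keyed by StudyInstanceUID. The column headers and the sample rows are assumptions made for the example; the real DATA0014109_Crosswalk.csv may use different header names.

```python
import csv
import io

# Hypothetical header names and rows, standing in for the real crosswalk file.
SAMPLE_CROSSWALK = """StudyInstanceUID,DONOR_ID
1.2.840.99988.111,D0001
1.2.840.99988.222,D0002
"""


def load_crosswalk(handle):
    """Read crosswalk rows into a dict mapping StudyInstanceUID to DONOR_ID."""
    reader = csv.DictReader(handle)
    return {
        row["StudyInstanceUID"].strip(): row["DONOR_ID"].strip()
        for row in reader
    }


crosswalk = load_crosswalk(io.StringIO(SAMPLE_CROSSWALK))

print(crosswalk.get("1.2.840.99988.222"))   # donor linked to that study
print(crosswalk.get("1.2.3.4"))             # None: this scan has no STAR link
```

With a file on disk, the same function can be called as `load_crosswalk(open(path, newline=""))`.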
Projects
A project is the top-level container for a research question. Every subject, scan, annotation, custom metadata field, saved cohort, and audit-log entry lives inside exactly one project.
Creating a project
Visit Projects → enter a name + optional description → + New project. The slug (URL identifier) is auto-derived; collisions auto-suffix to -2, -3, etc. The creator becomes the project's first admin.
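The slug rule can be pictured with a short sketch. The normalization below (lowercase, runs of non-alphanumerics collapsed to a hyphen) is an assumption, since the page only states that the slug is auto-derived and that collisions get a -2, -3, … suffix. Both function names are hypothetical.

```python
import re


def slugify(name):
    """Assumed normalization: lowercase, non-alphanumeric runs become '-'."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    return slug or "project"


def unique_slug(name, existing):
    """Return the base slug, or the first free '-2', '-3', ... variant."""
    base = slugify(name)
    if base not in existing:
        return base
    suffix = 2
    while f"{base}-{suffix}" in existing:
        suffix += 1
    return f"{base}-{suffix}"


taken = {"dcd-lungs", "dcd-lungs-2"}
print(unique_slug("DCD Lungs", taken))   # dcd-lungs-3
print(unique_slug("New Study", taken))   # new-study
```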
Project visibility
You only see projects you're a member of (global admins see everything). Other users won't know your project exists unless you add them.
Subjects & scans
Two-tier hierarchy below a project:
Project → Subject → Scan → DICOM file
One Subject typically corresponds to one UNOS donor (or one NLST control). The external_id field holds whatever the caller wants — for v1-imported data it's the original patient_id.
One Scan is one CT acquisition — a directory of DICOM files or, more commonly, an anonymized ZIP archive on disk. The Scan row stores metadata (image_path, study_uid, donor_id, deid_status); the actual DICOM bytes stay on the NAS.
Registering a scan with a subject_external_id auto-creates the Subject if it doesn't exist — useful for the v1 importer and for the SDK.
Custom metadata
Generic key-value store on every object (project, subject, scan, screening, STAR donor reference). Values are typed: string, number, bool, date, json.
Use it for whatever your research project needs — a "reviewed" flag, a risk score, a free-text note, a nested JSON blob from an external system. Edits are immediate; the same key written twice overwrites.
API:
PUT /api/v2/objects/<type>/<id>/metadata/<key> {value, value_type}
GET /api/v2/objects/<type>/<id>/metadata (returns the full bag for one object)
DELETE /api/v2/objects/<type>/<id>/metadata/<key>
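A hedged sketch of how a caller might assemble the PUT request above with only the standard library. The endpoint path and the {value, value_type} body come from this page; the Bearer authorization header, the ISO date serialization, and the helper names are assumptions for illustration. The request is built but not sent.

```python
import json
from datetime import date
from urllib.request import Request

BASE_URL = "https://vitalmatch.ai"   # same host as the SDK example


def infer_value_type(value):
    """Map a Python value onto one of: string, number, bool, date, json."""
    # bool must be tested before number because bool is a subclass of int.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, date):
        return "date"
    if isinstance(value, str):
        return "string"
    return "json"


def build_put_request(obj_type, obj_id, key, value, token):
    """Build (but do not send) the metadata PUT request."""
    value_type = infer_value_type(value)
    # Assumption: dates travel as ISO-8601 strings.
    wire_value = value.isoformat() if value_type == "date" else value
    body = json.dumps({"value": wire_value, "value_type": value_type})
    url = f"{BASE_URL}/api/v2/objects/{obj_type}/{obj_id}/metadata/{key}"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",   # assumed scheme, not confirmed
    }
    return Request(url, data=body.encode("utf-8"), headers=headers, method="PUT")


req = build_put_request("subject", 42, "reviewed", True, "vmt_example")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the result to `urllib.request.urlopen(req)` would send it; in practice the Python SDK's `metadata.set` call wraps the same operation.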
Members & roles
Per-project membership controls who can do what. Three roles:
| Role | Can |
|---|---|
| reader | View everything in the project — read-only. |
| editor | Reader + create/edit subjects, scans, metadata, annotations, saved cohorts. |
| admin | Editor + manage members, change project settings, view the audit log, manage de-id profile, redo de-identification. |
Global admins (configured outside the UI) bypass project membership for emergency access — but they still appear in the audit log.
Safety rails: you can't remove or demote the last admin of a project (would orphan it).
UI: Project header → Members tab.
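The role ladder and the last-admin safety rail can be sketched as follows. This is a hypothetical model of the rules in the table, using an in-memory dict of user to role; it is not the server's implementation.

```python
ROLE_RANK = {"reader": 1, "editor": 2, "admin": 3}


def can(role, required):
    """Each role includes everything the roles below it can do."""
    return ROLE_RANK[role] >= ROLE_RANK[required]


def change_role(members, user, new_role):
    """Return an updated membership dict; new_role=None removes the user.

    Raises ValueError if the change would leave the project without an admin.
    """
    admins = [u for u, r in members.items() if r == "admin"]
    losing_admin = members.get(user) == "admin" and new_role != "admin"
    if losing_admin and admins == [user]:
        raise ValueError("cannot remove or demote the last admin")
    updated = dict(members)
    if new_role is None:
        updated.pop(user, None)
    else:
        updated[user] = new_role
    return updated


members = {"ana": "admin", "ben": "editor"}
print(can("editor", "reader"))                 # editors can do what readers can
print(change_role(members, "ben", "admin"))    # promotion is always allowed
try:
    change_role(members, "ana", "reader")      # ana is the only admin
except ValueError as err:
    print("blocked:", err)
```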
Audit log
Append-only record of every state-changing action. Failed access attempts are recorded too, with the rejection reason in detail_json.
Rows persist forever (tamper-evidence). The user_label column is denormalized at write time so log entries survive later renames or deletions of the user.
Filters available: user, action (e.g. scan.create), action prefix (star.*), object type, success / failure, time range.
Export: the same filter set drives a CSV download for compliance review.
UI: Project header → Audit tab (admin only).
De-identification profiles
A profile is a JSON document mapping a DICOM tag (keyword like PatientName or hex like 0010,0010) to a rule:
| Rule | Effect |
|---|---|
| remove | Delete the tag |
| blank | Set to empty string |
| replace:VALUE | Set to literal VALUE |
| date_shift:N | Shift date by N days (N may be negative) |
| hash | Replace with deterministic hash (UIDs get a synthetic 1.2.840.99988.<digits> prefix) |
| keep | Explicit no-op |
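To show how the rules in the table could behave, here is a minimal sketch that applies a profile to a plain dict of tag keywords. Several points are assumptions for illustration only: tags not named in the profile are kept, the hash is SHA-256, dates use the DICOM YYYYMMDD form, and the function names are hypothetical. The real pipeline operates on DICOM files rather than dicts.

```python
import hashlib
from datetime import datetime, timedelta

UID_PREFIX = "1.2.840.99988."   # synthetic UID root named in the rule table


def apply_rule(rule, value, is_uid=False):
    """Return the new tag value, or None when the tag should be deleted."""
    if rule == "remove":
        return None
    if rule == "blank":
        return ""
    if rule == "keep":
        return value
    if rule.startswith("replace:"):
        return rule.split(":", 1)[1]
    if rule.startswith("date_shift:"):
        days = int(rule.split(":", 1)[1])
        shifted = datetime.strptime(value, "%Y%m%d") + timedelta(days=days)
        return shifted.strftime("%Y%m%d")
    if rule == "hash":
        digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
        if is_uid:
            # UIDs allow only digits and dots, so use the digest as a decimal.
            return UID_PREFIX + str(int(digest[:24], 16))
        return digest[:16]
    raise ValueError(f"unknown rule: {rule}")


def deidentify(tags, profile, uid_tags=("StudyInstanceUID", "SeriesInstanceUID")):
    """Apply a profile to a dict of tags (assumption: unlisted tags are kept)."""
    out = {}
    for keyword, value in tags.items():
        rule = profile.get(keyword, "keep")
        new_value = apply_rule(rule, value, is_uid=keyword in uid_tags)
        if new_value is not None:
            out[keyword] = new_value
    return out


tags = {"PatientName": "DOE^JANE", "StudyDate": "20240115",
        "StudyInstanceUID": "1.2.3.4", "InstitutionName": "General"}
profile = {"PatientName": "remove", "StudyDate": "date_shift:-30",
           "StudyInstanceUID": "hash", "InstitutionName": "replace:ANON"}

clean = deidentify(tags, profile)
print(clean)
```

Because the hash is deterministic, the same input UID always maps to the same synthetic UID, so series from one study stay linked after de-identification.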
System profiles (immutable)
- HIPAA Safe Harbor (default). Mirrors the v1 platform behavior — strips the PHI tag set from DICOM PS3.15 Annex E.
- Strict Research. Hashes every UID, date-shifts every date, blanks free-text. Suitable for cohorts shared with external collaborators.
- Minimal. Strips only direct identifiers. Internal use only.
Custom profiles
Anyone can clone a system profile to make their own. Project admins assign a profile per project. Existing scans can be re-de-identified (deid-redo) on demand.
UI: Project header → De-id.
STAR registry
UNOS STAR data — donor demographics, recipient outcomes, post-transplant follow-up — is loaded once and shared across all projects (read-only). Every query goes through the active project so it's audit-logged with the right scope.
Donor lookup
By DONOR_ID or by DICOM StudyInstanceUID (uses the crosswalk). Returns the donor record, any lung/heart-lung recipients for that donor, and per-recipient follow-up timelines.
Cohort browse
Sidebar facets, free-text search, sortable grid, paged. Filter by ABO, COD, donor type (DBD vs DCD), gender, ethnicity, has-CT, etc.
Saved cohorts
Click Save as cohort on the browse page. Freezes the resolved donor list at save time so you can re-fetch the same set later. Per-cohort export to CSV / XLSX / Parquet, plus a Scan overlap button that cross-references the cohort against scans in this project.
Annotations
Tag a donor with a label and optional note ("review again", "marginal — DCD"). Project-scoped — different projects keep independent annotation sets.
UI: Project header → STAR.
Analytics
The built-in pages below turn the STAR + imaging data into the kinds of summaries clinicians and reviewers expect to see. All of them are project-scoped, audit-logged, and run on the cached parquet so a refresh is fast even on large cohorts.
Analyze
Project header → Analyze. Pick one or more scans from the project, run the rule-based screening pipeline (lung volume, aeration %, density symmetry, suspicious-region count, overall impression), and persist results to v2_screening_results. The scan list shows a ★ donor_id badge for crosswalked CTs and the most recent screening impression as a colored chip.
Survival
Project header → Survival. Kaplan–Meier survival curves over the lung-recipient cohort, defaulting to "only donors with a CT scan" so the analysis matches the cohort the multimodal model trains on. Features:
- KM-corrected 1y / 3y / 5y survival rates and median survival
- Stratification by donor ABO, DBD vs DCD, donor age bucket, listed organ (LU vs HL)
- Quick-filter chips (DCD, ABO O, lung-only, age buckets)
- Status-at-last-follow-up distribution chart
- One-click CSV export of the per-recipient outcome frame for downstream survival analysis (lifelines, R, etc.)
Reference numbers in the live cohort: 1y ≈ 92%, 3y ≈ 76%, 5y ≈ 61% — consistent with published lung-transplant outcomes.
CT outcomes (imaging-branch proof of concept)
Project header → CT outcomes. Tests whether the rule-based imaging metrics (lung volume, aeration %, suspicious-region count) actually predict post-transplant survival. For each metric:
- Donors are split into tertiles (configurable: median split / tertiles / quartiles)
- One Kaplan–Meier curve per stratum, color-coded
- K-sample log-rank chi-square + p-value, color-coded green (p ≤ 0.05) / amber (≤ 0.20) / muted (> 0.20)
- Scatter plot of survival_days vs metric, dots colored by event vs censored
Coverage tile shows what fraction of CT-linked donors have screening data; both v1 screening_results.db and v2 v2_screening_results are unioned automatically.
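The stratification and color-coding steps listed above can be sketched in a few lines. This is an illustrative rank-based split, assuming ties are broken by input order; the function names and the sample aeration values are hypothetical.

```python
def quantile_groups(values, k=3):
    """Assign each value to one of k near-equal-sized groups by rank.

    k=2 is a median split, k=3 gives tertiles, k=4 gives quartiles.
    Group 0 holds the lowest values.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [0] * len(values)
    for rank, idx in enumerate(order):
        groups[idx] = rank * k // len(values)
    return groups


def p_value_color(p):
    """Color bands used for log-rank p-values on the page."""
    if p <= 0.05:
        return "green"
    if p <= 0.20:
        return "amber"
    return "muted"


aeration_pct = [62, 88, 71, 93, 55, 80]   # hypothetical donor metrics
print(quantile_groups(aeration_pct))       # tertile index per donor
print(p_value_color(0.12))
```

Each group index then selects the recipients that feed one Kaplan–Meier curve.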
Rejection (time to acute rejection)
Project header → Rejection. KM analysis where the event is "first follow-up reporting an acute rejection episode" (ACUTE_REJ_EPI = "Yes, …" — treated or untreated). Recipients with no observed rejection are censored at their last follow-up date. Default 1-year horizon since most acute rejection events cluster in the first 12 months.
- Y-axis = "Rejection-free probability" (descends as rejections accumulate)
- Stratify by donor ABO, DBD vs DCD, donor age bucket, listed organ, donor gender
- Summary tiles include 1y / 3y / 5y rejection-free rates + median time to rejection (events only)
Reference numbers in the live cohort: 1y rejection-free ≈ 92%, median time to rejection (events only) ≈ 378 days. Donor-side stratifications (DBD/DCD, age) are typically not significant in our data — recipient-side and immunosuppression factors dominate.
Cox PH (multivariate hazard regression)
Project header → Cox PH. Multivariate Cox proportional-hazards model on the lung-recipient cohort (lifelines). Covariates: continuous donor age, height, weight, recipient age; one-hot ABO_DON, COD_CAD_DON, NON_HRT_DON (DBD vs DCD), GENDER_DON, recipient ABO. Output:
- Forest plot of adjusted hazard ratios with 95% CI — red = harm, green = protective, gray = not significant. HR=1 reference line in dashed amber.
- Model fit summary: n, events, Harrell C concordance, log-likelihood, feature counts (incl. how many were dropped for low-event subgroups).
- Per-covariate detail table: HR, CI, β, SE, p-value sorted by significance.
- PH-violation diagnostic (Schoenfeld residuals): p ≤ 0.05 means the covariate's effect changes over follow-up time and the HR is an average rather than a constant.
Reference numbers in the live cohort: concordance ≈ 0.557, donor age and recipient age are the strongest individually-significant predictors (HR ≈ 1.01 per year each). DCD vs DBD does not show a significant adjusted effect after age + COD adjustment.
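A worked example may help in reading a per-year hazard ratio such as HR ≈ 1.01. In a Cox model the HR is exp(β), the 95% CI is exp(β ± 1.96·SE), and the effect of a continuous covariate compounds multiplicatively across units. The coefficient and standard error below are hypothetical values chosen to give HR ≈ 1.01; they are not taken from the live cohort.

```python
import math


def hazard_ratio(beta, se, z=1.96):
    """Return (HR, lower, upper) for a Cox coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def scaled_hr(hr_per_unit, units):
    """HR for a difference of `units` in a continuous covariate."""
    return hr_per_unit ** units


beta, se = 0.010, 0.002   # hypothetical coefficient for donor age, per year
hr, lo, hi = hazard_ratio(beta, se)
significant = not (lo <= 1.0 <= hi)   # CI that excludes 1 is significant at 0.05

print(f"HR per year   = {hr:.3f} (95% CI {lo:.3f} to {hi:.3f})")
print(f"HR per decade = {scaled_hr(hr, 10):.3f}")
print("significant:", significant)
```

So a per-year HR that looks negligible still amounts to roughly a 10% higher hazard for a donor who is ten years older, all else equal.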
Trends
Project header → Trends. Yearly aggregations on the entire U.S. lung-transplant pool from STAR (not just CT-linked donors). Configurable start/end year. Charts:
- Lung transplants per year, split by listed organ (LU vs HL).
- DCD share of donations over time (the adoption curve).
- Donor age — median + IQR band.
- Cause-of-death distribution per year (top 5 categories + OTHER).
- ABO distribution per year (A / B / AB / O).
Reference numbers in the live cohort: ~39,500 lung transplants 2010–2025, DCD share rose from <1% in 2010 to ≈18% in 2025, annual volume growing from ~2,500/yr to ~3,500/yr.
Heatmap (empirical risk grid)
Project header → Heatmap. KM-corrected survival probability per cell, with rows × cols × optional sub-grids by donor age bucket, ABO, DBD/DCD, listed organ, gender. Color: green = better survival, red = worse, normalized across non-empty cells. Each cell shows the survival % plus the cell's n and observed deaths d. Cells below the configurable minimum N threshold show "—". This grid is the non-parametric baseline that any ML model has to beat.
DICOM metadata mining
Project header → DICOM meta. Distributions of acquisition technical metadata (vendor, scanner model, reconstruction kernel, study description, slice thickness, kVp, tube current, pixel spacing, in-plane rows) across a random sample of donor ZIPs. Surfaces domain-shift challenges before training the imaging branch.
The dashboard reads from a Parquet cache populated by a CLI sampler:
python sample_dicom_metadata.py \
--pool /mnt/nas_unos/Downloads --n 500 \
--out /mnt/nas_unos/.dicom_metadata_sample.parquet
Re-run with a larger N (e.g. 2000) to tighten the distributions; the page shows a friendly "no cache yet" message until the sampler runs.
ML pilot (XGBoost on STAR features)
Project header → ML pilot. XGBoost binary classifier predicting 1-year graft survival from donor + recipient features (the same set the Cox PH model uses). Temporal hold-out validation — the most recent year of transplants with at least 50 labeled rows is the test set, everything earlier trains. Output:
- AUC-ROC + Brier score on the test set.
- ROC curve (inline SVG) with diagonal reference.
- Calibration curve in 10 bins (deciles of predicted probability): predicted probability vs observed survival rate, with dot size = bin n.
- Top-20 feature importance (gain).
- Confusion matrix at threshold 0.5 with sensitivity / specificity / precision.
Tabular-only Cox/XGBoost typically lands AUC ≈ 0.55–0.70 on this kind of cohort. The case for the multimodal model is to beat this baseline by adding imaging features.
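The temporal hold-out rule described above can be sketched without any ML library. The row layout (dicts with tx_year and label), the choice to drop rows later than the test year, and both function names are assumptions for illustration. A Brier score helper is included because it is one of the reported metrics.

```python
from collections import Counter


def temporal_split(rows, min_test_rows=50):
    """Split labeled rows into train/test by transplant year.

    The test set is the most recent year with at least `min_test_rows`
    labeled rows; every earlier year is training data.
    """
    labeled = [r for r in rows if r["label"] is not None]
    per_year = Counter(r["tx_year"] for r in labeled)
    eligible = [year for year, n in per_year.items() if n >= min_test_rows]
    if not eligible:
        raise ValueError("no year has enough labeled rows for a test set")
    test_year = max(eligible)
    train = [r for r in labeled if r["tx_year"] < test_year]
    test = [r for r in labeled if r["tx_year"] == test_year]
    return train, test, test_year


def brier_score(probs, outcomes):
    """Mean squared gap between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


# Hypothetical cohort: 2023 has too few rows with a known 1-year outcome.
rows = (
    [{"tx_year": 2021, "label": 1} for _ in range(60)]
    + [{"tx_year": 2022, "label": 1} for _ in range(55)]
    + [{"tx_year": 2023, "label": 1} for _ in range(10)]
    + [{"tx_year": 2023, "label": None} for _ in range(40)]
)
train, test, test_year = temporal_split(rows)
print(test_year, len(train), len(test))
print(brier_score([0.9, 0.8, 0.3], [1, 1, 0]))
```

Splitting by time rather than at random keeps the evaluation closer to how the model would be used: trained on past transplants and applied to later ones.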
Multi-organ donor outcomes
Project header → Multi-organ. For each deceased donor whose lungs were transplanted, looks up that donor's kidney + liver recipients and asks whether their 1-year survival outcomes correlate. Tests the hypothesis that there is a donor-level "quality" signal that affects multiple organs.
- Cohort overlap tiles: donors with lungs only, lung+kidney, lung+liver, lung+kidney+liver.
- 2×2 contingency of donor-level lung-1y × kidney-1y outcomes.
- Odds ratio (Haldane–Anscombe corrected) + Yates' chi-square + p-value.
- Conditional rates: P(lung alive | kidney alive) vs P(lung alive | kidney died).
- Survival-days correlation scatter (Spearman ρ + Pearson r) for donors with both organ recipients.
If the lung outcome correlates with the kidney outcome from the same donor, the kidney recipient's survival could be a useful proxy/feature for the lung recipient's prognosis. Caveat: at the donor level, organ-specific outcomes also depend on the recipient's pre-existing condition.
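The 2×2 statistics listed above can be computed by hand. The sketch below uses only the standard library; the cell counts are hypothetical and the function names are illustrative. For one degree of freedom the chi-square survival function equals erfc(√(x/2)), which avoids a SciPy dependency here.

```python
import math


def corrected_odds_ratio(a, b, c, d):
    """Haldane-Anscombe correction: add 0.5 to every cell before the ratio."""
    a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return (a * d) / (b * c)


def yates_chi_square(a, b, c, d):
    """Yates-corrected chi-square for a 2x2 table and its p-value (1 df)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0
    chi2 = n * max(0.0, abs(a * d - b * c) - n / 2.0) ** 2 / denom
    # Chi-square survival function for 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


# Hypothetical donor-level counts at 1 year:
# a = lung alive & kidney alive, b = lung alive & kidney died,
# c = lung died  & kidney alive, d = lung died  & kidney died
a, b, c, d = 80, 10, 12, 8

odds_ratio = corrected_odds_ratio(a, b, c, d)
chi2, p = yates_chi_square(a, b, c, d)
p_lung_given_kidney_alive = a / (a + c)
p_lung_given_kidney_died = b / (b + d)

print(f"OR = {odds_ratio:.2f}, chi2 = {chi2:.2f}, p = {p:.4f}")
print(f"{p_lung_given_kidney_alive:.3f} vs {p_lung_given_kidney_died:.3f}")
```

The 0.5 correction matters mainly for small tables: without it, a single empty cell makes the odds ratio zero or undefined.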
3D Visualize
Project header → 3D viewer. Marching-cubes mesh of a CT volume, rendered with Plotly.js. Pick a scan from the project, choose a mode (lungs −300 HU, tissue −100 HU, bone +400 HU, or a custom-threshold slider), set the marching-cubes step size (5 = fast / 1 = full detail), click Render. The mesh appears in an interactive 3-D canvas you can rotate/zoom.
Implementation: server picks the largest series + most-common dimensions (drops scouts and localizers), stacks to HU, resamples to 2 mm isotropic spacing, runs scikit-image marching_cubes, caps the mesh at 150 k faces for browser responsiveness, returns the Plotly Mesh3d payload (vertices + face indices). Render time scales with volume size and step; a typical donor scan with step=3 takes 3–10 s server-side.
Methodology notes that apply to all KM pages
- The KM estimator is implemented inline (~30 lines of vectorized pandas), with no lifelines dependency.
- Censoring is right-censoring at the last known follow-up date.
- The k-sample log-rank uses the standard Mantel–Haenszel chi-square with k−1 degrees of freedom; p-values come from scipy.stats.chi2.sf.
- Strata smaller than 30 recipients are dropped from stratified curves to avoid noise.
- "KM median" reads as days at which survival drops to 50%; "not reached" means >50% of the cohort survives past the chart horizon.
- Outcome variables (PX_STAT, ACUTE_REJ_EPI) are decoded into human-readable labels before analysis; the underlying SAS codes are never displayed.
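For readers who want to see the estimator spelled out, here is a plain-Python sketch of Kaplan–Meier with right-censoring and the median rule from the notes above. The platform's version is described as vectorized pandas; this loop-based stand-in is only meant to show the arithmetic, and the function names and sample data are hypothetical.

```python
def kaplan_meier(durations, events):
    """Return the KM curve as a list of (time, survival) steps.

    durations: follow-up time in days per recipient
    events:    1 = event observed, 0 = right-censored at that time
    """
    pairs = sorted(zip(durations, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        leaving = 0
        # Group every recipient who shares this time point.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Censored recipients leave the risk set without lowering the curve.
        at_risk -= leaving
    return curve


def km_median(curve):
    """First time at which survival is at or below 50%, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None   # "not reached"


durations = [100, 200, 200, 300, 400, 500]
events = [1, 0, 1, 1, 0, 0]

curve = kaplan_meier(durations, events)
for t, s in curve:
    print(t, round(s, 4))
print("median:", km_median(curve))
```

In this example survival steps down to 5/6, then 2/3, then 4/9, so the median is reached at day 300 even though half of the recipients were censored.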
API tokens
Tokens authenticate the Python SDK against the same RBAC + audit pipeline the browser uses. Each token is scoped to one project and one user.
Format: vmt_ + 48 hex characters. The raw token is shown once on creation; the server stores only its SHA-256 hash. Lost a token? Mint another.
UI: Project header → Tokens.
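The token format and hash-only storage described above can be illustrated with the standard library. This is a sketch of the general pattern, not the server's code; the function names are hypothetical, and the constant-time comparison is a common practice that this page does not confirm.

```python
import hashlib
import hmac
import secrets

TOKEN_PREFIX = "vmt_"


def mint_token():
    """Return (raw_token, stored_hash); only the hash would be persisted."""
    raw = TOKEN_PREFIX + secrets.token_hex(24)   # 24 random bytes = 48 hex chars
    stored_hash = hashlib.sha256(raw.encode("utf-8")).hexdigest()
    return raw, stored_hash


def verify_token(presented, stored_hash):
    """Hash the presented token and compare it to the stored SHA-256 hash."""
    candidate = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    return hmac.compare_digest(candidate, stored_hash)


raw, stored = mint_token()
print(len(raw))                          # 4-character prefix + 48 hex characters
print(verify_token(raw, stored))         # the genuine token verifies
print(verify_token(raw + "0", stored))   # any altered token does not
```

Because only the hash is stored, a lost token cannot be recovered from the database, which is why the remedy is to mint a new one.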
Python SDK
pip install vitalmatch-sdk

from vitalmatch import Client

c = Client(base_url='https://vitalmatch.ai',
           token='vmt_…',
           project='legacy')
c.me()                 # identity
c.star.summary()       # cohort numbers
c.star.cohort.search(filters={'ABO_DON': ['O'], 'has_ct': True}).as_dataframe()
c.star.cohort.save(name='DCD ABO=O',
                   filters={'ABO_DON': ['O'], 'NON_HRT_DON': ['Y']})
c.subjects.create(external_id='CT_999', label='from notebook')
# subject_id must already hold the ID of an existing subject in this project
c.metadata.set('subject', subject_id, 'reviewed', True, 'bool')
Two example notebooks ship with the repo at notebooks/:
- 01_load_a_cohort.ipynb: connect, summary, filter, plot, save, export
- 02_train_on_multimodal_cohort.ipynb: train a small model on STAR features, write predictions back as annotations
Bug, missing feature, or unclear documentation? Email [email protected] or open an issue at github.com/gilblankenship/LungCT_Diagnosis.