Help

Reference for every feature in VitalMatch. New here? Try the Quick start first.

Glossary

Terms and abbreviations used throughout the platform — clinical, analytical, and technical.

Transplant clinical terms

ABO
Blood group system. Four types: A, B, AB, O. STAR also tracks subtypes (A1, A2, A1B, A2B). Used for donor–recipient compatibility matching.
DBD — Donation after Brain Death
Donor was declared dead by neurologic criteria; circulation is maintained mechanically until organ procurement. Accounts for most U.S. transplants. In STAR: NON_HRT_DON = N.
DCD — Donation after Circulatory Death
Donor was declared dead by cardiopulmonary criteria after withdrawal of life-sustaining treatment. Smaller share of donations historically; growing rapidly. Lungs from DCD donors require careful preservation (warm ischemic time matters). In STAR: NON_HRT_DON = Y (formerly "non-heart-beating donor").
HLA — Human Leukocyte Antigen
Cell-surface proteins that drive immune recognition of "self" vs "non-self". Donor and recipient HLA mismatch is a primary driver of acute rejection. STAR tracks HLA-A, -B, -DR loci.
PRA — Panel Reactive Antibody
Percentage of a standard donor panel against which the recipient already has antibodies. High PRA = sensitized recipient = harder to find a compatible donor. In STAR table: THORACIC_PRA_CROSSMATCH_DATA.
Crossmatch
Pre-transplant test mixing recipient serum with donor cells. A positive crossmatch typically aborts the transplant.
Acute rejection
Immune attack on the transplanted organ within weeks-to-months post-transplant. Often biopsy-graded. Treated with anti-rejection agents (steroid pulse, ATG, etc.). In STAR follow-up: ACUTE_REJ_EPI.
Chronic lung allograft dysfunction (CLAD)
Late, progressive decline in lung function — the dominant long-term cause of graft loss in lung transplant. Subtypes: BOS (bronchiolitis obliterans syndrome), RAS (restrictive allograft syndrome).
Primary graft dysfunction (PGD)
Acute lung injury within 72 h of transplant — early-period oxygenation impairment + chest-X-ray infiltrate. Major driver of 30-day mortality and predictor of CLAD.
Ischemic time
Cold + warm interval the organ spends without circulation between procurement and reperfusion. Long ischemic times correlate with worse early function.
Lobar lung transplant
Living-donor variant — typically two healthy donors each give one lower lobe to a single recipient. Rare in the U.S. now.

STAR variable cheat sheet

Suffix conventions: _DON = donor, _TRR = transplant recipient registration; a field with no suffix usually describes the recipient.

DONOR_ID · TRR_ID_CODE
Donor's unique STAR identifier · recipient registration code (one per (recipient, organ) pair).
WL_ORG
Waitlisted organ. Lung transplant rows: LU (lung) or HL (heart-lung).
TX_DATE · TXED
Date of transplant · whether the recipient was actually transplanted (1 = yes).
PX_STAT · PX_STAT_DATE
Patient status at follow-up (LIVING, DEAD, RETRANSPLANTED, LOST TO FOLLOW UP, NOT SEEN) · date of that status check.
AGE_DON · HGT_CM_DON_CALC · WGT_KG_DON_CALC
Donor age (years) · height (cm) · weight (kg).
GENDER_DON · ETHNICITY_DON
Donor gender · ethnicity (decoded labels in the UI; raw codes in the parquet).
ABO_DON · ABO
Donor blood group · recipient blood group.
COD_CAD_DON
Cause of death of cadaveric donor (anoxia, head trauma, cerebrovascular/stroke, CNS tumor, other).
NON_HRT_DON
Non-heart-beating donor flag. Y = DCD, N = DBD.
HIST_CIG_DON · HIST_ALCOHOL_DON
Donor smoking history · alcohol history (Y/N/U).
HEP_C_ANTI_DON · HBV_CORE_DON · HIV_NAT
Hepatitis C antibody · Hepatitis B core antibody · HIV nucleic acid test.
ACUTE_REJ_EPI
Acute rejection episode (per follow-up): "No", "Yes, at least one episode treated…", "Yes, none treated…".
TRT_REJ · HOSP_REJ · TRT_REJ_NUM
Treated for rejection · hospitalized for rejection · number of rejection treatments.
END_DATE · INIT_DATE · DEATH_DATE
End-of-record date · initial waitlist date · death date.

Imaging & CT terms

DICOM
Digital Imaging and Communications in Medicine — the standard file format for medical images. Each donor scan in our archive is one or more DICOM files (or a ZIP of them).
StudyInstanceUID
Globally-unique identifier for one CT acquisition, written into the DICOM headers at scan time. The bridge between an imaging file and STAR — the crosswalk maps StudyInstanceUID → DONOR_ID.
SeriesInstanceUID
Identifier for one image series within a study (e.g., a thin-slice axial reconstruction). One Study can contain several Series.
HU — Hounsfield Unit
CT density scale. Air = −1000, water = 0, soft tissue ≈ +30, bone > +400. Lung parenchyma is mostly between −950 and −500.
Lung window
Display window centered around lung-parenchyma HU values (typically C = −600, W = 1500). All slice viewer renders use this window.
Aeration (in our metrics)
Fraction of segmented lung voxels with HU between −950 and −500 (the "well-aerated" band). Lower aeration → more consolidation, atelectasis, edema, or infiltrate.
Atelectasis · Consolidation · Edema · Emphysema
Atelectasis: collapsed lung (often positional or post-intubation; raises HU). Consolidation: alveoli filled with fluid/pus/blood. Edema: extravascular fluid accumulation. Emphysema: alveolar destruction (lowers HU).
Density symmetry
Mean HU of left vs right hemithorax. A difference under 50 HU is treated as symmetric in our screening pipeline.
Suspicious region
Soft-tissue-density (−100 to +150 HU) lesion at least ~6 mm in diameter (a 6 mm sphere is ≈ 113 mm³) inside the lung mask. Coarse nodule/mass count, not a diagnostic biomarker.
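The imaging definitions above (lung window, aeration, density symmetry, the 6 mm size cut-off) reduce to a few array operations. The sketch below is illustrative only and is not the screening pipeline's code: the function names are hypothetical, the band edges are treated as inclusive (an assumption), and NumPy is assumed to be installed.

```python
import numpy as np

AERATED_HU = (-950.0, -500.0)   # "well-aerated" band from the glossary
SYMMETRY_TOLERANCE_HU = 50.0    # left/right difference treated as symmetric


def lung_window(hu, center=-600.0, width=1500.0):
    """Map HU values to 0..1 display intensities for the lung window."""
    low = center - width / 2.0
    return np.clip((np.asarray(hu, dtype=float) - low) / width, 0.0, 1.0)


def aeration_fraction(hu, lung_mask):
    """Fraction of masked lung voxels whose HU falls inside the aerated band."""
    voxels = np.asarray(hu, dtype=float)[np.asarray(lung_mask, dtype=bool)]
    if voxels.size == 0:
        return float("nan")
    low, high = AERATED_HU
    # Assumption: both band edges are inclusive.
    return float(np.mean((voxels >= low) & (voxels <= high)))


def is_density_symmetric(left_hu, right_hu):
    """True when mean left and right HU differ by less than the tolerance."""
    diff = abs(float(np.mean(left_hu)) - float(np.mean(right_hu)))
    return diff < SYMMETRY_TOLERANCE_HU


def sphere_volume_mm3(diameter_mm):
    """Volume of a sphere, used to relate a 6 mm lesion to ~113 mm^3."""
    return (4.0 / 3.0) * np.pi * (diameter_mm / 2.0) ** 3
```

With the default window, −600 HU maps to mid-gray (0.5) and anything at or below −1350 HU clips to black.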

Statistical & analytical terms

KM — Kaplan–Meier
Non-parametric estimator of the survival function. Step function that drops at each death and accounts for censoring. Used by our Survival, Rejection, and CT-outcomes pages.
HR — Hazard Ratio
Multiplicative effect on the instantaneous risk of an event. HR > 1: the covariate increases risk. HR < 1: protective. HR = 1: no effect.
95% CI — 95% Confidence Interval
Range produced by a procedure that covers the true HR in 95% of repeated samples. If the CI crosses 1, the effect is not statistically significant at α = 0.05.
Cox PH — Cox Proportional Hazards
Multivariate survival regression. Estimates the HR for each covariate adjusted for all others. Assumes the HR is constant over follow-up time (the "proportional-hazards" assumption — diagnosed by the PH-violation panel).
Log-rank test
Non-parametric test for whether two or more KM curves are drawn from the same distribution. Outputs a chi-square statistic and a p-value. Used on every stratified curve in the platform.
Concordance index (Harrell's C)
Probability that, for two random recipients, the model correctly ranks who survives longer. 0.5 = chance. 1.0 = perfect ranking. 0.55–0.65 = modest discrimination, typical for purely tabular Cox in our data.
Censoring (right-censoring)
When the event of interest hasn't been observed by the end of follow-up. KM uses these patients up to the censoring date — they contribute to the at-risk denominator without ever counting as an event.
Tertile · Quartile
Splits a continuous variable into 3 / 4 equal-sized groups. The CT-outcomes page uses tertiles by default.
p-value
Probability of observing a test statistic at least as extreme as the one we got, under the null hypothesis. p ≤ 0.05 = conventional significance threshold (green in our pages); 0.05 < p ≤ 0.20 = suggestive (amber); p > 0.20 = no detectable signal at this sample size (gray).
Forest plot
Visual display of effect sizes (HR or similar) with their 95% CIs as horizontal bars, one per covariate. Used by the Cox PH page.
Schoenfeld residuals
Diagnostic statistic for testing the proportional-hazards assumption. If they correlate with time for any covariate, the HR for that covariate isn't really constant over follow-up.
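Two of the definitions above are easy to make concrete: the p-value colour tiers and Harrell's C. The following is a minimal pure-Python sketch with hypothetical function names. It skips pairs with tied follow-up times for simplicity, which a production library such as lifelines handles more carefully.

```python
from itertools import combinations


def p_value_tier(p):
    """Map a p-value to the colour tiers described in the glossary."""
    if p <= 0.05:
        return "green"   # conventional significance
    if p <= 0.20:
        return "amber"   # suggestive
    return "gray"        # no detectable signal at this sample size


def harrell_c(durations, events, risk_scores):
    """Concordance index; a higher risk score should mean shorter survival.

    A pair is comparable when the subject with the shorter time had an
    observed event. Ties in risk score count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(durations)), 2):
        if durations[i] == durations[j]:
            continue  # simplification: skip tied follow-up times
        first, second = (i, j) if durations[i] < durations[j] else (j, i)
        if not events[first]:
            continue  # shorter time was censored, so the ordering is unknown
        comparable += 1
        if risk_scores[first] > risk_scores[second]:
            concordant += 1.0
        elif risk_scores[first] == risk_scores[second]:
            concordant += 0.5
    return concordant / comparable if comparable else float("nan")
```

A model that assigns every recipient the same risk score gets exactly 0.5 under this definition, which is why 0.5 reads as chance.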

Datasets & organizations

UNOS — United Network for Organ Sharing
Non-profit organization that manages the U.S. transplant network under contract with HRSA.
OPTN — Organ Procurement and Transplantation Network
The federally-mandated network UNOS administers. The OPTN/UNOS data infrastructure is what produces STAR.
STAR — Standard Transplant Analysis and Research
The annual research data release from UNOS. Roughly 25 GB compressed, ~19 lung-relevant tables (THORACIC_DATA, DECEASED_DONOR_DATA, THORACIC_FOLLOWUP_DATA, …).
NLST — National Lung Screening Trial
NIH/NCI screening trial, ~53,000 participants aged 55–74 with 30+ pack-year smoking history. We have ~13,500 NLST chest CTs locally; used as a (caveated) imaging baseline.
UK Biobank
~500,000-participant UK population cohort with imaging, EHR, genomics. The cleanest publicly-accessible source of "true normal" chest CTs (population-based, not enriched for smokers).

Compliance & technical terms

PHI — Protected Health Information
Identifying information about a patient's health, governed by HIPAA. DICOM headers contain dozens of PHI fields (patient name, MRN, DOB, address, physician names, UIDs, …) — the de-identification pipeline strips or hashes them.
HIPAA Safe Harbor
The 18-element list of PHI identifiers that must be removed for data to be considered de-identified under HIPAA. Default de-id profile in VitalMatch.
RBAC — Role-Based Access Control
Per-project membership controls in v2: reader / editor / admin. See the Members & roles section below.
SDK — Software Development Kit
The pip-installable vitalmatch-sdk Python package. See the Python SDK section below.
Parquet
Columnar file format used to cache STAR tables on disk. Much faster than re-parsing the raw .DAT tab-separated files on every query.
Crosswalk
The DATA0014109_Crosswalk.csv file that maps DICOM StudyInstanceUID values to STAR DONOR_ID. Roughly 9,300 deceased-donor CTs in our archive resolve through this file.
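As a rough illustration of how a crosswalk lookup works, the sketch below reads a CSV into a dictionary keyed by StudyInstanceUID. The column headers used here (StudyInstanceUID, DONOR_ID) are assumptions about the file layout, and the sample rows are made up.

```python
import csv
import io


def load_crosswalk(handle, uid_column="StudyInstanceUID", donor_column="DONOR_ID"):
    """Read crosswalk rows into a {StudyInstanceUID: DONOR_ID} dict."""
    mapping = {}
    for row in csv.DictReader(handle):
        uid = (row.get(uid_column) or "").strip()
        donor = (row.get(donor_column) or "").strip()
        if uid and donor:          # skip incomplete rows
            mapping[uid] = donor
    return mapping


# Tiny in-memory stand-in for the crosswalk file (made-up values).
sample = io.StringIO(
    "StudyInstanceUID,DONOR_ID\n"
    "1.2.3.4,1001\n"
    "1.2.3.5,1002\n"
)
crosswalk = load_crosswalk(sample)
print(crosswalk.get("1.2.3.4"))   # donor id for a known study
print(crosswalk.get("9.9.9"))     # None when the study is not in the crosswalk
```

For the real file, pass an open file handle (opened with newline="") instead of the StringIO object.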

Projects

A project is the top-level container for a research question. Every subject, scan, annotation, custom metadata field, saved cohort, and audit-log entry lives inside exactly one project.

Creating a project

Visit Projects → enter a name + optional description → + New project. The slug (URL identifier) is auto-derived; collisions auto-suffix to -2, -3, etc. The creator becomes the project's first admin.
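The slug behaviour can be pictured with a short sketch. The exact normalisation rules are not documented here, so the lowercase-and-dash rule and the "project" fallback below are assumptions; only the -2, -3 collision suffixing comes from the description above.

```python
import re


def slugify(name):
    """Lowercase, replace runs of non-alphanumerics with '-', trim dashes."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    return slug or "project"   # assumption: fallback for names with no usable characters


def unique_slug(name, existing):
    """Return the base slug, or base-2, base-3, ... on collision."""
    base = slugify(name)
    if base not in existing:
        return base
    suffix = 2
    while f"{base}-{suffix}" in existing:
        suffix += 1
    return f"{base}-{suffix}"
```

For example, creating "Lung CT Pilot" twice would yield lung-ct-pilot and then lung-ct-pilot-2 under these rules.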

Project visibility

You only see projects you're a member of (global admins see everything). Other users won't know your project exists unless you add them.

Subjects & scans

Two-tier hierarchy below a project:

Project → Subject → Scan → DICOM file

One Subject typically corresponds to one UNOS donor (or one NLST control). The external_id field holds whatever the caller wants — for v1-imported data it's the original patient_id.

One Scan is one CT acquisition — a directory of DICOM files or, more commonly, an anonymized ZIP archive on disk. The Scan row stores metadata (image_path, study_uid, donor_id, deid_status); the actual DICOM bytes stay on the NAS.

Registering a scan with a subject_external_id auto-creates the Subject if it doesn't exist — useful for the v1 importer and for the SDK.

Custom metadata

Generic key-value store on every object (project, subject, scan, screening, STAR donor reference). Values are typed: string, number, bool, date, json.

Use it for whatever your research project needs — a "reviewed" flag, a risk score, a free-text note, a nested JSON blob from an external system. Edits are immediate; the same key written twice overwrites.

API:

PUT  /api/v2/objects/<type>/<id>/metadata/<key>    {value, value_type}
GET  /api/v2/objects/<type>/<id>/metadata           full bag for one object
DELETE /api/v2/objects/<type>/<id>/metadata/<key>
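A call to the PUT endpoint might be built as follows using only the standard library. The Bearer authorization header is an assumption (the header format is not stated on this page), the object id 42 and the token are placeholders, and the request is constructed but not sent.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://vitalmatch.ai"


def build_metadata_put(object_type, object_id, key, value, value_type, token):
    """Build (but do not send) a PUT request for one metadata key."""
    url = (
        f"{BASE_URL}/api/v2/objects/{object_type}/{object_id}"
        f"/metadata/{quote(key, safe='')}"
    )
    body = json.dumps({"value": value, "value_type": value_type}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            # Assumption: tokens are sent as a Bearer credential.
            "Authorization": f"Bearer {token}",
        },
    )


request = build_metadata_put("subject", 42, "reviewed", True, "bool", "vmt_example")
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request)
```

Because the same key written twice overwrites, repeating this call with a different value simply replaces the stored one.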

Members & roles

Per-project membership controls who can do what. Three roles:

Role     Can
reader   View everything in the project (read-only).
editor   Reader + create/edit subjects, scans, metadata, annotations, saved cohorts.
admin    Editor + manage members, change project settings, view the audit log, manage de-id profile, redo de-identification.

Global admins (configured outside the UI) bypass project membership for emergency access — but they still appear in the audit log.

Safety rails: you can't remove or demote the last admin of a project (would orphan it).
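The cumulative roles and the last-admin rule can be expressed compactly. This is a sketch of the logic as described, not the server's implementation; the function names and the user-to-role dictionary are hypothetical.

```python
ROLE_RANK = {"reader": 1, "editor": 2, "admin": 3}


def has_role(member_role, required_role):
    """Roles are cumulative: admin implies editor, editor implies reader."""
    return ROLE_RANK[member_role] >= ROLE_RANK[required_role]


def can_change_member(members, user, new_role=None):
    """Return False if removing or demoting `user` would leave no admin.

    members maps user -> role; new_role=None means removal.
    """
    if members.get(user) != "admin" or new_role == "admin":
        return True   # not an admin, or staying an admin: always safe
    other_admins = [u for u, r in members.items() if r == "admin" and u != user]
    return bool(other_admins)
```

With members = {"ann": "admin", "bob": "editor"}, removing or demoting ann is refused until a second admin exists.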

UI: Project header → Members tab.

Audit log

Append-only record of every state-changing action. Failed access attempts are recorded too, with the rejection reason in detail_json.

Rows persist forever (tamper-evidence). The user_label column is denormalized at write time so log entries survive later renames or deletions of the user.

Filters available: user, action (e.g. scan.create), action prefix (star.*), object type, success / failure, time range.

Export: the same filter set drives a CSV download for compliance review.
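If you post-process an exported log, the same filters are easy to reproduce locally. The sketch below assumes each row is a dict with user_label, action, success, and detail_json keys; user_label and detail_json are named above, while the success key and the exact CSV column set are assumptions.

```python
import csv
import io


def filter_audit(rows, user=None, action_prefix=None, success=None):
    """Keep rows matching every filter that is not None."""
    kept = []
    for row in rows:
        if user is not None and row["user_label"] != user:
            continue
        if action_prefix is not None and not row["action"].startswith(action_prefix):
            continue
        if success is not None and row["success"] != success:
            continue
        kept.append(row)
    return kept


def to_csv(rows, columns=("user_label", "action", "success", "detail_json")):
    """Serialise rows to CSV text, ignoring keys outside `columns`."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

An action prefix shown as star.* in the UI corresponds to action_prefix="star." here.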

UI: Project header → Audit tab (admin only).

De-identification profiles

A profile is a JSON document mapping a DICOM tag (keyword like PatientName or hex like 0010,0010) to a rule:

Rule            Effect
remove          Delete the tag
blank           Set to empty string
replace:VALUE   Set to literal VALUE
date_shift:N    Shift date by N days (N may be negative)
hash            Replace with deterministic hash (UIDs get a synthetic 1.2.840.99988.<digits> prefix)
keep            Explicit no-op
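To make the rule semantics concrete, here is a sketch that applies a profile to a plain dictionary of DICOM keywords. It is not the platform's de-identification code. Several details are assumptions: tags missing from the profile are kept, dates use the DICOM YYYYMMDD form, the hash is unsalted SHA-256, and the digits after the synthetic UID root are derived from that hash.

```python
import hashlib
from datetime import datetime, timedelta

SYNTHETIC_UID_ROOT = "1.2.840.99988"


def apply_rule(keyword, value, rule):
    """Return the new value for one tag, or None when the tag is removed."""
    if rule == "remove":
        return None
    if rule == "blank":
        return ""
    if rule == "keep":
        return value
    if rule.startswith("replace:"):
        return rule[len("replace:"):]
    if rule.startswith("date_shift:"):
        days = int(rule[len("date_shift:"):])
        shifted = datetime.strptime(value, "%Y%m%d") + timedelta(days=days)
        return shifted.strftime("%Y%m%d")
    if rule == "hash":
        digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
        if keyword.endswith("UID"):
            # Assumption: digits come from the hash, kept short enough
            # to stay inside the 64-character DICOM UID limit.
            return f"{SYNTHETIC_UID_ROOT}.{int(digest, 16) % 10**30}"
        return digest[:16]
    raise ValueError(f"unknown rule: {rule}")


def deidentify(tags, profile):
    """Apply a profile to a {keyword: value} dict; unlisted tags are kept."""
    result = {}
    for keyword, value in tags.items():
        new_value = apply_rule(keyword, value, profile.get(keyword, "keep"))
        if new_value is not None:
            result[keyword] = new_value
    return result
```

Because the hash is deterministic, the same StudyInstanceUID always maps to the same synthetic UID, so series from one study stay grouped after de-identification.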

System profiles (immutable)

Custom profiles

Anyone can clone a system profile to make their own. Project admins assign a profile per project. Existing scans can be re-de-identified (deid-redo) on demand.

UI: Project header → De-id.

STAR registry

UNOS STAR data — donor demographics, recipient outcomes, post-transplant follow-up — is loaded once and shared across all projects (read-only). Every query goes through the active project so it's audit-logged with the right scope.

Donor lookup

By DONOR_ID or by DICOM StudyInstanceUID (uses the crosswalk). Returns the donor record, any lung/heart-lung recipients for that donor, and per-recipient followup timelines.

Cohort browse

Sidebar facets, free-text search, sortable grid, paged. Filter by ABO, COD, donor type (DBD vs DCD), gender, ethnicity, has-CT, etc.

Saved cohorts

Click Save as cohort on the browse page. Freezes the resolved donor list at save time so you can re-fetch the same set later. Per-cohort export to CSV / XLSX / Parquet, plus a Scan overlap button that cross-references the cohort against scans in this project.

Annotations

Tag a donor with a label and optional note ("review again", "marginal — DCD"). Project-scoped — different projects keep independent annotation sets.

UI: Project header → STAR.

Analytics

Built-in pages turn the STAR + imaging data into the kinds of summaries clinicians and reviewers expect to see. All of them are project-scoped, audit-logged, and run on the cached parquet so a refresh is fast even on large cohorts.

Analyze

Project header → Analyze. Pick one or more scans from the project, run the rule-based screening pipeline (lung volume, aeration %, density symmetry, suspicious-region count, overall impression), and persist results to v2_screening_results. The scan list shows a ★ donor_id badge for crosswalked CTs and the most recent screening impression as a colored chip.

Survival

Project header → Survival. Kaplan–Meier survival curves over the lung-recipient cohort, defaulting to "only donors with a CT scan" so the analysis matches the cohort the multimodal model trains on. Features:

Reference numbers in the live cohort: 1y ≈ 92%, 3y ≈ 76%, 5y ≈ 61% — consistent with published lung-transplant outcomes.
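For readers who want to see what the Kaplan–Meier curve computes, here is a minimal product-limit sketch in pure Python. It is for intuition only (the function names are hypothetical, and no confidence bands are computed); it is not the code behind the page.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival)] at each distinct event time.

    durations: follow-up time per recipient; events: 1 = death, 0 = censored.
    """
    order = sorted(zip(durations, events))
    at_risk = len(order)
    survival = 1.0
    curve = []
    i = 0
    while i < len(order):
        t = order[i][0]
        deaths = 0
        leaving = 0
        # Group everyone whose follow-up ends at the same time t.
        while i < len(order) and order[i][0] == t:
            deaths += order[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving   # censored subjects leave the risk set without an event
    return curve


def survival_at(curve, t):
    """Step-function lookup: estimated survival at time t."""
    value = 1.0
    for time, s in curve:
        if time > t:
            break
        value = s
    return value
```

Censored recipients never lower the curve themselves; they only shrink the at-risk denominator for later event times.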

CT outcomes (imaging-branch proof of concept)

Project header → CT outcomes. Tests whether the rule-based imaging metrics (lung volume, aeration %, suspicious-region count) actually predict post-transplant survival. For each metric:

Coverage tile shows what fraction of CT-linked donors have screening data; both v1 screening_results.db and v2 v2_screening_results are unioned automatically.
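Stratifying a metric into tertiles (the default split noted in the glossary) can be sketched as a rank-based assignment. The function name and the low/mid/high labels are hypothetical, and tied values are separated by sort order here, which may differ from how the page handles ties.

```python
def tertile_labels(values):
    """Assign 'low' / 'mid' / 'high' by rank so groups are near-equal in size."""
    names = ("low", "mid", "high")
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [None] * len(values)
    for rank, index in enumerate(order):
        # rank * 3 // n maps ranks 0..n-1 onto group indices 0, 1, 2.
        labels[index] = names[rank * 3 // len(values)]
    return labels
```

Each label group then gets its own KM curve, and the log-rank test compares them.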

Rejection (time to acute rejection)

Project header → Rejection. KM analysis where the event is "first follow-up reporting an acute rejection episode" (ACUTE_REJ_EPI = "Yes, …" — treated or untreated). Recipients with no observed rejection are censored at their last follow-up date. Default 1-year horizon since most acute rejection events cluster in the first 12 months.

Reference numbers in the live cohort: 1y rejection-free ≈ 92%, median time to rejection (events only) ≈ 378 days. Donor-side stratifications (DBD/DCD, age) are typically not significant in our data — recipient-side and immunosuppression factors dominate.
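The event and censoring rule described above can be sketched for a single recipient as follows. The function name and tuple layout are hypothetical, and censoring at the horizon when the first reported rejection falls after it is an assumption about how the horizon is applied.

```python
from datetime import date


def rejection_outcome(tx_date, followups, horizon_days=365):
    """Return (days, event) for time to first reported acute rejection.

    followups: list of (followup_date, acute_rej_epi) tuples.
    event = 1 when a follow-up within the horizon reports "Yes, ...".
    Returns None when the recipient has no follow-up at all.
    """
    ordered = sorted(followups, key=lambda item: item[0])
    if not ordered:
        return None
    for followup_date, acute_rej_epi in ordered:
        days = (followup_date - tx_date).days
        if days > horizon_days:
            return horizon_days, 0   # assumption: censor at the horizon
        if (acute_rej_epi or "").startswith("Yes"):
            return days, 1           # treated or untreated both count
    last_days = (ordered[-1][0] - tx_date).days
    return min(last_days, horizon_days), 0   # censored at last follow-up
```

The resulting (days, event) pairs are the inputs a KM estimator needs.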

Cox PH (multivariate hazard regression)

Project header → Cox PH. Multivariate Cox proportional-hazards model on the lung-recipient cohort (lifelines). Covariates: continuous donor age, height, weight, recipient age; one-hot ABO_DON, COD_CAD_DON, NON_HRT_DON (DBD vs DCD), GENDER_DON, recipient ABO. Output:

Reference numbers in the live cohort: concordance ≈ 0.557, donor age and recipient age are the strongest individually-significant predictors (HR ≈ 1.01 per year each). DCD vs DBD does not show a significant adjusted effect after age + COD adjustment.
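A hazard ratio of about 1.01 per year looks small, but under the Cox model's log-linear assumption it compounds multiplicatively across years. The short sketch below (hypothetical helper names) shows that arithmetic and how a coefficient and its standard error convert to an HR with a 95% CI.

```python
import math


def hr_over(hr_per_unit, units):
    """Hazard ratio implied by a change of `units` in the covariate."""
    return hr_per_unit ** units


def hr_with_ci(coef, se, z=1.96):
    """Convert a Cox coefficient and standard error to (HR, lower, upper)."""
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)


print(round(hr_over(1.01, 10), 3))   # a donor 10 years older: about 1.105
```

So a 10-year age difference corresponds to roughly a 10% higher hazard, and a 20-year difference to roughly 22%.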

Trends

Project header → Trends. Yearly aggregations on the entire U.S. lung-transplant pool from STAR (not just CT-linked donors). Configurable start/end year. Charts:

Reference numbers in the live cohort: ~39,500 lung transplants 2010–2025, DCD share rose from <1% in 2010 to ≈18% in 2025, annual volume growing from ~2,500/yr to ~3,500/yr.

Heatmap (empirical risk grid)

Project header → Heatmap. KM-corrected survival probability per cell, with rows × cols × optional sub-grids by donor age bucket, ABO, DBD/DCD, listed organ, gender. Color: green = better survival, red = worse, normalized across non-empty cells. Each cell shows the survival % plus the cell's n and observed deaths d. Cells below the configurable minimum N threshold show "—". The non-parametric baseline that any ML model has to beat.

DICOM metadata mining

Project header → DICOM meta. Distributions of acquisition technical metadata (vendor, scanner model, reconstruction kernel, study description, slice thickness, kVp, tube current, pixel spacing, in-plane rows) across a random sample of donor ZIPs. Surfaces domain-shift challenges before training the imaging branch.

The dashboard reads from a Parquet cache populated by a CLI sampler:

python sample_dicom_metadata.py \
    --pool /mnt/nas_unos/Downloads --n 500 \
    --out /mnt/nas_unos/.dicom_metadata_sample.parquet

Re-run with a larger N (e.g. 2000) to tighten the distributions; the page shows a friendly "no cache yet" message until the sampler runs.

ML pilot (XGBoost on STAR features)

Project header → ML pilot. XGBoost binary classifier predicting 1-year graft survival from donor + recipient features (the same set the Cox PH model uses). Temporal hold-out validation — the most recent year of transplants with at least 50 labeled rows is the test set, everything earlier trains. Output:

Tabular-only Cox/XGBoost typically lands AUC ≈ 0.55–0.70 on this kind of cohort. The case for the multimodal model is to beat this baseline by adding imaging features.
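The temporal hold-out rule can be sketched without any ML library. The row layout (dicts with tx_year and label keys) and the function name are assumptions for illustration; dropping unlabeled rows and excluding years later than the test year are interpretations of the description above.

```python
from collections import Counter


def temporal_holdout(rows, min_test_rows=50):
    """Split rows into (train, test) using the latest qualifying year as test.

    rows: list of dicts with 'tx_year' and 'label' (None when the 1-year
    outcome is not yet known). Unlabeled rows are dropped.
    """
    labeled = [r for r in rows if r["label"] is not None]
    counts = Counter(r["tx_year"] for r in labeled)
    eligible = [year for year, n in counts.items() if n >= min_test_rows]
    if not eligible:
        raise ValueError("no year has enough labeled rows for a test set")
    test_year = max(eligible)
    train = [r for r in labeled if r["tx_year"] < test_year]
    test = [r for r in labeled if r["tx_year"] == test_year]
    return train, test
```

Splitting by time rather than at random keeps future transplants out of the training set, which better reflects how the model would be used prospectively.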

Multi-organ donor outcomes

Project header → Multi-organ. For each deceased donor whose lungs were transplanted, looks up that donor's kidney + liver recipients and asks whether their 1-year survival outcomes correlate. Tests the hypothesis that there is a donor-level "quality" signal that affects multiple organs.

If the lung outcome correlates with the kidney outcome from the same donor, the kidney recipient's survival could be a useful proxy/feature for the lung recipient's prognosis. Caveat: at the donor level, organ-specific outcomes also depend on the recipient's pre-existing condition.

3D Visualize

Project header → 3D viewer. Marching-cubes mesh of a CT volume, rendered with Plotly.js. Pick a scan from the project, choose a mode (lungs −300 HU, tissue −100 HU, bone +400 HU, or a custom-threshold slider), set the marching-cubes step size (5 = fast / 1 = full detail), click Render. The mesh appears in an interactive 3-D canvas you can rotate/zoom.

Implementation: server picks the largest series + most-common dimensions (drops scouts and localizers), stacks to HU, resamples to 2 mm isotropic spacing, runs scikit-image marching_cubes, caps the mesh at 150 k faces for browser responsiveness, returns the Plotly Mesh3d payload (vertices + face indices). Render time scales with volume size and step; a typical donor scan with step=3 takes 3–10 s server-side.

Methodology notes that apply to all KM pages

API tokens

Tokens authenticate the Python SDK against the same RBAC + audit pipeline the browser uses. Each token is scoped to one project and one user.

Format: vmt_ + 48 hex characters. The raw token is shown once on creation; the server stores only its SHA-256 hash. Lost a token? Mint another.
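The token scheme (random secret shown once, only a SHA-256 hash stored) follows a common pattern that can be sketched with the standard library. The function names are hypothetical and this is not the server's code.

```python
import hashlib
import secrets

TOKEN_PREFIX = "vmt_"


def mint_token():
    """Return (raw_token, stored_hash); only the hash would be persisted."""
    raw = TOKEN_PREFIX + secrets.token_hex(24)   # 24 random bytes -> 48 hex characters
    return raw, hashlib.sha256(raw.encode("utf-8")).hexdigest()


def verify_token(presented, stored_hash):
    """Hash the presented token and compare in constant time."""
    digest = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    return secrets.compare_digest(digest, stored_hash)
```

Since the hash cannot be reversed, a lost token cannot be recovered from the server, which is why minting a new one is the only remedy.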

UI: Project header → Tokens.

Python SDK

# Shell: install the package
pip install vitalmatch-sdk

# Python: connect with a project-scoped token
from vitalmatch import Client
c = Client(base_url='https://vitalmatch.ai',
           token='vmt_…',
           project='legacy')

c.me()                                  # identity
c.star.summary()                        # cohort numbers
c.star.cohort.search(filters={'ABO_DON': ['O'], 'has_ct': True}).as_dataframe()
c.star.cohort.save(name='DCD ABO=O',
                   filters={'ABO_DON': ['O'], 'NON_HRT_DON': ['Y']})
c.subjects.create(external_id='CT_999', label='from notebook')
# subject_id below must hold the id of an existing subject in the project
c.metadata.set('subject', subject_id, 'reviewed', True, 'bool')

Two example notebooks ship with the repo at notebooks/:

Bug, missing feature, or unclear documentation? Email [email protected] or open an issue at github.com/gilblankenship/LungCT_Diagnosis.