The Cancer Knowledge Graph
At the core of Galen is a knowledge graph containing 280,000+ entities and 4.75M+ relationships, built from 28 biomedical databases and continuously updated by an autonomous research AI.
Entity types
Every node in the knowledge graph is a typed entity. The major entity types include:
gene: Protein-coding genes (EGFR, KRAS, TP53, BRCA1...)
drug: Approved drugs and investigational compounds
disease: Cancer types and subtypes (NSCLC, AML, TNBC...)
pathway: Biological pathways (MAPK, PI3K/AKT, WNT...)
mutation: Specific variants (EGFR L858R, BRAF V600E...)
protein: Protein products and complexes
cell_line: Cancer cell lines from DepMap and GDSC
clinical_trial: Active and completed trials
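The typed-entity idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `EntityType` enum, the `Entity` class, its field names, and the `gene:EGFR` identifier format are hypothetical and are not Galen's actual schema or API. Only the type labels come from the list above.

```python
from dataclasses import dataclass
from enum import Enum


class EntityType(Enum):
    """Major entity types named in the list above (the real graph may have more)."""
    GENE = "gene"
    DRUG = "drug"
    DISEASE = "disease"
    PATHWAY = "pathway"
    MUTATION = "mutation"
    PROTEIN = "protein"
    CELL_LINE = "cell_line"
    CLINICAL_TRIAL = "clinical_trial"


@dataclass(frozen=True)
class Entity:
    """Hypothetical node record: every node carries exactly one type."""
    entity_id: str           # assumed identifier format, e.g. "gene:EGFR"
    name: str
    entity_type: EntityType


def parse_entity(entity_id: str, name: str, type_label: str) -> Entity:
    """Build an Entity, rejecting type labels that are not in EntityType."""
    # EntityType(type_label) raises ValueError for an unknown label.
    return Entity(entity_id, name, EntityType(type_label))


egfr = parse_entity("gene:EGFR", "EGFR", "gene")
print(egfr.entity_type.value)
```

Using an enum rather than free-form strings means a misspelled type such as "genes" fails at construction time instead of silently creating an untyped node.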
Relationships
Relationships connect entities with typed, directed edges. Each relationship carries provenance (which database it came from), a confidence score, and a Pearl Causal Hierarchy layer annotation (L1, L2, or L3).
Common relationship types include targets, inhibits, activates, associated_with, mutated_in, and co_occurs_with, among others.
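A minimal sketch of what one such edge might look like as a record, assuming a simple in-memory representation. The `Relationship` class, its field names, the 0 to 1 confidence range, and the example values are all assumptions made for illustration. They are not Galen's schema, and the confidence and layer shown are placeholders rather than real graph data.

```python
from dataclasses import dataclass

PCH_LAYERS = ("L1", "L2", "L3")  # layer labels as named in the text


@dataclass(frozen=True)
class Relationship:
    """Hypothetical directed edge: source -> target, with metadata."""
    source: str
    rel_type: str            # e.g. "targets", "inhibits", "mutated_in"
    target: str
    provenance: tuple        # one or more source database labels
    confidence: float        # assumed here to lie in [0, 1]
    pch_layer: str           # "L1", "L2", or "L3"

    def __post_init__(self):
        if self.pch_layer not in PCH_LAYERS:
            raise ValueError(f"unknown PCH layer: {self.pch_layer!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")
        if not self.provenance:
            raise ValueError("at least one provenance label is required")


def outgoing(edges, source):
    """Edges are directed, so only those starting at `source` are returned."""
    return [e for e in edges if e.source == source]


# Placeholder values for illustration only, not taken from the graph.
edges = [
    Relationship("gefitinib", "inhibits", "EGFR", ("chembl_36",), 0.9, "L2"),
]
print(len(outgoing(edges, "gefitinib")), len(outgoing(edges, "EGFR")))
```

Because the edge is directed, looking up outgoing edges from EGFR does not return the gefitinib edge. A reverse lookup would need its own index on `target`.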
Provenance
Every relationship in the knowledge graph carries provenance metadata indicating which data source(s) contributed the evidence. Provenance values like chembl_36, depmap_crispr, or cbioportal_local tell you exactly where the evidence originated, enabling reproducibility and cross-validation.
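To make the cross-validation point concrete, the sketch below groups edges by the sources that support them and keeps those backed by more than one source. The row layout, the function names, and the `min_sources` parameter are hypothetical, and the rows are invented for illustration rather than taken from Galen. Only the provenance labels come from the text above.

```python
from collections import defaultdict

# Made-up rows: (source, relationship type, target, provenance label).
records = [
    ("EGFR", "associated_with", "NSCLC", "cbioportal_local"),
    ("EGFR", "associated_with", "NSCLC", "depmap_crispr"),
    ("gefitinib", "inhibits", "EGFR", "chembl_36"),
]


def sources_by_edge(rows):
    """Map each (source, rel_type, target) edge to its set of provenance labels."""
    grouped = defaultdict(set)
    for source, rel_type, target, provenance in rows:
        grouped[(source, rel_type, target)].add(provenance)
    return dict(grouped)


def cross_validated(rows, min_sources=2):
    """Keep only edges supported by at least `min_sources` distinct sources."""
    return {
        edge: labels
        for edge, labels in sources_by_edge(rows).items()
        if len(labels) >= min_sources
    }


for edge, labels in cross_validated(records).items():
    print(edge, sorted(labels))
```

Counting distinct labels rather than rows keeps a database that reports the same edge several times from being mistaken for independent confirmation.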