Overview
OptimusKG is a biomedical knowledge graph that unifies data from over 15 primary data sources into a single, ontology-grounded labeled property graph. It covers genes, diseases, drugs, phenotypes, exposures, anatomical structures, biological processes, molecular functions, cellular components, and pathways.
OptimusKG is built by the Optimus framework, a production-ready data pipeline for constructing, validating, and maintaining biomedical knowledge graphs.
At a Glance
| | |
| --- | --- |
| Nodes | ~190K across 10 entity types |
| Edges | ~21M across 26 relationship types |
| Data Sources | 15+ direct sources, 40+ indirect sources |
| Relation Types | 30+ standardized relation types |
| Formats | CSV, Parquet, Neo4j |
| License | MIT |
Every node and edge in the graph carries full data provenance, tracking the direct and indirect data sources that contributed to it.
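
To make the provenance idea concrete, here is a minimal sketch of how a labeled property graph element could carry its direct and indirect sources. The class names, field names, identifiers, and source names below are all hypothetical and chosen for illustration; the actual OptimusKG schema may differ.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Provenance:
    # Hypothetical field names, not the real OptimusKG schema.
    direct_sources: tuple[str, ...]
    indirect_sources: tuple[str, ...] = ()


@dataclass
class Node:
    id: str
    label: str  # entity type, e.g. "gene" or "disease"
    provenance: Provenance
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str  # id of the source node
    target: str  # id of the target node
    relation: str  # relation type
    provenance: Provenance
    properties: dict = field(default_factory=dict)


def sources_of(element) -> set[str]:
    """Return every data source (direct or indirect) behind a node or edge."""
    prov = element.provenance
    return set(prov.direct_sources) | set(prov.indirect_sources)


# Placeholder identifiers and source names, for illustration only.
gene = Node(
    id="gene:1",
    label="gene",
    provenance=Provenance(direct_sources=("source_a",)),
)
disease = Node(
    id="disease:1",
    label="disease",
    provenance=Provenance(direct_sources=("source_b",)),
)
edge = Edge(
    source=gene.id,
    target=disease.id,
    relation="associated_with",
    provenance=Provenance(
        direct_sources=("source_a",),
        indirect_sources=("source_b", "source_c"),
    ),
)

print(sorted(sources_of(edge)))
```

Keeping direct and indirect sources as separate fields, rather than one merged list, lets a consumer filter the graph by how closely an assertion is tied to a given source.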