Download
How to get OptimusKG release files and understand the export formats.
OptimusKG is distributed as a set of CSV and Parquet files, available as a release archive from GitHub.
Release Files
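Download the latest release (the commands below pin version 0.56.0; replace it with the newest tag listed on the GitHub releases page):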
# Download the release archive
wget https://github.com/mims-harvard/optimus/releases/download/0.56.0/release.zip
# Extract
unzip release.zip
The archive contains the full knowledge graph exported in multiple formats.
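If you prefer to fetch the archive from Python (for example, inside a notebook), the sketch below performs the same download and extraction with the standard library only. It assumes that every release publishes an asset named release.zip under the same URL pattern as the wget command above; only version 0.56.0 is shown in this guide. The helpers release_url, extract_archive, and fetch_release are hypothetical names used for illustration.

```python
import urllib.request
import zipfile
from pathlib import Path

# URL pattern copied from the wget example; assumes other releases also
# publish an asset named release.zip (only 0.56.0 is confirmed here).
BASE_URL = "https://github.com/mims-harvard/optimus/releases/download"


def release_url(version: str) -> str:
    """Build the download URL for a given release tag."""
    return f"{BASE_URL}/{version}/release.zip"


def extract_archive(archive: Path, dest: Path) -> list[str]:
    """Extract a release archive into dest and return its member names."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
        return zf.namelist()


def fetch_release(version: str, dest: Path) -> list[str]:
    """Download the archive for `version` into dest, then extract it."""
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / "release.zip"
    urllib.request.urlretrieve(release_url(version), archive)
    return extract_archive(archive, dest)
```

For example, fetch_release("0.56.0", Path(".")) downloads and unpacks the archive into the current directory, matching the two shell commands above.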
File Structure
Each format includes individual files per node and edge type, plus
consolidated nodes and edges files that combine all types.
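To see which node and edge types a format directory contains, you can group its files by kind. This is a minimal sketch that assumes each format directory (for example, release/kg/parquet) holds nodes/ and edges/ subdirectories with one file per type, as in nodes/disease.parquet, next to the consolidated nodes and edges files. The edges/ subdirectory name and the describe_layout helper are assumptions for illustration.

```python
from pathlib import Path


def describe_layout(format_dir: Path, suffix: str) -> dict[str, list[str]]:
    """Group release files by kind for one export format.

    Hypothetical layout: <format_dir>/nodes/<type><suffix>,
    <format_dir>/edges/<type><suffix>, plus the consolidated
    <format_dir>/nodes<suffix> and <format_dir>/edges<suffix>.
    """
    layout: dict[str, list[str]] = {}
    for kind in ("nodes", "edges"):
        # One file per node or edge type; the file stem is the type name.
        layout[kind] = sorted(p.stem for p in (format_dir / kind).glob(f"*{suffix}"))
    # Consolidated files that combine all types.
    layout["consolidated"] = sorted(
        p.name
        for p in (format_dir / f"{kind}{suffix}" for kind in ("nodes", "edges"))
        if p.is_file()
    )
    return layout
```

For example, describe_layout(Path("release/kg/parquet"), ".parquet") would list the per-type Parquet files, assuming the layout above.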
CSV Format
Node CSV
Columns: id, label, properties
id,label,properties
MONDO:0005015,DIS,"{""sources"":{""direct"":[""opentargets""],""indirect"":[""mondo""]},""name"":""diabetes mellitus"",""description"":""A metabolic disease ...""}"
ENSG00000000003,GEN,"{""sources"":{""direct"":[""opentargets""],""indirect"":[]},""symbol"":""TSPAN6"",""biotype"":""protein_coding""}"
Edge CSV
Columns: from, to, label, relation, undirected, properties
from,to,label,relation,undirected,properties
DrugBank:DB00945,MONDO:0005015,DRG-DIS,indication,false,"{""sources"":{""direct"":[""opentargets""],""indirect"":[""chembl""]},""highest_clinical_trial_phase"":4.0}"
The properties column is a JSON-encoded string containing all type-specific properties and provenance metadata.
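Because properties is an escaped JSON string, a plain CSV reader returns it as text. The following standard-library sketch decodes it for both node and edge files. read_kg_csv is a hypothetical helper, and it assumes the undirected column holds the strings true or false (only false appears in the sample above).

```python
import csv
import json
from typing import TextIO


def read_kg_csv(f: TextIO) -> list[dict]:
    """Read a node or edge CSV, decoding the JSON properties column."""
    rows = []
    for row in csv.DictReader(f):
        # The csv module un-escapes the doubled quotes; json.loads builds the dict.
        row["properties"] = json.loads(row["properties"])
        # Edge files only: assumes `undirected` is written as "true"/"false".
        if "undirected" in row:
            row["undirected"] = row["undirected"].strip().lower() == "true"
        rows.append(row)
    return rows
```

Pass it an open file handle, ideally opened with newline="" as the csv module documentation recommends; each returned row then exposes nested fields such as row["properties"]["sources"]["direct"].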
Parquet Format
Parquet files come in two variants:
- Individual files (e.g., nodes/disease.parquet): Properties are stored as native struct columns with full typing (read as Polars structs). This is the most efficient format for analysis, as nested fields can be queried directly without JSON parsing.
- Consolidated files (nodes.parquet, edges.parquet): Properties are JSON-encoded strings (same as CSV) because different entity types have different property schemas and cannot be stored as a single native struct.
For analysis workflows, prefer the individual Parquet files over consolidated ones. They preserve native types (nested structs, lists, booleans) and are significantly faster to query.
Reading with Polars
import polars as pl
# Read nodes
nodes = pl.read_parquet("release/kg/parquet/nodes.parquet")
# Read edges
edges = pl.read_parquet("release/kg/parquet/edges.parquet")
Neo4j Import
OptimusKG can also be exported to Neo4j using the BioCypher framework. The Neo4j export generates CSV files compatible with neo4j-admin import and bulk-loads them into a Neo4j database.
Neo4j export is available when building OptimusKG from source using the Optimus framework. See the Optimus CLI documentation for details.