Overview
Optimus is a production-ready data pipeline framework for constructing, validating, and maintaining biomedical knowledge graphs according to software engineering best practices.
Optimus is the framework; OptimusKG is the biomedical knowledge graph data product built with it.
Optimus is grounded in three core principles:
- Ready-to-use: Pre-built processing nodes that unify many biomedical data sources into a single knowledge graph. Run one command to build the entire graph.
- Reproducible: All transformations are deterministic, validated through checksum verification, and infrastructure-agnostic. Every dataset is declaratively specified in version-controlled YAML.
- Extensible: Built on top of the Kedro framework (hosted by the Linux Foundation) and extends it, inheriting Kedro's uniform project template, data abstraction, configuration management, and pipeline assembly.
Beyond these principles, Optimus is also AI-ready: the repository ships with a skills system, agent rules, and CLI tools that allow AI coding agents to operate within the pipeline autonomously.
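The reproducibility principle rests on declarative dataset specifications. As a sketch, a Kedro-style catalog entry for a hypothetical bronze-layer dataset might look like the following (the dataset name, file path, and checksum metadata key are illustrative assumptions, not Optimus's actual schema; the overall shape follows Kedro's `conf/base/catalog.yml` convention):

```yaml
# Hypothetical entry: names and metadata keys are illustrative only.
bronze_drug_interactions:
  type: pandas.ParquetDataset
  filepath: data/bronze/drug_interactions.parquet
  metadata:
    checksum: "<sha256-of-file>"   # verified by a lifecycle hook before use
```

Because every dataset is declared this way in version-controlled YAML, a changed upstream file surfaces as a checksum mismatch rather than a silently different graph.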
Architectural Components
Optimus organizes its pipeline around nine core components:
| Component | Description |
|---|---|
| Catalog | The single source of truth for all datasets, their schemas, checksums, and metadata |
| Dataset | Typed abstractions for reading and writing data (Parquet, JSON, OWL, ZIP, SQL dumps) |
| Node | Pure Python functions that transform data, grouped into pipelines |
| Pipeline | Directed acyclic graphs (DAGs) of nodes, organized into medallion layers |
| Layer | Medallion architecture tiers: landing, bronze, silver, gold |
| Parameters | Runtime configuration values (export formats, feature flags) |
| Provider | Data download strategies for different sources (HTTP, FTP, APIs) |
| Hook | Lifecycle interceptors for downloading, checksum validation, and quality checks |
| Conf | Configuration directory with base environments and OmegaConf integration |
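How a few of these components fit together can be sketched in plain Python. Every name below is hypothetical; in the real framework, nodes, pipelines, and hooks are delegated to Kedro. The sketch only illustrates the division of labour between the Catalog (dataset lookup), a Hook (checksum validation before a node runs), and a Node (a pure transformation):

```python
import hashlib

def sha256_of(data: bytes) -> str:
    """Checksum used by the validation hook."""
    return hashlib.sha256(data).hexdigest()

# --- Catalog: single source of truth for datasets and their checksums ---
raw_bytes = b"gene,disease\nBRCA1,breast carcinoma\n"
catalog = {
    "landing_gene_disease": {
        "data": raw_bytes,
        "checksum": sha256_of(raw_bytes),
    }
}

# --- Hook: lifecycle interceptor that runs before a node executes ---
def before_node_run(entry: dict) -> None:
    if sha256_of(entry["data"]) != entry["checksum"]:
        raise ValueError("checksum mismatch: dataset is not reproducible")

# --- Node: a pure function transforming one medallion layer into the next ---
def to_bronze(entry: dict) -> list[dict]:
    header, *rows = entry["data"].decode().splitlines()
    keys = header.split(",")
    return [dict(zip(keys, row.split(","))) for row in rows]

entry = catalog["landing_gene_disease"]
before_node_run(entry)        # hook fires first; raises on tampered data
records = to_bronze(entry)    # node transforms landing -> bronze
```

Because the node is a pure function of its input dataset, re-running the pipeline on checksum-verified data always yields the same output, which is what makes the layers composable into a deterministic DAG.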
Continue Reading
- Architecture: The medallion architecture, data flow, and pipeline layers.
- Catalog & Datasets: The catalog system and custom dataset types.
- Hooks & Providers: Lifecycle hooks and data download providers.
- CLI: Command-line tools for catalog maintenance and analysis.
- Agentic: AI-ready features: skills, agent rules, and plans.