# Agentic

AI-ready features, skills system, and agent integration in Optimus.
Optimus is designed to be operated not only by humans but also by AI coding agents. The repository implements a layered agent-readiness architecture that transforms a general-purpose agent into a specialized pipeline operator.
## Architecture Overview
| Layer | Mechanism | Purpose |
|---|---|---|
| Rules | `AGENTS.md` | Mandatory constraints every agent must follow |
| Skills | `.agents/skills/` | Modular capability packages with progressive context loading |
| CLI | `uv run cli` | Deterministic, agent-callable maintenance commands |
| Runner | `DryRunner` | Inspect DAG dependencies without executing nodes |
## Rules: AGENTS.md
The root-level `AGENTS.md` file is read automatically by agents (Claude, Codex, etc.) when they operate in the repository. It enforces three critical rules:
- Node-Catalog Sync: When editing a node file, always edit the corresponding catalog YAML files to keep them in sync.
- Checksum Preservation: Never delete the `checksum` property in the catalog. If a code change affects output, rerun the node and use the CLI to compute the new checksum.
- Downstream Cascade: When editing a node file, rerun all downstream nodes by tracing catalog output IDs through the DAG recursively.
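The downstream cascade amounts to a transitive traversal of the dependency graph. A minimal sketch in plain Python, using a hypothetical toy graph (the node names and edges here are illustrative, not the actual Optimus pipeline — a real agent would derive the edges from the catalog's output IDs):

```python
# Toy dependency graph: node -> nodes that consume its outputs.
# Names are hypothetical placeholders for illustration only.
DOWNSTREAM = {
    "bronze.load_drugs": ["silver.drug_disease"],
    "silver.drug_disease": ["gold.kg_edges"],
    "gold.kg_edges": [],
}

def cascade(node: str) -> list[str]:
    """Return every node that directly or transitively depends on `node`."""
    seen: list[str] = []
    queue = list(DOWNSTREAM.get(node, []))
    while queue:
        current = queue.pop(0)
        if current not in seen:
            seen.append(current)
            queue.extend(DOWNSTREAM.get(current, []))
    return seen

print(cascade("bronze.load_drugs"))  # all nodes that need a rerun, in order
```

Rerunning the returned nodes in order satisfies the Downstream Cascade rule for this toy graph.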
## Skills System
Skills are modular, self-contained packages that extend an agent's capabilities with specialized knowledge, workflows, and bundled resources. They follow a progressive disclosure design to minimize context window usage:
- Level 1 (Metadata): Name and description in YAML frontmatter. Always in the agent's context, used for triggering.
- Level 2 (Instructions): The SKILL.md body. Loaded when the skill triggers.
- Level 3 (Resources): Bundled scripts, references, and assets. Loaded on demand.
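A Level 1 entry is just the YAML frontmatter at the top of `SKILL.md`. A hypothetical example for the node-catalog-sync skill (the exact field set is an assumption based on the description above, not the repository's verified schema):

```yaml
---
name: node-catalog-sync
description: >
  Keep catalog YAML entries in sync when a node file is edited.
  Use when renaming outputs, changing schemas, or cascading
  reruns to downstream nodes.
---
```

Only this small block stays permanently in the agent's context; the rest of the file loads when the description matches the task at hand.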
Skills live under `.agents/skills/`, and each contains a required `SKILL.md` file:
### node-catalog-sync
Provides a structured, step-by-step workflow for agents to follow when editing any node file under `pipelines/*/nodes/`. It includes:
- Path mapping tables linking node outputs to catalog IDs, YAML file paths, and data paths for each layer.
- A 4-step sync workflow: (1) identify affected catalog entries, (2) update dataset ID and filepath, (3) rerun the node and sync via CLI, (4) cascade downstream using DryRunner.
- Catalog YAML templates showing the expected format for new entries.
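A catalog entry of the kind these templates describe might look like the following. This is a sketch assuming a standard Kedro catalog layout; the dataset type, file paths, and the metadata key holding the checksum are illustrative guesses, not the repository's exact schema:

```yaml
silver.drug_disease:
  type: pandas.ParquetDataset
  filepath: data/02_silver/drug_disease.parquet
  metadata:
    checksum: "blake2b:9f3c..."  # never delete; recompute via the CLI after reruns
```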
This skill is triggered whenever an agent edits code in the `pipelines/*/nodes/` directory.
### scientific-visualization
A comprehensive skill for creating publication-quality scientific figures. It bundles:
- Style presets for Nature, Science, Cell, PLOS, ACS, and IEEE journals
- Colorblind-safe palettes (Okabe-Ito, Wong, Paul Tol)
- Export utilities with DPI, font embedding, and dimension compliance checks
- Working examples for 10+ common plot types (heatmaps, violin plots, scatter with regression, multi-panel figures)
- Journal-specific requirements with quick-reference tables
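The Okabe-Ito palette mentioned above is a standard set of eight colorblind-safe colors with well-known hex values. A sketch of how an agent might cycle through it when assigning series colors (the helper function is illustrative, not the skill's actual API):

```python
# The eight Okabe-Ito colorblind-safe colors (standard hex values).
OKABE_ITO = [
    "#000000",  # black
    "#E69F00",  # orange
    "#56B4E9",  # sky blue
    "#009E73",  # bluish green
    "#F0E442",  # yellow
    "#0072B2",  # blue
    "#D55E00",  # vermilion
    "#CC79A7",  # reddish purple
]

def series_color(index: int) -> str:
    """Assign a palette color to the i-th data series, cycling past 8."""
    return OKABE_ITO[index % len(OKABE_ITO)]

# With matplotlib installed, the palette could also be set as the default
# property cycle, e.g.:
#   plt.rcParams["axes.prop_cycle"] = cycler(color=OKABE_ITO)
```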
### skill-creator (Meta-Skill)
A meta-skill that teaches agents how to create new skills. It provides:
- The progressive disclosure framework (3-tier loading system)
- A 6-step creation process: understand, plan, initialize, edit, package, iterate
- Scaffolding scripts: `init_skill.py` creates a new skill from templates, `package_skill.py` validates and bundles it, `quick_validate.py` checks frontmatter
- Design principles: context window as a public good, appropriate degrees of freedom, concise-is-key
This makes the skills system self-extending: agents can create new skills for new workflows as they encounter them.
## Agent-Callable CLI
The CLI is designed to be invoked by agents as part of automated workflows. Key agent-facing commands:
- `sync-catalog`: After an agent reruns a node, it calls this to update the catalog YAML with new schemas and checksums.
- `checksum`: Compute BLAKE2b hashes for verification.
- `sync-catalog --validate`: Check whether catalog entries match disk without modifying anything.
- `sync-catalog --dry-run`: Preview changes before applying.
## DryRunner
Optimus includes a custom Kedro runner, `DryRunner`, that lists which nodes would run without executing anything. It also checks whether all required input datasets exist on disk.
Agents use this to understand the downstream impact of a change:
```
uv run kedro run --from-nodes silver.drug_disease --runner optimuskg.runners.DryRunner
```

This returns the list of all nodes that depend (directly or transitively) on the modified node, allowing the agent to cascade changes through the pipeline as required by the AGENTS.md rules.