# Agentic

AI-ready features, skills system, and agent integration in Optimus.
Optimus is designed to be operated not only by humans but also by AI coding agents. The repository implements a layered agent-readiness architecture that transforms a general-purpose agent into a specialized pipeline operator.
## Architecture Overview
| Layer | Mechanism | Purpose |
|---|---|---|
| Rules | `AGENTS.md` | Mandatory constraints every agent must follow |
| Skills | `.agents/skills/` | Modular capability packages with progressive context loading |
| CLI | `uv run cli` | Deterministic, agent-callable maintenance commands |
| Runner | `DryRunner` | Inspect DAG dependencies without executing nodes |
## Rules: AGENTS.md
The root-level `AGENTS.md` file is read automatically by agents (Claude, Codex, etc.) when they operate in the repository. It enforces three critical rules:
- Node-Catalog Sync: When editing a node file, always edit the corresponding catalog YAML files to keep them in sync.
- Checksum Preservation: Never delete the `checksum` property in the catalog. If a code change affects output, rerun the node and use the CLI to compute the new checksum.
- Downstream Cascade: When editing a node file, rerun all downstream nodes by tracing catalog output IDs through the DAG recursively.
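The downstream cascade amounts to a transitive traversal of the dependency graph. A minimal sketch in plain Python, using a hypothetical toy graph (the node names and edges here are illustrative, not the actual Optimus pipeline — a real agent would derive the edges from the catalog's output IDs):

```python
# Toy dependency graph: node -> nodes that consume its outputs.
# Names are hypothetical placeholders for illustration only.
DOWNSTREAM = {
    "bronze.load_drugs": ["silver.drug_disease"],
    "silver.drug_disease": ["gold.kg_edges"],
    "gold.kg_edges": [],
}

def cascade(node: str) -> list[str]:
    """Return every node that directly or transitively depends on `node`."""
    seen: list[str] = []
    queue = list(DOWNSTREAM.get(node, []))
    while queue:
        current = queue.pop(0)
        if current not in seen:
            seen.append(current)
            queue.extend(DOWNSTREAM.get(current, []))
    return seen

print(cascade("bronze.load_drugs"))  # all nodes that need a rerun, in order
```

Rerunning the returned nodes in order satisfies the Downstream Cascade rule for this toy graph.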
## Skills System
Skills are modular, self-contained packages that extend an agent's capabilities with specialized knowledge, workflows, and bundled resources. They follow a progressive disclosure design to minimize context window usage:
- Level 1 (Metadata): Name and description in YAML frontmatter. Always in the agent's context, used for triggering.
- Level 2 (Instructions): The SKILL.md body. Loaded when the skill triggers.
- Level 3 (Resources): Bundled scripts, references, and assets. Loaded on demand.
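A Level 1 entry is just the YAML frontmatter at the top of `SKILL.md`. A hypothetical example for the node-catalog-sync skill (the exact field set is an assumption based on the description above, not the repository's verified schema):

```yaml
---
name: node-catalog-sync
description: >
  Keep catalog YAML entries in sync when a node file is edited.
  Use when renaming outputs, changing schemas, or cascading
  reruns to downstream nodes.
---
```

Only this small block stays permanently in the agent's context; the rest of the file loads when the description matches the task at hand.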
Skills live under `.agents/skills/`, and each contains a required `SKILL.md` file:
### node-catalog-sync
Provides a structured, step-by-step workflow for agents to follow when editing any node file under `pipelines/*/nodes/`. It includes:
- Path mapping tables linking node outputs to catalog IDs, YAML file paths, and data paths for each layer.
- A 4-step sync workflow: (1) identify affected catalog entries, (2) update dataset ID and filepath, (3) rerun the node and sync via CLI, (4) cascade downstream using DryRunner.
- Catalog YAML templates showing the expected format for new entries.
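A catalog entry of the kind these templates describe might look like the following. This is a sketch assuming a standard Kedro catalog layout; the dataset type, file paths, and the metadata key holding the checksum are illustrative guesses, not the repository's exact schema:

```yaml
silver.drug_disease:
  type: pandas.ParquetDataset
  filepath: data/02_silver/drug_disease.parquet
  metadata:
    checksum: "blake2b:9f3c..."  # never delete; recompute via the CLI after reruns
```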
This skill is triggered whenever an agent edits code in the `pipelines/*/nodes/` directory.
### scientific-visualization
A comprehensive skill for creating publication-quality scientific figures. It bundles:
- Style presets for Nature, Science, Cell, PLOS, ACS, and IEEE journals
- Colorblind-safe palettes (Okabe-Ito, Wong, Paul Tol)
- Export utilities with DPI, font embedding, and dimension compliance checks
- Working examples for 10+ common plot types (heatmaps, violin plots, scatter with regression, multi-panel figures)
- Journal-specific requirements with quick-reference tables
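The Okabe-Ito palette mentioned above is a standard set of eight colorblind-safe colors with well-known hex values. A sketch of how an agent might cycle through it when assigning series colors (the helper function is illustrative, not the skill's actual API):

```python
# The eight Okabe-Ito colorblind-safe colors (standard hex values).
OKABE_ITO = [
    "#000000",  # black
    "#E69F00",  # orange
    "#56B4E9",  # sky blue
    "#009E73",  # bluish green
    "#F0E442",  # yellow
    "#0072B2",  # blue
    "#D55E00",  # vermilion
    "#CC79A7",  # reddish purple
]

def series_color(index: int) -> str:
    """Assign a palette color to the i-th data series, cycling past 8."""
    return OKABE_ITO[index % len(OKABE_ITO)]

# With matplotlib installed, the palette could also be set as the default
# property cycle, e.g.:
#   plt.rcParams["axes.prop_cycle"] = cycler(color=OKABE_ITO)
```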
### skill-creator (Meta-Skill)
A meta-skill that teaches agents how to create new skills. It provides:
- The progressive disclosure framework (3-tier loading system)
- A 6-step creation process: understand, plan, initialize, edit, package, iterate
- Scaffolding scripts: `init_skill.py` creates a new skill from templates, `package_skill.py` validates and bundles it, `quick_validate.py` checks frontmatter
- Design principles: context window as a public good, appropriate degrees of freedom, concise-is-key
This makes the skills system self-extending: agents can create new skills for new workflows as they encounter them.
## Agent-Callable CLI
The CLI is designed to be invoked by agents as part of automated workflows. Key agent-facing commands:
- `sync-catalog`: After an agent reruns a node, it calls this to update the catalog YAML with new schemas and checksums.
- `checksum`: Compute BLAKE2b hashes for verification.
- `sync-catalog --validate`: Check whether catalog entries match disk without modifying anything.
- `sync-catalog --dry-run`: Preview changes before applying.
## DryRunner
Optimus includes a custom Kedro runner, `DryRunner`, that lists which nodes would run without executing anything. It also checks whether all required input datasets exist on disk.
Agents use this to understand the downstream impact of a change:
```
uv run kedro run --from-nodes silver.drug_disease --runner optimuskg.runners.DryRunner
```

This returns the list of all nodes that depend (directly or transitively) on the modified node, allowing the agent to cascade changes through the pipeline as required by the AGENTS.md rules.