CLI
Command-line tools for catalog maintenance, checksums, and analysis.
Optimus provides a CLI for pipeline maintenance and analysis tasks. The CLI is separate from Kedro's built-in commands and focuses on catalog management, data validation, and publication figure generation.
uv run cli --help
Commands
checksum
Compute the BLAKE2b hash of a file or directory. Optionally compare against an expected value.
# Compute checksum
uv run cli checksum data/landing/bgee/expression.parquet
# Compare against expected value
uv run cli checksum data/landing/bgee/expression.parquet --expected b7a669ddabfa209e
This is the same BLAKE2b algorithm used by ChecksumHooks to validate catalog entries.
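As a rough illustration of what a BLAKE2b file or directory checksum involves, here is a minimal sketch using Python's standard `hashlib`. It is not Optimus's actual implementation. The function names, the 8-byte digest size (chosen only because the example value above has 16 hex characters), and the way a directory is hashed (sorted files, relative path mixed in) are all assumptions.

```python
import hashlib
from pathlib import Path


def blake2b_checksum(path: Path, digest_size: int = 8, chunk_size: int = 1 << 20) -> str:
    """Return the hex BLAKE2b digest of a file, or of all files under a directory."""
    hasher = hashlib.blake2b(digest_size=digest_size)
    is_dir = path.is_dir()
    # Assumption: directories are hashed file by file in sorted order so the
    # result does not depend on filesystem listing order.
    files = sorted(p for p in path.rglob("*") if p.is_file()) if is_dir else [path]
    for file in files:
        if is_dir:
            # Assumption: mix in the relative path so renames change the digest.
            hasher.update(file.relative_to(path).as_posix().encode("utf-8"))
        with file.open("rb") as handle:
            # Read in chunks so large Parquet files are never fully in memory.
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                hasher.update(chunk)
    return hasher.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """Compare the computed digest with an expected hex string (case-insensitive)."""
    return blake2b_checksum(path) == expected.lower()
```

A digest computed this way is only comparable to the CLI's output if the digest size and directory traversal rules happen to match, so treat it as a conceptual aid rather than a drop-in replacement.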
sync-catalog
The primary maintenance command. Synchronizes catalog YAML files with the actual data on disk by updating schemas and checksums.
# Sync all catalog entries
uv run cli sync-catalog
# Sync a specific dataset
uv run cli sync-catalog --dataset bronze.bgee
# Sync an entire layer
uv run cli sync-catalog --layer silver
# Validate without modifying files
uv run cli sync-catalog --validate
# Preview changes without writing
uv run cli sync-catalog --dry-run
When syncing, the CLI:
- Reads the Parquet file on disk to extract the actual schema
- Updates the load_args.schema in the catalog YAML to match
- Computes the BLAKE2b checksum of the file
- Updates the metadata.checksum field in the YAML
The YAML formatting is preserved (comments, ordering, indentation) using regex-based patching rather than full file rewriting.
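To make the idea of regex-based patching concrete, the sketch below replaces only the checksum value of one catalog entry and leaves every other character of the file untouched. It is an illustration under assumptions, not the CLI's code. `patch_checksum` is a hypothetical helper, and the layout it expects (top-level dataset keys, a single indented `checksum:` line per entry, hex values) is assumed rather than taken from the Optimus catalog.

```python
import re

# Captures: (indent + key + optional opening quote), (hex value), (rest of line).
CHECKSUM_LINE = re.compile(r"^(\s+checksum:\s*[\"']?)([0-9a-fA-F]+)(.*)$", re.DOTALL)


def patch_checksum(yaml_text: str, dataset: str, new_checksum: str) -> str:
    """Swap the checksum of one top-level entry, preserving comments and layout."""
    lines = yaml_text.splitlines(keepends=True)
    in_entry = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comment lines never start or end an entry
        if not line[0].isspace():
            # A non-indented key starts a new top-level entry.
            in_entry = line.startswith(f"{dataset}:")
            continue
        if in_entry:
            match = CHECKSUM_LINE.match(line)
            if match:
                # Rebuild the line around the new value; trailing comments survive.
                lines[i] = f"{match.group(1)}{new_checksum}{match.group(3)}"
                return "".join(lines)
    raise KeyError(f"no checksum line found for dataset {dataset!r}")


# Hypothetical catalog fragment, used only to exercise the helper.
example = (
    "bronze.bgee:\n"
    "  metadata:\n"
    "    checksum: aaaa1111  # previous run\n"
)
patched = patch_checksum(example, "bronze.bgee", "b7a669ddabfa209e")
```

Parsing and re-dumping the YAML would typically drop comments and may reorder keys, which is why a targeted text substitution like this keeps diffs small.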
After modifying a node's code and rerunning it, always use sync-catalog to
update the catalog with the new schema and checksum. See the
agentic page for the full workflow.
metrics
Generate metrics from the gold knowledge graph. Produces summary statistics for node counts, edge counts, and graph properties.
uv run cli metrics
figure
A subcommand group for generating publication-quality figures from the knowledge graph data. Each subcommand produces a specific type of analysis visualization:
# List available figure commands
uv run cli figure --help
Available figure types:
| Command | Description |
|---|---|
| adjacency-heatmap | Heatmap showing edge counts between node types |
| ccdf-degree-distribution | Complementary cumulative degree distribution |
| closeness-centrality | Closeness centrality analysis per node type |
| degree-distribution | Degree distribution across the graph |
| metaedge-bubble-plot | Bubble plot showing metaedge counts and properties |
| metapath-counts | Metapath enumeration and counts |
| property-type-distribution | Distribution of property types across nodes and edges |
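For readers unfamiliar with the quantity behind ccdf-degree-distribution, the sketch below computes a complementary cumulative degree distribution, taken here as the fraction of nodes with degree at least k, from a plain edge list. It only illustrates the statistic. The `degree_ccdf` function and its input format are hypothetical and say nothing about how the figure command is implemented.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple


def degree_ccdf(edges: Iterable[Tuple[Hashable, Hashable]]) -> Dict[int, float]:
    """Map each observed degree k to the fraction of nodes with degree >= k."""
    degree: Counter = Counter()
    for source, target in edges:
        # Treat the graph as undirected: each edge adds one to both endpoints.
        degree[source] += 1
        degree[target] += 1
    total = len(degree)
    nodes_per_degree = Counter(degree.values())
    ccdf: Dict[int, float] = {}
    remaining = total  # nodes whose degree is >= the current k
    for k in sorted(nodes_per_degree):
        ccdf[k] = remaining / total
        remaining -= nodes_per_degree[k]
    return ccdf


# A star graph: one hub with three leaves.
print(degree_ccdf([("a", "b"), ("a", "c"), ("a", "d")]))  # {1: 1.0, 3: 0.25}
```

Nodes that appear in no edge are not represented in an edge list, so isolated nodes are ignored by this sketch.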