Data Model
How ARK CLI represents and stores knowledge graphs.
ARK CLI uses a deliberately simple data model for knowledge graphs: each graph is stored as Parquet files on disk and queried with DuckDB SQL at runtime, so no database server is required.
Graph Discovery
At startup, ARK CLI scans the data/ directory for subdirectories containing three required files: graph.json, nodes.parquet, and edges.parquet. Each valid directory becomes an available knowledge graph with its own dedicated AI agent.
Directories missing any of the three files are silently skipped. Graphs are sorted by the order field in their graph.json manifest.
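The discovery rules above can be sketched in TypeScript for Node.js. This is a minimal illustration, not ARK CLI's actual loader: the function names hasRequiredFiles, sortByOrder, and discoverGraphs are hypothetical, and sorting graphs without an order field after all others is an assumption, since the documentation does not say how missing values are handled.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// The three files a directory must contain to count as a graph.
const REQUIRED_FILES = ["graph.json", "nodes.parquet", "edges.parquet"];

type DiscoveredGraph = {
  dir: string;                       // path to the graph directory
  manifest: Record<string, unknown>; // parsed graph.json contents
};

// True when a directory listing contains every required file.
function hasRequiredFiles(fileNames: Iterable<string>): boolean {
  const present = new Set(fileNames);
  return REQUIRED_FILES.every((name) => present.has(name));
}

// Reads the optional numeric order field; missing values sort last (assumption).
function orderOf(g: { manifest: Record<string, unknown> }): number {
  const order = g.manifest.order;
  return typeof order === "number" ? order : Number.MAX_SAFE_INTEGER;
}

// Returns a new sorted array; Array.prototype.sort is stable, so graphs
// with equal order keep the order in which they were listed.
function sortByOrder<T extends { manifest: Record<string, unknown> }>(graphs: T[]): T[] {
  return [...graphs].sort((a, b) => orderOf(a) - orderOf(b));
}

// Scans dataDir for graph directories, silently skipping incomplete ones.
function discoverGraphs(dataDir: string): DiscoveredGraph[] {
  const found: DiscoveredGraph[] = [];
  for (const entry of fs.readdirSync(dataDir, { withFileTypes: true })) {
    if (!entry.isDirectory()) continue;
    const dir = path.join(dataDir, entry.name);
    if (!hasRequiredFiles(fs.readdirSync(dir))) continue;
    const manifest = JSON.parse(fs.readFileSync(path.join(dir, "graph.json"), "utf8"));
    found.push({ dir, manifest });
  }
  return sortByOrder(found);
}
```

Passing the data directory in as a parameter, rather than hard-coding data/, keeps the scan easy to test against a temporary directory.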
Storage Format
Why Parquet?
Apache Parquet is a columnar storage format that provides:
- Compact storage: Columnar compression (commonly Snappy, the default in many Parquet writers) reduces file sizes significantly compared to CSV or JSON.
- Fast queries: DuckDB queries Parquet files in place, without loading whole files into memory, and reads only the columns and row groups a query actually needs.
- Portability: Single files that can be copied, versioned (via Git LFS), and shared.
- Ecosystem support: Readable and writable by most major data processing tools, including pandas, polars, Spark, DuckDB, and Apache Arrow.
DuckDB Queries
ARK CLI uses DuckDB to execute SQL queries against Parquet files. DuckDB operates in-process (no server) and reads Parquet files lazily. Here is an example of the SQL generated when the agent uses findNodesByName:
```sql
SELECT id, name, type, properties
FROM read_parquet('data/primekg/nodes.parquet')
WHERE name ILIKE '%alzheimer%'
LIMIT 10
```
When querying across multiple graphs, the query layer generates UNION ALL expressions that inject a synthetic knowledgeGraphId column:
```sql
SELECT 1 AS knowledgeGraphId, id, name, type, properties
FROM read_parquet('data/primekg/nodes.parquet')
UNION ALL
SELECT 2 AS knowledgeGraphId, id, name, type, properties
FROM read_parquet('data/afrimedkg/nodes.parquet')
```
Node Schema
Each row in nodes.parquet represents a single entity in the knowledge graph.
| Column | Type | Description |
|---|---|---|
| id | string | Unique node identifier within the graph. |
| name | string | Human-readable name. Searched by findNodesByName (case-insensitive partial match). |
| type | string | Node category (e.g., "disease", "drug", "gene/protein", "biological_process"). |
| properties | string | JSON string with additional metadata. Searched by searchInSurroundings. |
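The case-insensitive name search and the multi-graph UNION ALL from the DuckDB Queries section can be sketched as a small SQL builder. This is a hypothetical illustration, not ARK CLI's actual query layer: the GraphRef type and the function names are assumptions, and values are inlined by doubling single quotes, whereas a production implementation might prefer prepared-statement parameters.

```typescript
// Hypothetical description of a graph the query layer can read.
type GraphRef = { id: number; dir: string };

// Quotes a value as a SQL string literal, doubling embedded single quotes.
function sqlString(value: string): string {
  return `'${value.replace(/'/g, "''")}'`;
}

// One SELECT per graph, tagging every row with a synthetic knowledgeGraphId.
function nodesSelect(graph: GraphRef): string {
  if (!Number.isInteger(graph.id)) throw new Error(`invalid graph id: ${graph.id}`);
  const file = `${graph.dir}/nodes.parquet`;
  return `SELECT ${graph.id} AS knowledgeGraphId, id, name, type, properties\n` +
    `FROM read_parquet(${sqlString(file)})`;
}

// Combines the per-graph SELECTs into a single relation.
function allNodesSql(graphs: GraphRef[]): string {
  if (graphs.length === 0) throw new Error("at least one graph is required");
  return graphs.map(nodesSelect).join("\nUNION ALL\n");
}

// Case-insensitive partial match on name, in the style of findNodesByName.
// Note: % and _ inside term are not escaped, so they act as LIKE wildcards.
function findNodesByNameSql(graphs: GraphRef[], term: string, limit = 10): string {
  const safeLimit = Math.max(1, Math.trunc(limit));
  return `SELECT * FROM (\n${allNodesSql(graphs)}\n) AS nodes\n` +
    `WHERE name ILIKE ${sqlString(`%${term}%`)}\n` +
    `LIMIT ${safeLimit}`;
}
```

Wrapping the UNION ALL in a subquery matters: a WHERE clause written directly after the last SELECT would filter only that graph's rows, while the subquery makes the filter and limit apply to the combined result.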
Example Node Properties
The properties field is a JSON string that can contain any additional metadata. Common patterns include:
```json
{
  "synonyms": ["ASA", "Acetylsalicylic acid"],
  "source": "DrugBank",
  "external_ids": { "drugbank": "DB00945" },
  "description": "A non-steroidal anti-inflammatory drug"
}
```
Edge Schema
Each row in edges.parquet represents a directed relationship between two nodes.
| Column | Type | Description |
|---|---|---|
| from | string | Source node ID. |
| to | string | Target node ID. |
| type | string | Relationship type (e.g., "treats", "associates_with", "interacts_with"). |
| properties | string | JSON string with additional edge metadata. |
Edges are directed (from → to), but the query layer searches both directions when looking for connections between nodes.
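Because each edge is stored once, in one direction, a lookup for connections has to match a node pair either way round. The following is a minimal sketch of SQL a query layer could generate for this, assuming a single graph's edges file; the function names edgesBetweenSql and neighborsSql are hypothetical. Since from is a reserved word in DuckDB (and to is a keyword in many SQL dialects), both column names are double-quoted.

```typescript
// Quotes a value as a SQL string literal, doubling embedded single quotes.
function sqlString(value: string): string {
  return `'${value.replace(/'/g, "''")}'`;
}

// Hypothetical builder: edges linking nodes a and b in either direction.
function edgesBetweenSql(edgesFile: string, a: string, b: string): string {
  const [qa, qb] = [sqlString(a), sqlString(b)];
  return [
    `SELECT "from", "to", type, properties`,
    `FROM read_parquet(${sqlString(edgesFile)})`,
    `WHERE ("from" = ${qa} AND "to" = ${qb})`,
    `   OR ("from" = ${qb} AND "to" = ${qa})`,
  ].join("\n");
}

// Hypothetical builder: neighbors of one node regardless of edge direction,
// labelling each row with the direction of the stored edge.
function neighborsSql(edgesFile: string, nodeId: string): string {
  const file = sqlString(edgesFile);
  const id = sqlString(nodeId);
  return [
    `SELECT "to" AS neighborId, type, 'outgoing' AS direction`,
    `FROM read_parquet(${file}) WHERE "from" = ${id}`,
    `UNION ALL`,
    `SELECT "from" AS neighborId, type, 'incoming' AS direction`,
    `FROM read_parquet(${file}) WHERE "to" = ${id}`,
  ].join("\n");
}
```

Keeping the direction label in the result lets the caller still tell "A treats B" apart from "B treats A" after the bidirectional match.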
Graph Metadata Schema
The graph.json file in each graph directory provides metadata used both at runtime and in the AI agent's system prompt.
| Field | Type | Required | Used For |
|---|---|---|---|
| id | number | Yes | Internal identifier, tool scoping |
| name | string | Yes | UI display, agent system prompt |
| description | string | Yes | Agent system prompt (helps the agent understand the graph) |
| shortDescription | string | No | UI selection list |
| color | string | No | TUI color indicator |
| order | number | No | Display sort order |
| category | string | No | Category label |
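Since only id, name, and description are required, a loader has to check a manifest before trusting it. A minimal sketch of such a check, assuming exactly the field types in the table above; the GraphManifest type and validateManifest function are hypothetical, and reporting every problem at once (rather than failing on the first) is a design choice, not documented behavior.

```typescript
// Shape of graph.json as described in the table above.
type GraphManifest = {
  id: number;
  name: string;
  description: string;
  shortDescription?: string;
  color?: string;
  order?: number;
  category?: string;
};

const REQUIRED: Array<[keyof GraphManifest, string]> = [
  ["id", "number"], ["name", "string"], ["description", "string"],
];
const OPTIONAL: Array<[keyof GraphManifest, string]> = [
  ["shortDescription", "string"], ["color", "string"],
  ["order", "number"], ["category", "string"],
];

// Returns a list of problems; an empty list means the manifest is usable.
function validateManifest(raw: unknown): string[] {
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    return ["graph.json must contain a JSON object"];
  }
  const obj = raw as Record<string, unknown>;
  const errors: string[] = [];
  for (const [field, type] of REQUIRED) {
    if (!(field in obj)) errors.push(`missing required field "${field}"`);
    else if (typeof obj[field] !== type) errors.push(`"${field}" must be a ${type}`);
  }
  for (const [field, type] of OPTIONAL) {
    if (field in obj && typeof obj[field] !== type) errors.push(`"${field}" must be a ${type}`);
  }
  return errors;
}
```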
TypeScript Types
The internal type definitions used by ARK CLI:
```typescript
type Node = {
  knowledgeGraphId: number;
  id: string;
  name: string | null;
  type: string | null;
  properties: string | null;
};

type Edge = {
  knowledgeGraphId: number;
  from: string;
  to: string;
  type: string | null;
  properties: string | null;
};

type KnowledgeGraphMeta = {
  id: number;
  name: string;
  description: string;
  category?: string;
  shortDescription?: string;
  slug: string;
  color: string;
  order: number;
};
```
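Because properties is a JSON string that may also be null, callers have to parse it before reading fields such as synonyms. A minimal defensive sketch follows; parseProperties and synonymsOf are hypothetical helpers, and treating null, malformed, or non-object JSON as empty metadata is an assumed policy rather than documented ARK CLI behavior.

```typescript
// Mirrors the Node type above; exported so this file is a module.
export type Node = {
  knowledgeGraphId: number;
  id: string;
  name: string | null;
  type: string | null;
  properties: string | null;
};

// Parses a properties string into a plain object. null, malformed JSON,
// and non-object JSON (arrays, numbers, strings) all yield {} (assumed policy).
function parseProperties(properties: string | null): Record<string, unknown> {
  if (properties === null) return {};
  try {
    const value: unknown = JSON.parse(properties);
    return typeof value === "object" && value !== null && !Array.isArray(value)
      ? (value as Record<string, unknown>)
      : {};
  } catch {
    return {};
  }
}

// Example accessor: the optional "synonyms" array, keeping only strings.
function synonymsOf(node: Node): string[] {
  const synonyms = parseProperties(node.properties).synonyms;
  return Array.isArray(synonyms)
    ? synonyms.filter((s): s is string => typeof s === "string")
    : [];
}
```

The same helper works for Edge.properties, since both columns use the same JSON-string convention.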