Data Model

How ARK CLI represents and stores knowledge graphs.

ARK CLI stores each knowledge graph as two Parquet files (nodes and edges) plus a JSON manifest. The Parquet files stay on disk and are queried with DuckDB SQL at runtime, so no database server is required.

Graph Discovery

At startup, ARK CLI scans the data/ directory for subdirectories containing three required files: graph.json, nodes.parquet, and edges.parquet. Each valid directory becomes an available knowledge graph with its own dedicated AI agent.

data/
  primekg/
    graph.json
    nodes.parquet
    edges.parquet

Directories missing any of the three files are silently skipped. Graphs are sorted by the order field in their graph.json manifest.
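
The selection and sorting rules above can be sketched as a pure function. This is an illustration rather than ARK CLI's actual code: the DirListing type and the selectGraphs name are hypothetical, the directory scan itself is left out, and placing graphs without an order field last is an assumption.

```typescript
// Hypothetical shape: one entry per subdirectory found under data/.
type DirListing = {
  dir: string;      // directory name, e.g. "primekg"
  files: string[];  // file names found inside that directory
  order?: number;   // the order field from graph.json, if present
};

const REQUIRED_FILES = ["graph.json", "nodes.parquet", "edges.parquet"];

// Keep only directories that contain all three required files,
// then sort them by order. Assumption: graphs without an order go last.
function selectGraphs(listings: DirListing[]): string[] {
  const missingOrder = Number.MAX_SAFE_INTEGER;
  return listings
    .filter((l) => REQUIRED_FILES.every((f) => l.files.indexOf(f) !== -1))
    .sort((a, b) => (a.order ?? missingOrder) - (b.order ?? missingOrder))
    .map((l) => l.dir);
}
```

A directory that lacks edges.parquet, for example, is dropped from the result without raising an error, which mirrors the silent skip described above.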

Storage Format

Why Parquet?

Apache Parquet is a columnar storage format that provides:

  • Compact storage: Columnar compression (Snappy by default) reduces file sizes significantly compared to CSV or JSON.
  • Fast queries: DuckDB queries Parquet files in place and reads only the columns and row groups a query needs, so a query does not have to load the whole file into memory.
  • Portability: Single files that can be copied, versioned (via Git LFS), and shared.
  • Ecosystem support: Readable by the major data processing tools, including pandas, polars, Spark, DuckDB, and Arrow.

DuckDB Queries

ARK CLI uses DuckDB to execute SQL queries against Parquet files. DuckDB operates in-process (no server) and reads Parquet files lazily. Here is an example of the SQL generated when the agent uses findNodesByName:

SELECT id, name, type, properties
FROM read_parquet('data/primekg/nodes.parquet')
WHERE name ILIKE '%alzheimer%'
LIMIT 10
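
As a rough sketch of how a query like the one above could be assembled, the following builds the same SQL text from a file path and a search term. The function names sqlString and buildFindNodesByNameSql are hypothetical and not taken from ARK CLI; a real implementation might bind the search term as a query parameter instead of inlining it.

```typescript
// Wrap a value in single quotes for use as a SQL string literal,
// doubling any embedded single quotes.
function sqlString(value: string): string {
  return "'" + value.replace(/'/g, "''") + "'";
}

// Build a case-insensitive partial-match query against one nodes file.
// Note: % and _ inside the term are still treated as ILIKE wildcards.
function buildFindNodesByNameSql(
  nodesPath: string,
  term: string,
  limit: number = 10
): string {
  return [
    "SELECT id, name, type, properties",
    "FROM read_parquet(" + sqlString(nodesPath) + ")",
    "WHERE name ILIKE " + sqlString("%" + term + "%"),
    "LIMIT " + Math.floor(limit),
  ].join("\n");
}
```

Calling buildFindNodesByNameSql("data/primekg/nodes.parquet", "alzheimer") yields the query text shown above.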

When querying across multiple graphs, the query layer generates UNION ALL expressions that inject a synthetic knowledgeGraphId column:

SELECT 1 AS knowledgeGraphId, id, name, type, properties
FROM read_parquet('data/primekg/nodes.parquet')
UNION ALL
SELECT 2 AS knowledgeGraphId, id, name, type, properties
FROM read_parquet('data/afrimedkg/nodes.parquet')
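
A minimal sketch of how such a UNION ALL expression could be generated from a list of graphs follows. The GraphSource type and the buildUnionAllNodesSql name are hypothetical, and the sketch assumes each graph's numeric id from graph.json is used as the knowledgeGraphId value.

```typescript
// Hypothetical input: the graph's numeric id and the path to its nodes file.
type GraphSource = { id: number; nodesPath: string };

// Emit one SELECT per graph, tagging each row with the graph's id,
// and join the SELECTs with UNION ALL.
function buildUnionAllNodesSql(graphs: GraphSource[]): string {
  return graphs
    .map(
      (g) =>
        "SELECT " + g.id + " AS knowledgeGraphId, id, name, type, properties\n" +
        "FROM read_parquet('" + g.nodesPath.replace(/'/g, "''") + "')"
    )
    .join("\nUNION ALL\n");
}
```

With the two graphs from the example (ids 1 and 2), the function reproduces the query text above. UNION ALL keeps duplicate rows, which matters here because node ids are only unique within a single graph.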

Node Schema

Each row in nodes.parquet represents a single entity in the knowledge graph.

| Column | Type | Description |
|---|---|---|
| id | string | Unique node identifier within the graph. |
| name | string | Human-readable name. Searched by findNodesByName (case-insensitive partial match). |
| type | string | Node category (e.g., "disease", "drug", "gene/protein", "biological_process"). |
| properties | string | JSON string with additional metadata. Searched by searchInSurroundings. |

Example Node Properties

The properties field is a JSON string that can contain any additional metadata. Common patterns include:

{
  "synonyms": ["ASA", "Acetylsalicylic acid"],
  "source": "DrugBank",
  "external_ids": { "drugbank": "DB00945" },
  "description": "A non-steroidal anti-inflammatory drug"
}
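
Because properties is stored as a string and may be null, a consumer has to parse it defensively. The helper below is one possible sketch; the parseProperties name is hypothetical, and returning an empty object for null, malformed, or non-object values is an assumption rather than documented ARK CLI behavior.

```typescript
// Loosely typed bag of metadata, since the schema of properties is open-ended.
type NodeProperties = { [key: string]: unknown };

// Parse the JSON string from the properties column.
// Assumption: null, empty, malformed, or non-object values become {}.
function parseProperties(raw: string | null): NodeProperties {
  if (raw === null || raw.trim() === "") {
    return {};
  }
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
      return parsed as NodeProperties;
    }
    return {};
  } catch {
    return {};
  }
}
```

For the example above, parseProperties(raw)["source"] would be "DrugBank", while a null column value yields an empty object instead of throwing.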

Edge Schema

Each row in edges.parquet represents a directed relationship between two nodes.

| Column | Type | Description |
|---|---|---|
| from | string | Source node ID. |
| to | string | Target node ID. |
| type | string | Relationship type (e.g., "treats", "associates_with", "interacts_with"). |
| properties | string | JSON string with additional edge metadata. |

Edges are directed (from → to), but the query layer searches both directions when looking for connections between nodes.
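
The both-directions lookup can be illustrated with an in-memory filter. This is only a sketch of the matching rule: ARK CLI performs the search in SQL, and the EdgeRow type and edgesBetween name are hypothetical.

```typescript
// Minimal edge shape for the illustration (properties omitted).
type EdgeRow = { from: string; to: string; type: string | null };

// Return every edge that connects a and b, regardless of direction.
function edgesBetween(edges: EdgeRow[], a: string, b: string): EdgeRow[] {
  return edges.filter(
    (e) => (e.from === a && e.to === b) || (e.from === b && e.to === a)
  );
}
```

Each matched edge keeps its original from and to values, so the stored direction is still available to the caller.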

Graph Metadata Schema

The graph.json file in each graph directory provides metadata used both at runtime and in the AI agent's system prompt.

| Field | Type | Required | Used For |
|---|---|---|---|
| id | number | Yes | Internal identifier, tool scoping |
| name | string | Yes | UI display, agent system prompt |
| description | string | Yes | Agent system prompt (helps the agent understand the graph) |
| shortDescription | string | No | UI selection list |
| color | string | No | TUI color indicator |
| order | number | No | Display sort order |
| category | string | No | Category label |
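
The field rules above can be checked before a graph.json is shipped. The validator below is a hedged sketch: the manifestErrors name is hypothetical, and how ARK CLI itself reacts to an invalid manifest is not specified here.

```typescript
// Return a list of problems found in a graph.json document (empty if none).
function manifestErrors(raw: string): string[] {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return ["manifest is not valid JSON"];
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    return ["manifest must be a JSON object"];
  }
  const m = data as { [key: string]: unknown };
  const errors: string[] = [];

  // Required fields.
  if (typeof m["id"] !== "number") errors.push("id must be a number");
  if (typeof m["name"] !== "string") errors.push("name must be a string");
  if (typeof m["description"] !== "string") errors.push("description must be a string");

  // Optional fields: only checked when present.
  for (const key of ["shortDescription", "color", "category"]) {
    if (m[key] !== undefined && typeof m[key] !== "string") {
      errors.push(key + " must be a string when present");
    }
  }
  if (m["order"] !== undefined && typeof m["order"] !== "number") {
    errors.push("order must be a number when present");
  }
  return errors;
}
```

A manifest containing only id, name, and description passes with an empty error list, since the remaining fields are optional.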

TypeScript Types

The internal type definitions used by ARK CLI:

type Node = {
  knowledgeGraphId: number;
  id: string;
  name: string | null;
  type: string | null;
  properties: string | null;
};

type Edge = {
  knowledgeGraphId: number;
  from: string;
  to: string;
  type: string | null;
  properties: string | null;
};

type KnowledgeGraphMeta = {
  id: number;
  name: string;
  description: string;
  category?: string;
  shortDescription?: string;
  slug: string;
  color: string;
  order: number;
};