Custom Knowledge Graphs

ARK CLI auto-discovers knowledge graphs from the data/ directory. You can add any knowledge graph that follows the expected format without any code changes.

Directory Structure

Each knowledge graph lives in its own subdirectory under data/ and must contain three files:

graph.json

nodes.parquet

edges.parquet

At startup, ARK CLI scans data/ for subdirectories that contain all three files. A directory missing any one of them is skipped.
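The discovery rule above can be approximated in a few lines of Python, which may be handy for checking a new directory before starting the CLI. This is only a sketch of the rule as described here, not ARK CLI's actual implementation; the names `find_graph_dirs` and `missing_files` are hypothetical.

```python
from pathlib import Path

# The three files every graph directory must contain (from the list above).
REQUIRED_FILES = ("graph.json", "nodes.parquet", "edges.parquet")


def find_graph_dirs(data_dir="data"):
    """Return names of subdirectories that contain all three required files."""
    root = Path(data_dir)
    if not root.is_dir():
        return []
    return sorted(
        child.name
        for child in root.iterdir()
        if child.is_dir()
        and all((child / name).is_file() for name in REQUIRED_FILES)
    )


def missing_files(graph_dir):
    """Return the required files that are absent from one graph directory."""
    base = Path(graph_dir)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]
```

If a graph does not show up in the selection list, calling `missing_files("data/my-graph")` is a quick way to see which file the directory lacks.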

The graph.json Manifest

Each graph directory must contain a graph.json file that describes the graph's metadata:

graph.json

{
  "id": 4,
  "name": "My Custom Graph",
  "description": "A detailed description of your knowledge graph. This text is included in the AI agent's system prompt, so be descriptive about what entities and relationships the graph contains.",
  "shortDescription": "Brief one-liner for the selection UI.",
  "color": "#ff6b6b",
  "order": 4,
  "category": "Custom"
}

The description field is included in the AI agent's system prompt to help it understand the graph's contents and purpose. Write it as if you're explaining the graph to a researcher.

For the full field reference, see Graph Metadata Schema.

Parquet File Schemas

The nodes.parquet and edges.parquet files must follow specific column schemas. See Node Schema and Edge Schema in the Data Model reference for the required columns and types.

Creating Parquet Files

You can generate Parquet files with Python and pandas (pandas needs the pyarrow or fastparquet package installed to write Parquet):

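from pathlib import Path

import pandas as pd

# Create the graph directory first; to_parquet does not create missing folders.
Path("data/my-graph").mkdir(parents=True, exist_ok=True)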

nodes = pd.DataFrame({
    "id": ["node_1", "node_2", "node_3"],
    "name": ["Aspirin", "Headache", "COX-2"],
    "type": ["drug", "disease", "gene/protein"],
    "properties": [
        '{"synonyms": ["ASA", "Acetylsalicylic acid"]}',
        '{"icd10": "R51"}',
        '{"full_name": "Cyclooxygenase-2"}'
    ]
})
nodes.to_parquet("data/my-graph/nodes.parquet", index=False)

edges = pd.DataFrame({
    "from": ["node_1", "node_1"],
    "to": ["node_2", "node_3"],
    "type": ["treats", "targets"],
    "properties": ['{}', '{"mechanism": "inhibition"}']
})
edges.to_parquet("data/my-graph/edges.parquet", index=False)
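Before writing the files, it can help to confirm that every edge points at a node that exists. The sketch below assumes, as the example above suggests, that the `from` and `to` columns hold values from the nodes' `id` column; the helper name `dangling_edges` is hypothetical and not part of ARK CLI.

```python
def dangling_edges(node_ids, edge_pairs):
    """Return (from, to) pairs whose endpoints are not both known node IDs."""
    known = set(node_ids)
    return [
        (src, dst)
        for src, dst in edge_pairs
        if src not in known or dst not in known
    ]


# With the DataFrames built above, the call would look like:
#   dangling_edges(nodes["id"], zip(edges["from"], edges["to"]))
node_ids = ["node_1", "node_2", "node_3"]
edge_pairs = [("node_1", "node_2"), ("node_1", "node_3"), ("node_1", "node_9")]
print(dangling_edges(node_ids, edge_pairs))  # [('node_1', 'node_9')]
```

An empty result means every edge endpoint resolves to a node.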

Verifying Your Graph

After adding your graph directory, restart ARK CLI:

pnpm cli

Your graph should appear in the selection list with its own dedicated agent. Select it and try a simple query like:

What types of nodes are in this graph?

Ensure your id field in graph.json does not conflict with existing graph IDs. The bundled graphs use IDs 1, 2, and 3.
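One way to check for an ID clash is to read every manifest and group directories by `id`. The sketch below assumes the bundled graphs' manifests sit under the same data/ directory as your own, which this page does not state explicitly; the function names are hypothetical.

```python
import json
from pathlib import Path


def used_graph_ids(data_dir="data"):
    """Map each manifest id to the list of directories that declare it."""
    ids = {}
    for manifest in sorted(Path(data_dir).glob("*/graph.json")):
        with manifest.open(encoding="utf-8") as fh:
            graph_id = json.load(fh).get("id")
        ids.setdefault(graph_id, []).append(manifest.parent.name)
    return ids


def conflicting_ids(data_dir="data"):
    """Return only the ids declared by more than one directory."""
    return {
        graph_id: dirs
        for graph_id, dirs in used_graph_ids(data_dir).items()
        if len(dirs) > 1
    }
```

An empty dictionary from `conflicting_ids()` means no two directories share an `id`.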

Tips

Keep properties as a JSON string, not a nested object. DuckDB parses it at query time.
Use descriptive type values for nodes and edges. The AI agent uses these to filter and reason about the graph.
The name field on nodes is what the agent searches with findNodesByName, so use clear, recognizable names.
Include common synonyms in the properties JSON if your entities have multiple names. The searchInSurroundings tool searches properties as well.
Snappy compression is recommended for Parquet files. It is the default in pandas; polars defaults to zstd, so pass compression="snappy" to write_parquet if you use polars.
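The first tip can be followed by building `properties` values from dictionaries with `json.dumps` instead of typing JSON by hand. This is a minimal sketch; the helper names are hypothetical, and the assumption that each value should decode to a JSON object (rather than any JSON value) is inferred from the examples on this page.

```python
import json


def to_properties(value):
    """Serialize a dict to the JSON string stored in the properties column."""
    if not isinstance(value, dict):
        raise TypeError("properties must be a dict before serialization")
    return json.dumps(value, ensure_ascii=False)


def is_valid_properties(text):
    """Return True if text is a JSON string that decodes to an object."""
    try:
        return isinstance(json.loads(text), dict)
    except (TypeError, ValueError):
        return False


print(to_properties({"synonyms": ["ASA", "Acetylsalicylic acid"]}))
# {"synonyms": ["ASA", "Acetylsalicylic acid"]}
print(to_properties({}))
# {}
```

Running `is_valid_properties` over the column before writing the Parquet file catches nested objects and malformed strings early.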