Troubleshooting
If your first cycgraph workflow doesn’t behave the way you expect, this page lists the errors most users hit early and the misconfigurations that cause them.
Install / startup
EBADENGINE Unsupported engine

    npm warn EBADENGINE Unsupported engine {
      package: '@cycgraph/orchestrator@0.1.0-beta.X',
      required: { node: '>=24.0.0' },
      current: { node: 'v22.x.x' }
    }

cycgraph requires Node.js 24 or later. Upgrade Node (e.g. nvm install 24 && nvm use 24) and reinstall.
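The engine requirement can also be checked at startup, so the failure is explicit rather than buried in an npm warning. This is a minimal sketch that assumes the 24.0.0 minimum quoted in the warning above; isSupportedNode is a hypothetical helper, not part of cycgraph.

```typescript
// Hypothetical helper (not a cycgraph API): checks a Node.js version string
// against the minimum major version quoted in the EBADENGINE warning.
function isSupportedNode(version: string, minMajor: number = 24): boolean {
  // Accepts both "v22.3.0" and "22.3.0".
  const [majorPart = ''] = version.replace(/^v/, '').split('.');
  const major = Number.parseInt(majorPart, 10);
  // An unparseable version yields NaN, which is treated as unsupported.
  return Number.isInteger(major) && major >= minMajor;
}

// Possible use at the top of an entry point:
//   if (!isSupportedNode(process.versions.node)) {
//     throw new Error(`Node.js 24+ required, found ${process.versions.node}`);
//   }
```

Keeping the check as a pure function on a version string makes it easy to unit test without changing the installed runtime.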
Cannot find module / missing .js extensions
cycgraph packages ship as ES modules. If you import without the explicit .js extension and your tsconfig.json uses "module": "Node16" or "NodeNext", you’ll get module-resolution errors.
    // ❌ Will fail in Node16/NodeNext
    import { GraphRunner } from '@cycgraph/orchestrator/src/runner/graph-runner';

    // ✅ Always import from the package root
    import { GraphRunner } from '@cycgraph/orchestrator';

Configuration errors
AgentNotFoundError: Agent "X" not found
You’re running a graph that references an agent_id that wasn’t registered. Register every agent on the InMemoryAgentRegistry (or load via DrizzleAgentRegistry) before instantiating GraphRunner, then call configureAgentFactory(registry).
    const registry = new InMemoryAgentRegistry();
    const RESEARCHER_ID = registry.register({ /* config */ });
    configureAgentFactory(registry); // ← required

This fails closed: when a registry is configured but the agent_id isn’t in it (a typo, a deleted agent, a stale graph), the factory throws rather than substituting a generic default agent. That’s deliberate: the old silent fallback let workflows run to “completed” with deny-all garbage output and real token spend. If you genuinely want the permissive fallback (dev/test), opt in:

    configureAgentFactory(registry, { allowDefaultFallback: true });

UnsupportedProviderError: Provider "X" is not registered
The provider on your agent config doesn’t match a registered provider. Anthropic and OpenAI are built in via createProviderRegistry(); everything else needs explicit registration (registerOllamaProvider or a custom factory).
NodeConfigError: <type> node "<id>" is missing <field>
You declared a node of a given type but omitted its required config block. The typical culprits:
| Node type | Required field |
|---|---|
| agent | agent_id |
| supervisor | supervisor_config (or agent_id if supervisor_config.agent_id is unset) |
| approval | approval_config |
| map | map_reduce_config |
| subgraph | subgraph_id + subgraph_config |
| voting | voting_config |
| evolution | evolution_config |
| verifier | verifier_config |
| reflection | reflection_config |
| tool | tool_id |
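The table above can be turned into a lookup for linting graph definitions before they reach the runner. The sketch below uses a simplified, hypothetical node shape and a hypothetical findMissingFields helper; cycgraph's own validation may differ, and the supervisor's agent_id alternative is not modelled.

```typescript
// Hypothetical, simplified node shape for illustration only.
type NodeLike = { id: string; type: string } & Record<string, unknown>;

// Required fields per node type, transcribed from the table above.
// Simplification: the supervisor's agent_id alternative is not modelled.
const REQUIRED_FIELDS: Record<string, string[]> = {
  agent: ['agent_id'],
  supervisor: ['supervisor_config'],
  approval: ['approval_config'],
  map: ['map_reduce_config'],
  subgraph: ['subgraph_id', 'subgraph_config'],
  voting: ['voting_config'],
  evolution: ['evolution_config'],
  verifier: ['verifier_config'],
  reflection: ['reflection_config'],
  tool: ['tool_id'],
};

// Returns one message per missing field, phrased like the NodeConfigError text.
function findMissingFields(nodes: NodeLike[]): string[] {
  const problems: string[] = [];
  for (const node of nodes) {
    // Unknown node types have no entry and are skipped.
    for (const field of REQUIRED_FIELDS[node.type] ?? []) {
      if (node[field] === undefined || node[field] === null) {
        problems.push(`${node.type} node "${node.id}" is missing ${field}`);
      }
    }
  }
  return problems;
}
```

Running a check like this in CI reports every incomplete node in one pass instead of one error per run.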
Runtime errors
PermissionDeniedError: agent attempted to write key "X"
The agent emitted a save_to_memory call for a key not in the node’s write_keys (or used the _-prefixed reserved namespace). Either:
- Add the key to the node’s write_keys, or
- Update the agent prompt to stop writing it, or
- Use default_write_key to channel free-form text output to a specific allowed key.
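The rule behind this error can be restated as a small predicate, which is handy for checking a prompt's expected writes against a node definition. This sketch assumes exact string matching on write_keys and that _-prefixed keys are rejected even when listed; both function names are hypothetical and this is not cycgraph's own code.

```typescript
// Hypothetical restatement of the write-permission rule described above.
function isWriteAllowed(key: string, writeKeys: readonly string[]): boolean {
  // Assumption: keys starting with "_" are reserved and always rejected.
  if (key.startsWith('_')) return false;
  // Assumption: the key must match an entry in write_keys exactly.
  return writeKeys.includes(key);
}

// Lists the attempted keys that would trigger a denial.
function deniedKeys(attempted: readonly string[], writeKeys: readonly string[]): string[] {
  return attempted.filter((key) => !isWriteAllowed(key, writeKeys));
}
```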
BudgetExceededError: Token budget exceeded
Workflow-wide token budget breached. Either raise state.max_token_budget or, more usefully, add a per-node budget so a single runaway call doesn’t eat the run:

    {
      id: 'reflect',
      type: 'reflection',
      // ...
      budget: { max_tokens: 20_000, max_cost_usd: 0.05 },
    }

NodeBudgetExceededError: Node "X" exceeded max_tokens
A single node breached its budget cap. Unlike BudgetExceededError, this one fires per attempt: retries do not stack toward the cap. Common culprits:
- LLM reflection extractor without a max_facts cap.
- Annealing loop with a high max_iterations.
- Agent with a bloated tools array driving up input tokens.
WorkflowTimeoutError: Workflow ... timed out after Xms
Wall-clock cap (state.max_execution_time_ms, default 5 minutes) reached. Either raise it or break the work into smaller subgraphs.
NoMatchingEdgeError: node "X" has no outgoing edge whose condition matched
Execution reached a node that isn’t a declared end node, yet none of its outgoing edges’ conditions evaluated true: a dead end. The usual cause is a filtrex condition that’s always false (a typo’d key name, a comparison against a value that’s never written). This used to silently complete the workflow having run only part of the graph; it now fails loudly. Fix the edge condition, add the node to end_nodes if it really is terminal, or, for the legacy silent-completion behavior, set allow_implicit_completion: true on GraphRunnerOptions.
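The dead-end rule can be illustrated with a small routing function. This is a sketch with hypothetical shapes: conditions are plain predicates instead of filtrex expressions to keep it dependency-free, and the order of checks (edges first, then end nodes) is an assumption rather than cycgraph's documented behaviour.

```typescript
// Hypothetical shapes; conditions are plain predicates here rather than
// filtrex expressions, so the sketch needs no dependencies.
type State = Record<string, unknown>;
type Edge = { from: string; to: string; condition?: (state: State) => boolean };

// Returns the next node id, or null when the workflow may complete.
function nextNode(
  current: string,
  edges: Edge[],
  endNodes: string[],
  state: State,
  allowImplicitCompletion: boolean = false,
): string | null {
  // An edge without a condition always matches.
  const match = edges.find(
    (edge) => edge.from === current && (edge.condition === undefined || edge.condition(state)),
  );
  if (match) return match.to;
  // No edge matched: fine for a declared end node.
  if (endNodes.includes(current)) return null;
  // Legacy behaviour: complete silently after running part of the graph.
  if (allowImplicitCompletion) return null;
  throw new Error(`node "${current}" has no outgoing edge whose condition matched`);
}
```

A condition that reads a misspelled key always evaluates false here, which reproduces the dead end: the lookup returns undefined and the comparison never succeeds.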
MemoryWriterMissingError: Reflection node "X" requires a memoryWriter
A graph contains a reflection node but GraphRunnerOptions.memoryWriter is unset. This is caught by a pre-flight wiring check at the start of run(), so it fails before any node executes rather than mid-run. Wire one up (see Reflection pattern). The same pre-flight check fails the run if a node declares MCP tool sources but no toolResolver is configured.
MCPServerNotFoundError: MCP server "X" not registered
A node declared tools: [{ type: 'mcp', server_id: 'X' }] but the server isn’t in the MCPServerRegistry. Either call registerDefaultMCPServers() (gives you web-search and fetch) or register your custom servers explicitly.
MCPAccessDeniedError
The agent doesn’t have permission for the MCP server in its tools declaration. Check the allowed_agent_ids field on the server’s registry entry.
The “silently wrong” gotchas
These don’t throw; your workflow just behaves differently than you expect.
memoryRetriever wired but never called
The retriever is per-node opt-in. It only fires for nodes that declare a memory_query directive. Without that, the option is silently a no-op.
    // ❌ memoryRetriever wired but nothing pulls from it
    new GraphRunner(graph, state, { memoryRetriever });

    // ✅ Researcher node declares memory_query, so the retriever fires
    {
      id: 'researcher',
      type: 'agent',
      agent_id: RESEARCHER_ID,
      read_keys: ['goal'],
      write_keys: ['notes'],
      memory_query: { tags: ['lesson'], max_facts: 10 },
    }

Reflection extracted facts but no future runs see them
Almost always one of:
- The reflection node’s tags and the consuming node’s memory_query.tags don’t match.
- The memoryRetriever adapter doesn’t pass query.tags through to retrieveMemory() (must include tags: query.tags ?? []).
- InMemoryMemoryStore was instantiated per run instead of once for the process, so every run starts cold. Use DrizzleMemoryStore for persistence across runs.
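The first two causes are easier to see in a toy adapter. Everything below is a stand-in written for illustration: the MemoryQuery and Fact types, the array-backed store, and this local retrieveMemory are assumptions, not cycgraph's real signatures. Only the tags: query.tags ?? [] line comes from the guidance above.

```typescript
// Stand-in types: the real query and store signatures are not shown on this
// page, so everything here is an assumption made for illustration.
type MemoryQuery = { tags?: string[]; max_facts?: number };
type Fact = { text: string; tags: string[] };

// Toy store lookup: returns facts carrying at least one requested tag.
function retrieveMemory(store: Fact[], opts: { tags: string[]; limit: number }): Fact[] {
  return store
    .filter((fact) => fact.tags.some((tag) => opts.tags.includes(tag)))
    .slice(0, opts.limit);
}

// Adapter that forwards the node's tags to the store lookup.
function makeRetriever(store: Fact[]) {
  return (query: MemoryQuery): Fact[] =>
    retrieveMemory(store, {
      tags: query.tags ?? [], // the line that is easy to forget
      limit: query.max_facts ?? 10,
    });
}
```

In this toy version, a query tagged lessons against facts tagged lesson returns nothing, which is the same symptom as a tag mismatch between the reflection node and the consuming node.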
Agent ignores ## Relevant Memory in its prompt
The retrieved-memory section is rendered as <memory>...</memory> inside the system prompt, but the agent’s own system prompt has to tell it to use it. Models won’t infer the purpose of that block, so write something like "When the prompt contains a '## Relevant Memory' section with prior lessons, honour them..." in the agent system prompt.
Workflow runs forever or hits max_iterations
A cyclic graph is looping on the same nodes. Common causes:
- Supervisor’s completion_condition is never satisfied.
- Conditional edge always routes back to a previous node.
- max_iterations on supervisor/evolution/annealing is too high relative to the actual convergence.
Use runner.on('supervisor:routed', ...) or the OTel supervisor.route span to see what’s deciding to loop.
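One way to use that event is to count how often each routing target is chosen and flag the first one that repeats too often. The helper below is a hypothetical sketch. The payload of supervisor:routed is not documented on this page, so the helper takes a plain node id and leaves extracting it from the event to the caller.

```typescript
// Hypothetical helper: counts how often each routing target is chosen and
// reports once when a target reaches the threshold.
function createLoopDetector(
  threshold: number,
  onLoop: (nodeId: string, count: number) => void,
): (nodeId: string) => void {
  const counts = new Map<string, number>();
  return (nodeId: string): void => {
    const count = (counts.get(nodeId) ?? 0) + 1;
    counts.set(nodeId, count);
    // Fire exactly once per node, when the count first hits the threshold.
    if (count === threshold) onLoop(nodeId, count);
  };
}

// Possible wiring (the event payload shape is an assumption):
//   const record = createLoopDetector(5, (id, n) => console.warn(`${id} routed ${n} times`));
//   runner.on('supervisor:routed', (event) => record(/* target node id taken from event */));
```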
Where to dig deeper
- Error Handling: full error catalogue and propagation rules.
- Observability / Tracing: wire OpenTelemetry to see what’s actually happening.
- Operations / Deployment: deployment-time errors and Postgres setup.
- Workflow State: what’s in state.memory vs state.supervisor_history vs the event log.