
Tracing

cycgraph includes opt-in OpenTelemetry tracing that gives you full visibility into workflow execution — node timings, LLM calls, supervisor decisions, and tool invocations. When tracing is disabled (the default), all tracing code is a no-op with zero overhead.

Call initTracing() once before any traced code runs:

import { initTracing } from '@cycgraph/orchestrator';
await initTracing('my-app');

Tracing activates when the OTEL_EXPORTER_OTLP_ENDPOINT environment variable is set:

OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 node app.js

When the variable is unset, initTracing() returns immediately. The OpenTelemetry modules are loaded only through dynamic imports, so in that case they are never imported and add no bundle or runtime cost.
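The gating described above can be pictured with a small sketch. This is an illustration of the pattern only, not cycgraph's source: initTracingSketch, the StartSdk type, and the env parameter are hypothetical names, and the SDK startup is injected as a callback so the example stays self-contained.

```typescript
// Hypothetical callback that would load and start the OpenTelemetry SDK.
// In a real application its body would use dynamic import() so the SDK
// packages are only loaded when this function is actually called.
type StartSdk = (serviceName: string, endpoint: string) => Promise<void>;

// Sketch of env-gated initialization (assumed shape, not the library's code).
async function initTracingSketch(
  serviceName: string,
  env: Record<string, string | undefined>,
  startSdk: StartSdk,
): Promise<boolean> {
  const endpoint = env['OTEL_EXPORTER_OTLP_ENDPOINT'];
  if (!endpoint) {
    // Tracing disabled: return before startSdk runs, so nothing is loaded.
    return false;
  }
  await startSdk(serviceName, endpoint);
  return true;
}
```

In a Node application the env argument would typically be process.env. The function returns false without ever invoking the callback when the endpoint variable is missing or empty.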

cycgraph ships with a Jaeger service in Docker Compose:

docker compose up jaeger
open http://localhost:16686

Any OTLP-compatible collector works: Jaeger, Axiom, Honeycomb, Grafana Tempo, LangFuse, or your own.

Every workflow run produces a tree of spans that maps directly to the execution flow:

workflow.run
├── node.execute.supervisor
│   └── supervisor.route (one per routing decision)
├── node.execute.agent
│   └── agent.execute (one per LLM call)
├── node.execute.evolution
│   └── evaluator.evaluate (one per candidate evaluation)
└── node.execute.tool

Each node.execute.* span captures the node ID, type, and run ID. Child spans add execution-specific detail. The workflow.run root span wraps the entire run, and node.execute.* spans fire on both the streaming and non-streaming paths.

Every log line emitted during a run also carries run_id and graph_id automatically (via async-local-storage correlation), so you can grep one run's logs across the agent executor, MCP, and persistence layers without threading IDs through your own code.

The root workflow.run span is created by run(). If you consume runner.stream() directly and want a root span, wrap your consumption loop in a span of your own. Node-level spans and run_id log correlation are present either way.
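To show why no IDs need to be threaded through your own code, here is a minimal sketch of async-local-storage correlation using Node's AsyncLocalStorage. It illustrates the general technique only. RunContext, logLine, persistState, and runWithContext are hypothetical names, and cycgraph's internal implementation may differ.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

// Hypothetical context shape; the field names mirror the log fields above.
interface RunContext {
  run_id: string;
  graph_id: string;
}

interface LogRecord {
  message: string;
  run_id?: string;
  graph_id?: string;
}

const runContext = new AsyncLocalStorage<RunContext>();

// A logger that reads the ambient context instead of taking IDs as arguments.
function logLine(message: string): LogRecord {
  const ctx = runContext.getStore();
  return { message, run_id: ctx?.run_id, graph_id: ctx?.graph_id };
}

// Deeply nested code: it never receives run_id or graph_id explicitly.
async function persistState(): Promise<LogRecord> {
  await Promise.resolve(); // cross an async boundary
  return logLine('state saved');
}

// Everything called inside run() sees the same context, even across awaits.
function runWithContext(ctx: RunContext): Promise<LogRecord> {
  return runContext.run(ctx, () => persistState());
}
```

Two runs started concurrently through runWithContext each see only their own run_id, and a logLine call made outside any run gets no IDs at all.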

Attributes on workflow.run:

| Attribute | Type | Description |
| --- | --- | --- |
| workflow.id | string | Workflow ID |
| graph.id | string | Graph definition ID |
| graph.name | string | Graph name |
| run.id | string | Unique run ID |
| workflow.duration_ms | number | Total wall-clock duration |
| workflow.status | string | Final status (completed, failed, etc.) |
| workflow.iterations | number | Total graph iterations executed |

Attributes on agent.execute:

| Attribute | Type | Description |
| --- | --- | --- |
| agent.id | string | Agent UUID |
| agent.model | string | Model ID (e.g. claude-sonnet-4-20250514) |
| agent.provider | string | Provider name (e.g. anthropic) |
| agent.attempt | number | Retry attempt (1 = first try) |
| agent.duration_ms | number | LLM call duration |
| agent.tokens.input | number | Input tokens consumed |
| agent.tokens.output | number | Output tokens generated |
| agent.tokens.total | number | Total tokens |
| agent.tools_called | number | Number of tool invocations |
| agent.error | string | Error message (on failure only) |

Attributes on supervisor.route:

| Attribute | Type | Description |
| --- | --- | --- |
| supervisor.id | string | Supervisor node ID |
| supervisor.decision | string | Chosen next node (or __done__) |
| supervisor.reasoning | string | LLM's explanation for the routing choice |
| supervisor.iteration | number | Current supervisor iteration |
| supervisor.input_tokens | number | Input tokens consumed |
| supervisor.output_tokens | number | Output tokens generated |

Attributes on evaluator.evaluate:

| Attribute | Type | Description |
| --- | --- | --- |
| evaluator.agent_id | string | Evaluator agent UUID |
| evaluator.score | number | Quality score (0.0–1.0) |
| evaluator.tokens | number | Total tokens consumed |
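Because these attributes are plain key-value pairs, they are easy to aggregate once spans have been exported. The sketch below sums agent.tokens.total per agent.model. SpanRecord is a hypothetical, simplified record shape (real exporters and backends use richer structures), and the sketch assumes the agent.* attributes sit on spans named agent.execute.

```typescript
// Hypothetical, simplified view of an exported span.
interface SpanRecord {
  name: string;
  attributes: Record<string, string | number>;
}

// Sum agent.tokens.total per agent.model across agent.execute spans.
function totalTokensByModel(spans: SpanRecord[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const span of spans) {
    if (span.name !== 'agent.execute') continue; // assumed span name
    const model = span.attributes['agent.model'];
    const tokens = span.attributes['agent.tokens.total'];
    // Skip spans that lack either attribute or carry an unexpected type.
    if (typeof model !== 'string' || typeof tokens !== 'number') continue;
    totals.set(model, (totals.get(model) ?? 0) + tokens);
  }
  return totals;
}
```

Spans with other names, or agent spans missing either attribute, are ignored rather than counted as zero.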

If you build custom node executors or utilities, you can create spans using the exported helpers:

import { getTracer, withSpan } from '@cycgraph/orchestrator';

const tracer = getTracer('my-custom-module');

const result = await withSpan(tracer, 'my.operation', async (span) => {
  span.setAttribute('my.custom_attr', 'value');
  // ... your logic ...
  return someResult;
});

withSpan automatically:

  • Creates a child span under the current async context
  • Sets span status to OK on success
  • Sets span status to ERROR and records the exception on failure
  • Ends the span in a finally block (guaranteed cleanup)
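The behavior listed above can be sketched as follows. This is not the library's implementation: SpanLike and TracerLike are minimal stand-ins loosely modeled on OpenTelemetry's span and tracer, withSpanSketch and recordingTracer are hypothetical names, and parent/child context propagation is left out. Rethrowing the error after recording it is an assumption.

```typescript
// Minimal stand-ins for the parts of a span and tracer this sketch touches.
interface SpanLike {
  setAttribute(key: string, value: string | number | boolean): void;
  setStatus(status: 'OK' | 'ERROR', message?: string): void;
  recordException(error: unknown): void;
  end(): void;
}

interface TracerLike {
  startSpan(name: string): SpanLike;
}

async function withSpanSketch<T>(
  tracer: TracerLike,
  name: string,
  fn: (span: SpanLike) => Promise<T>,
): Promise<T> {
  const span = tracer.startSpan(name);
  try {
    const result = await fn(span);
    span.setStatus('OK');
    return result;
  } catch (err) {
    span.setStatus('ERROR', err instanceof Error ? err.message : String(err));
    span.recordException(err);
    throw err; // assumption: the caller still sees the failure
  } finally {
    span.end(); // runs on both the success and the failure path
  }
}

// A tracer that records what happened, so the ordering can be inspected.
function recordingTracer(events: string[]): TracerLike {
  return {
    startSpan(name) {
      events.push(`start:${name}`);
      return {
        setAttribute: (key, value) => { events.push(`attr:${key}=${String(value)}`); },
        setStatus: (status) => { events.push(`status:${status}`); },
        recordException: () => { events.push('exception'); },
        end: () => { events.push('end'); },
      };
    },
  };
}
```

With the recording tracer, a successful callback yields the events start, status:OK, end (plus any attributes set in between), while a throwing callback yields start, status:ERROR, exception, end.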

getTracer() returns a no-op tracer when OpenTelemetry is not initialized, so your code works identically with or without tracing enabled.

initTracing() also initializes an optional metrics subsystem (gated separately by METRICS_ENABLED=true). Built-in metric recording functions:

| Function | What it records |
| --- | --- |
| recordWorkflowDuration(ms) | Workflow wall-clock time |
| recordTokensUsed(count) | Token consumption |
| recordCostUsd(amount) | Dollar cost |
| recordAgentDuration(ms) | Per-agent LLM call time |
| incrementWorkflowsStarted() | Workflow start counter |
| incrementWorkflowsCompleted() | Workflow completion counter |
| incrementWorkflowsFailed() | Workflow failure counter |

All metric functions accept optional labels and are zero-cost no-ops when metrics are disabled.
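As a rough illustration of how these functions fit together, the sketch below wraps a unit of work with the start, completion, failure, and duration recorders. The function names come from the table above, but the exact signatures are assumptions, the optional labels are omitted, and MetricsApi and runWithMetrics are hypothetical names. The recorders are passed in as an object so the example stays self-contained.

```typescript
// Assumed signatures for four of the recorders listed above (labels omitted).
interface MetricsApi {
  recordWorkflowDuration(ms: number): void;
  incrementWorkflowsStarted(): void;
  incrementWorkflowsCompleted(): void;
  incrementWorkflowsFailed(): void;
}

// Wrap a unit of work so every outcome is counted and timed.
async function runWithMetrics<T>(
  metrics: MetricsApi,
  work: () => Promise<T>,
): Promise<T> {
  metrics.incrementWorkflowsStarted();
  const startedAt = Date.now();
  try {
    const result = await work();
    metrics.incrementWorkflowsCompleted();
    return result;
  } catch (err) {
    metrics.incrementWorkflowsFailed();
    throw err;
  } finally {
    // Duration is recorded for successful and failed runs alike.
    metrics.recordWorkflowDuration(Date.now() - startedAt);
  }
}
```

A wrapper like this would only matter for work you drive yourself; whether the built-in runner already records these metrics for its own runs is not covered here.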

initTracing() registers SIGTERM and SIGINT handlers that flush pending spans and shut down the SDK cleanly. No additional cleanup code is needed.

  • Evaluations — verify agent behavior with automated eval suites
  • Streaming — real-time event observability (alternative to spans)