
Tracing

cycgraph includes opt-in OpenTelemetry tracing that gives you full visibility into workflow execution — node timings, LLM calls, supervisor decisions, and tool invocations. When tracing is disabled (the default), all tracing code is a no-op with zero overhead.

Call initTracing() once before any traced code runs:

import { initTracing } from '@cycgraph/orchestrator';
await initTracing('my-app');

Tracing activates when the OTEL_EXPORTER_OTLP_ENDPOINT environment variable is set:

OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 node app.js

When the variable is unset, initTracing() returns immediately. The OpenTelemetry modules are loaded only through dynamic imports, so in this case they are never imported and add no bundle or runtime cost.
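The gating described above can be sketched roughly as follows. This is an illustration of the pattern, not cycgraph's actual source: initTracingSketch, the Env type, and the loadSdk parameter are hypothetical names, with loadSdk standing in for a dynamic import of the OpenTelemetry SDK.

```typescript
// Sketch only: not cycgraph's actual implementation.
// `loadSdk` is a hypothetical stand-in for a dynamic import such as
// `() => import('@opentelemetry/sdk-node')`, injected so the sketch is testable.
type Env = Record<string, string | undefined>;

interface SdkLike {
  start(serviceName: string, endpoint: string): void;
}

async function initTracingSketch(
  serviceName: string,
  env: Env,
  loadSdk: () => Promise<SdkLike>,
): Promise<boolean> {
  const endpoint = env['OTEL_EXPORTER_OTLP_ENDPOINT'];

  // Tracing disabled: return before any OpenTelemetry module is loaded.
  if (!endpoint) return false;

  // Tracing enabled: only now pay the cost of loading the SDK.
  const sdk = await loadSdk();
  sdk.start(serviceName, endpoint);
  return true;
}
```

In a Node process, env would typically be process.env. Because the loader is only invoked past the early return, a disabled configuration never touches the SDK module at all.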

cycgraph ships with a Jaeger service in Docker Compose:

docker compose up jaeger
open http://localhost:16686

Any OTLP-compatible collector works: Jaeger, Axiom, Honeycomb, Grafana Tempo, LangFuse, or your own.

Every workflow run produces a tree of spans that maps directly to the execution flow:

workflow.run
├── node.execute.supervisor
│ └── supervisor.route (one per routing decision)
├── node.execute.agent
│ └── agent.execute (one per LLM call)
├── node.execute.evolution
│ └── evaluator.evaluate (one per candidate evaluation)
└── node.execute.tool

Each node.execute.* span captures the node ID and type. Child spans add execution-specific detail.

workflow.run attributes:

| Attribute | Type | Description |
| --- | --- | --- |
| workflow.id | string | Workflow ID |
| graph.id | string | Graph definition ID |
| graph.name | string | Graph name |
| run.id | string | Unique run ID |
| workflow.duration_ms | number | Total wall-clock duration |
| workflow.status | string | Final status (completed, failed, etc.) |
| workflow.iterations | number | Total graph iterations executed |

agent.execute attributes:

| Attribute | Type | Description |
| --- | --- | --- |
| agent.id | string | Agent UUID |
| agent.model | string | Model ID (e.g. claude-sonnet-4-20250514) |
| agent.provider | string | Provider name (e.g. anthropic) |
| agent.attempt | number | Retry attempt (1 = first try) |
| agent.duration_ms | number | LLM call duration |
| agent.tokens.input | number | Input tokens consumed |
| agent.tokens.output | number | Output tokens generated |
| agent.tokens.total | number | Total tokens |
| agent.tools_called | number | Number of tool invocations |
| agent.error | string | Error message (on failure only) |

supervisor.route attributes:

| Attribute | Type | Description |
| --- | --- | --- |
| supervisor.id | string | Supervisor node ID |
| supervisor.decision | string | Chosen next node (or __done__) |
| supervisor.reasoning | string | LLM's explanation for the routing choice |
| supervisor.iteration | number | Current supervisor iteration |
| supervisor.input_tokens | number | Input tokens consumed |
| supervisor.output_tokens | number | Output tokens generated |

evaluator.evaluate attributes:

| Attribute | Type | Description |
| --- | --- | --- |
| evaluator.agent_id | string | Evaluator agent UUID |
| evaluator.score | number | Quality score (0.0–1.0) |
| evaluator.tokens | number | Total tokens consumed |
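As an illustration of how these attributes might be consumed downstream, the sketch below sums agent.tokens.total per agent.model across agent.execute spans. The SpanRecord shape and the tokensByModel helper are hypothetical; real exporters and collectors expose richer span types.

```typescript
// Hypothetical, simplified shape for an exported span (not a cycgraph type).
interface SpanRecord {
  name: string;
  attributes: Record<string, string | number>;
}

// Sum agent.tokens.total per agent.model across all agent.execute spans.
function tokensByModel(spans: SpanRecord[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const span of spans) {
    if (span.name !== 'agent.execute') continue;
    const model = span.attributes['agent.model'];
    const tokens = span.attributes['agent.tokens.total'];
    // Skip spans that lack either attribute or carry an unexpected type.
    if (typeof model !== 'string' || typeof tokens !== 'number') continue;
    totals.set(model, (totals.get(model) ?? 0) + tokens);
  }
  return totals;
}
```

Grouping on agent.provider or filtering on agent.attempt would follow the same shape.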

If you build custom node executors or utilities, you can create spans using the exported helpers:

import { getTracer, withSpan } from '@cycgraph/orchestrator';

const tracer = getTracer('my-custom-module');

const result = await withSpan(tracer, 'my.operation', async (span) => {
  span.setAttribute('my.custom_attr', 'value');
  // ... your logic ...
  return someResult;
});

withSpan automatically:

  • Creates a child span under the current async context
  • Sets span status to OK on success
  • Sets span status to ERROR and records the exception on failure
  • Ends the span in a finally block (guaranteed cleanup)

getTracer() returns a no-op tracer when OpenTelemetry is not initialized, so your code works identically with or without tracing enabled.
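To make the four guarantees listed above concrete, here is a minimal sketch of a helper with the same behaviour. It is not cycgraph's source: withSpanSketch, SpanLike, TracerLike, and StatusCode are local, hypothetical stand-ins that loosely mirror the shapes in @opentelemetry/api so the example stays self-contained.

```typescript
// Local stand-ins that loosely mirror @opentelemetry/api (assumption:
// the real helper works against the real Span and Tracer types).
const StatusCode = { UNSET: 0, OK: 1, ERROR: 2 } as const;

interface SpanLike {
  setAttribute(key: string, value: string | number | boolean): void;
  setStatus(status: { code: number; message?: string }): void;
  recordException(error: Error): void;
  end(): void;
}

interface TracerLike {
  // Starts a span as a child of the currently active one and runs fn with it.
  startActiveSpan<T>(name: string, fn: (span: SpanLike) => T): T;
}

async function withSpanSketch<T>(
  tracer: TracerLike,
  name: string,
  fn: (span: SpanLike) => Promise<T>,
): Promise<T> {
  return tracer.startActiveSpan(name, async (span) => {
    try {
      const result = await fn(span);
      span.setStatus({ code: StatusCode.OK });
      return result;
    } catch (err) {
      const error = err instanceof Error ? err : new Error(String(err));
      span.setStatus({ code: StatusCode.ERROR, message: error.message });
      span.recordException(error);
      throw err; // rethrow so callers still see the original failure
    } finally {
      span.end(); // runs on both success and failure
    }
  });
}
```

Rethrowing the original error keeps the helper transparent to callers: the span records the failure, but error handling stays where it was.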

initTracing() also initializes an optional metrics subsystem (gated separately by METRICS_ENABLED=true). Built-in metric recording functions:

| Function | What it records |
| --- | --- |
| recordWorkflowDuration(ms) | Workflow wall-clock time |
| recordTokensUsed(count) | Token consumption |
| recordCostUsd(amount) | Dollar cost |
| recordAgentDuration(ms) | Per-agent LLM call time |
| incrementWorkflowsStarted() | Workflow start counter |
| incrementWorkflowsCompleted() | Workflow completion counter |
| incrementWorkflowsFailed() | Workflow failure counter |

All metric functions accept optional labels and are zero-cost no-ops when metrics are disabled.
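One common way to get no-op behaviour when metrics are disabled is to decide once, at setup time, whether a recorder forwards to a real instrument or does nothing. The sketch below shows that pattern under stated assumptions: makeRecorder, Recorder, Labels, and sink are hypothetical names, and the actual label format accepted by cycgraph's metric functions is not specified here.

```typescript
// Hypothetical types: the real label format in cycgraph may differ.
type Labels = Record<string, string>;
type Recorder = (value: number, labels?: Labels) => void;

// Shared no-op: when metrics are off, every recorder is this one function.
const noopRecorder: Recorder = () => {};

// `sink` stands in for a real histogram or counter instrument.
function makeRecorder(
  enabled: boolean,
  sink: (value: number, labels: Labels) => void,
): Recorder {
  if (!enabled) return noopRecorder;
  return (value, labels = {}) => sink(value, labels);
}
```

A recorder built this way could be created with enabled set from a check such as METRICS_ENABLED === 'true', so call sites never need their own conditionals.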

initTracing() registers SIGTERM and SIGINT handlers that flush pending spans and shut down the SDK cleanly. No additional cleanup code is needed.
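For readers wiring similar behaviour into their own tooling, a signal-driven flush can be sketched as below. This is an assumption-laden illustration rather than cycgraph's code: registerShutdownSketch, SignalSource, Shutdownable, and onDone are hypothetical names, with SignalSource modelling the part of Node's process object used here and Shutdownable modelling an SDK object that exposes shutdown().

```typescript
// Minimal stand-ins so the sketch is self-contained (assumptions, not
// cycgraph types). In Node, `process` would play the SignalSource role.
interface SignalSource {
  once(signal: string, handler: () => void): unknown;
}

interface Shutdownable {
  // Expected to flush pending spans before resolving.
  shutdown(): Promise<void>;
}

function registerShutdownSketch(
  source: SignalSource,
  sdk: Shutdownable,
  onDone: (signal: string) => void,
): void {
  let shuttingDown = false;
  for (const signal of ['SIGTERM', 'SIGINT']) {
    source.once(signal, () => {
      if (shuttingDown) return; // ignore a second signal during shutdown
      shuttingDown = true;
      // Call onDone whether the flush succeeds or fails.
      const finish = () => onDone(signal);
      sdk.shutdown().then(finish, finish);
    });
  }
}
```

Guarding with a flag keeps a rapid SIGINT followed by SIGTERM from triggering two concurrent shutdowns.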

  • Evaluations — verify agent behavior with automated eval suites
  • Streaming — real-time event observability (alternative to spans)