# Using the Context Engine
This guide covers the practical steps for adding context compression to a workflow. For background on pipeline architecture, scoring algorithms, and budget management, see Context Engine.
## Quick start
The fastest way to compress context in a workflow:
```ts
import { GraphRunner } from '@cycgraph/orchestrator';
import { createOptimizedPipeline, serialize } from '@cycgraph/context-engine';

const { pipeline } = createOptimizedPipeline({ preset: 'balanced' });

const contextCompressor = (sanitizedMemory, options) => {
  const result = pipeline.compress({
    segments: [
      { id: 'memory', content: serialize(sanitizedMemory), role: 'memory', priority: 1 },
    ],
    budget: { maxTokens: options?.maxTokens ?? 8192, outputReserve: 0 },
    model: options?.model,
  });
  return { compressed: result.segments[0].content, metrics: result.metrics };
};

const runner = new GraphRunner(graph, state, { contextCompressor });

runner.on('context:compressed', (event) => {
  console.log(`Memory: ${event.reduction_percent.toFixed(1)}% reduction`);
});
```
## Choosing a preset

| Scenario | Preset | Why |
|---|---|---|
| Low-latency chat | `fast` | Minimal overhead, format + dedup only |
| General workflows | `balanced` | Good compression with heuristic pruning |
| Cost-sensitive / small models | `maximum` | Full pipeline with hierarchy formatting |
## Multi-turn compression
For workflows with multiple turns, use the incremental pipeline to avoid re-compressing unchanged context:
```ts
import {
  createIncrementalPipeline,
  createFormatStage,
  createExactDedupStage,
} from '@cycgraph/context-engine';

const pipeline = createIncrementalPipeline({
  stages: [createFormatStage(), createExactDedupStage()],
});

let state = undefined;

for (const turn of turns) {
  const { result, state: nextState, cachedSegmentCount } = pipeline.compress(
    { segments: buildSegments(turn), budget },
    state,
  );
  state = nextState;
  console.log(`Turn ${nextState.turnNumber}: ${cachedSegmentCount} segments cached`);
}
```

The incremental pipeline tracks per-segment output hashes, so cross-segment stages (like fuzzy dedup) only re-run when per-segment outputs actually change, not just when inputs change. This avoids expensive re-runs when a segment's content changes but its compressed output stays the same.
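The caching test itself is simple to picture. Here is a self-contained sketch of the idea; `shouldRerunCrossSegmentStages` and its types are hypothetical illustrations, not part of `@cycgraph/context-engine`:

```ts
import { createHash } from 'node:crypto';

type Compressed = { id: string; content: string };

const hashOf = (text: string): string =>
  createHash('sha256').update(text).digest('hex');

// Cross-segment stages re-run only when some per-segment *output* hash
// differs from the previous turn. Inputs that change upstream but
// compress to the same output do not invalidate the cache.
function shouldRerunCrossSegmentStages(
  outputs: Compressed[],
  previousHashes: Map<string, string>,
): boolean {
  if (outputs.length !== previousHashes.size) return true;
  return outputs.some((seg) => previousHashes.get(seg.id) !== hashOf(seg.content));
}

const previous = new Map([['memory', hashOf('fact A; fact B')]]);

// Same compressed output as last turn, so cross-segment stages are skipped.
console.log(shouldRerunCrossSegmentStages([{ id: 'memory', content: 'fact A; fact B' }], previous)); // false
```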
## Pipeline safety
### Timeout
Set a pipeline-level timeout to bound total compression time; if the timeout is exceeded, the remaining stages are skipped:
```ts
const pipeline = createPipeline({
  stages: [...],
  timeoutMs: 200, // hard cap at 200ms
});
```
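The skip behavior amounts to checking a deadline between stages. A minimal sketch of that control flow, using a hypothetical `runWithDeadline` helper rather than the library's internals:

```ts
type Stage = { name: string; run: (text: string) => string };

// Runs stages in order; once the deadline passes, the remaining stages
// are recorded as skipped and the last completed output is returned.
function runWithDeadline(
  stages: Stage[],
  input: string,
  timeoutMs: number,
): { output: string; skipped: string[] } {
  const deadline = Date.now() + timeoutMs;
  const skipped: string[] = [];
  let output = input;
  for (const stage of stages) {
    if (Date.now() >= deadline) {
      skipped.push(stage.name);
      continue;
    }
    output = stage.run(output);
  }
  return { output, skipped };
}
```

Because the deadline is only checked between stages, a single slow stage can still overshoot the cap; a hard guarantee would require cancellable stages.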
### Logger

Route diagnostic output through a structured logger:
```ts
const pipeline = createPipeline({
  stages: [...],
  logger: {
    warn: (msg) => myLogger.warn(msg),
    debug: (msg) => myLogger.debug(msg),
  },
});
```
## Query-aware compression

When the user's query is known, configure the heuristic scorer to weight tokens that match the query, so query-relevant content survives pruning at the expense of unrelated text:
```ts
import {
  createPipeline,
  createHeuristicPruningStage,
  createAllocatorStage,
} from '@cycgraph/context-engine';

const pipeline = createPipeline({
  stages: [
    createHeuristicPruningStage({ queryWeight: 0.25 }),
    createAllocatorStage(),
  ],
});

const result = pipeline.compress({
  segments: [
    { id: 'query', content: userQuery, role: 'query', priority: 10, locked: true },
    { id: 'memory', content: serialize(memory), role: 'memory', priority: 5 },
  ],
  budget: { maxTokens: 4096, outputReserve: 512 },
});
```

Mark the query segment as `locked: true` so it is never pruned; the heuristic scorer reads its tokens to compute relevance scores for the unlocked segments. `queryWeight` is a multiplier between 0 and 1; higher values bias the scorer more heavily toward query-matching content.
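To build intuition for `queryWeight`, here is a toy scorer; the function and its formula are illustrative assumptions, not the library's actual heuristic:

```ts
const tokenize = (text: string): Set<string> =>
  new Set(text.toLowerCase().split(/\W+/).filter(Boolean));

// Toy relevance score: a base score of 1, boosted by the fraction of
// query tokens the sentence contains, scaled by queryWeight.
function scoreSentence(sentence: string, query: string, queryWeight: number): number {
  const queryTokens = tokenize(query);
  const sentenceTokens = tokenize(sentence);
  let overlap = 0;
  for (const t of queryTokens) if (sentenceTokens.has(t)) overlap++;
  return 1 + queryWeight * (overlap / Math.max(queryTokens.size, 1));
}

const query = 'database migration errors';
const relevant = scoreSentence('The migration failed with a database timeout.', query, 0.25);
const unrelated = scoreSentence('The weather was pleasant that afternoon.', query, 0.25);
// relevant scores higher than unrelated, so pruning drops the unrelated sentence first
```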
## Working with memory payloads
When compressing memory from `@cycgraph/memory`, use the adaptive memory stage to prioritize recent and high-relevance facts:
```ts
import {
  createPipeline,
  createAdaptiveMemoryStage,
  createFormatStage,
  createAllocatorStage,
  serialize,
} from '@cycgraph/context-engine';

const pipeline = createPipeline({
  stages: [
    createAdaptiveMemoryStage({ recencyBoostDays: 7, maxFactsPerTheme: 10 }),
    createFormatStage(),
    createAllocatorStage(),
  ],
});

// Serialize memory retrieval result to JSON
const memoryJson = serialize(memoryResult);

const result = pipeline.compress({
  segments: [
    { id: 'system', content: systemPrompt, role: 'system', priority: 10, locked: true },
    { id: 'memory', content: memoryJson, role: 'memory', priority: 5 },
    { id: 'history', content: chatHistory, role: 'history', priority: 3 },
  ],
  budget: { maxTokens: 4096, outputReserve: 1024 },
});
```

## Monitoring compression
Section titled “Monitoring compression”Pipeline metrics
Every compression call returns detailed metrics:
```ts
const { metrics } = result;
console.log(`Total: ${metrics.totalTokensIn} -> ${metrics.totalTokensOut} tokens`);
console.log(`Reduction: ${metrics.reductionPercent.toFixed(1)}%`);
console.log(`Duration: ${metrics.totalDurationMs.toFixed(0)}ms`);

for (const stage of metrics.stages) {
  console.log(`  ${stage.name}: ${stage.ratio.toFixed(2)}x (${stage.durationMs.toFixed(0)}ms)`);
}
```
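If you want to sanity-check these numbers, the arithmetic is straightforward. Assuming the obvious definitions (a sketch, not taken from the library's source), per-stage ratio and overall reduction come out as:

```ts
type StageCount = { name: string; tokensIn: number; tokensOut: number };

// ratio: input tokens per output token for one stage (2.00x = halved).
const ratio = (s: StageCount): number => s.tokensIn / s.tokensOut;

// reductionPercent: share of the original tokens removed end to end.
const reductionPercent = (totalIn: number, totalOut: number): number =>
  (1 - totalOut / totalIn) * 100;

const stages: StageCount[] = [
  { name: 'dedup', tokensIn: 8000, tokensOut: 6000 },
  { name: 'prune', tokensIn: 6000, tokensOut: 4000 },
];

// Stages chain, so the overall ratio is the product of the stage ratios.
const overall = stages.reduce((p, s) => p * ratio(s), 1);
console.log(overall.toFixed(2)); // 2.00
console.log(reductionPercent(8000, 4000)); // 50
```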
### Cache diagnostics

Detect when API prompt caching is being invalidated by dynamic content:
```ts
import { diagnoseCacheStability, computeSegmentHashMap } from '@cycgraph/context-engine';

// Hash this turn's segments; they become `previousHashes` on the next turn
const hashes = computeSegmentHashMap(segments);
const diagnostics = diagnoseCacheStability(segments, previousHashes);

if (diagnostics.hitRate < 0.8) {
  console.warn('Low cache hit rate:', diagnostics.recommendations);
}
```
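A hit rate of this kind is just the fraction of segments whose hash is unchanged since the previous turn. A self-contained sketch, with a hypothetical `cacheHitRate` helper rather than the library's `diagnoseCacheStability`:

```ts
import { createHash } from 'node:crypto';

const hashOf = (text: string): string =>
  createHash('sha256').update(text).digest('hex');

// Fraction of segments whose content hash matches the previous turn.
// Anything below 1.0 means part of the prompt prefix was invalidated.
function cacheHitRate(
  segments: { id: string; content: string }[],
  previousHashes: Map<string, string>,
): number {
  if (segments.length === 0) return 1;
  const hits = segments.filter((s) => previousHashes.get(s.id) === hashOf(s.content)).length;
  return hits / segments.length;
}

// A timestamp embedded in a system prompt changes every turn and keeps
// invalidating that segment: a classic cache-stability bug this catches.
```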
### Circuit breaker

Wrap expensive stages to auto-bypass them when they aren't paying for themselves:
```ts
import { createCircuitBreaker, createLatencyTracker } from '@cycgraph/context-engine';

const tracker = createLatencyTracker();
const guarded = createCircuitBreaker(semanticDedupStage, tracker, {
  minEfficiency: 1.0, // must save 1 token per ms of latency
  warmupSamples: 5,
  cooldownMs: 30_000,
});
```
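The efficiency rule can be made concrete: a stage stays enabled only while its average tokens saved per millisecond of latency clears the threshold. A toy model of that decision (assumed semantics, not the library's implementation):

```ts
type Sample = { tokensSaved: number; durationMs: number };

// After the warmup period, the stage is bypassed while its average
// tokens-saved-per-ms of latency falls below minEfficiency.
function shouldBypass(samples: Sample[], minEfficiency: number, warmupSamples: number): boolean {
  if (samples.length < warmupSamples) return false; // not enough evidence yet
  const saved = samples.reduce((sum, s) => sum + s.tokensSaved, 0);
  const ms = samples.reduce((sum, s) => sum + s.durationMs, 0);
  return saved / ms < minEfficiency;
}

const slowStage = Array.from({ length: 5 }, () => ({ tokensSaved: 40, durationMs: 80 }));
console.log(shouldBypass(slowStage, 1.0, 5)); // true: 0.5 tokens/ms < 1.0
```

A cooldown timer (like `cooldownMs` above) would periodically re-admit the stage so its efficiency can be re-measured.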
## Next steps

- Context Engine — architectural deep dive
- Memory System — the knowledge graph that feeds the context engine
- Budget-Aware Model Selection — how model choice affects compression