
Evolution (DGM)

The Evolution pattern — inspired by Darwin Gödel Machines — runs multiple candidate solutions in parallel, scores each with a fitness evaluator, selects the best, and breeds the next generation using the winner’s output as context.

The loop continues across multiple generations until a fitness threshold is met or a stagnation condition is reached. The LLM itself acts as the mutation operator: each candidate receives the winning parent in its prompt alongside a temperature that decreases generation by generation, producing controlled variation that converges over time.

flowchart TB
    Start([Start]) --> G0

    subgraph G0["Generation 0"]
        direction LR
        A0["Candidate A"] ~~~ B0["Candidate B"] ~~~ C0["Candidate C"]
    end

    G0 --> Eval0["Evaluate All → Winner (0.72)"]
    Eval0 --> G1

    subgraph G1["Generation 1"]
        direction LR
        A1["Candidate A'"] ~~~ B1["Candidate B'"] ~~~ C1["Candidate C'"]
    end

    G1 --> Eval1["Evaluate All → Winner (0.85)"]
    Eval1 --> G2

    subgraph G2["Generation 2"]
        direction LR
        A2["Candidate A''"] ~~~ B2["Candidate B''"] ~~~ C2["Candidate C''"]
    end

    G2 --> Eval2["Evaluate All → Winner (0.91) ✓ Done"]
    Eval2 --> Done([Done])

Each generation follows a strict loop:

  1. N candidates run in parallel (fan-out).
  2. Each candidate receives the previous generation’s winner injected into its prompt.
  3. A fitness evaluator agent scores each candidate on a 0–1 scale.
  4. The highest-scoring candidate becomes the parent for the next generation.
  5. Temperature decreases linearly (moving from broad exploration to focused exploitation).
  6. Execution halts when the fitness threshold is met, stagnation is detected, or max generations are reached.
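The six steps above can be sketched as a plain loop. This is a minimal illustration under stated assumptions, not the library's implementation: `evolve`, `Generate`, and `Evaluate` are hypothetical names, the candidates run sequentially rather than in parallel, and stagnation is assumed to mean "no new best score for N consecutive generations".

```typescript
// Hypothetical stand-ins for the candidate and evaluator agents (not the library's API).
type Candidate = { output: string; fitness: number };

interface EvolutionOptions {
  populationSize: number; // assumed to be at least 1
  maxGenerations: number;
  fitnessThreshold: number;
  stagnationGenerations: number;
  initialTemperature: number;
  finalTemperature: number;
}

type Generate = (parent: Candidate | null, temperature: number, index: number) => string;
type Evaluate = (output: string) => number; // expected to return a score in [0, 1]

function evolve(opts: EvolutionOptions, generate: Generate, evaluate: Evaluate) {
  let parent: Candidate | null = null;
  let bestSoFar = -Infinity;
  let stagnant = 0;
  const history: number[] = [];

  for (let gen = 0; gen < opts.maxGenerations; gen++) {
    // Step 5: linear temperature decay from initial to final.
    const progress = opts.maxGenerations > 1 ? gen / (opts.maxGenerations - 1) : 0;
    const temperature =
      opts.initialTemperature * (1 - progress) + opts.finalTemperature * progress;

    // Steps 1-3: produce N candidates seeded with the parent, then score each one.
    const population: Candidate[] = [];
    for (let i = 0; i < opts.populationSize; i++) {
      const output = generate(parent, temperature, i);
      population.push({ output, fitness: evaluate(output) });
    }

    // Step 4: the top scorer becomes the parent for the next generation.
    const winner = population.reduce((a, b) => (b.fitness > a.fitness ? b : a));
    history.push(winner.fitness);
    parent = winner;

    // Step 6: stop on threshold or stagnation; the loop bound enforces max generations.
    if (winner.fitness >= opts.fitnessThreshold) return { winner, history, reason: 'threshold' };
    if (winner.fitness > bestSoFar) {
      bestSoFar = winner.fitness;
      stagnant = 0;
    } else if (++stagnant >= opts.stagnationGenerations) {
      return { winner, history, reason: 'stagnation' };
    }
  }
  return { winner: parent, history, reason: 'max_generations' };
}

// Toy run: each generation appends one character; fitness is length / 4, capped at 1.
const demo = evolve(
  {
    populationSize: 3,
    maxGenerations: 10,
    fitnessThreshold: 0.9,
    stagnationGenerations: 3,
    initialTemperature: 1.0,
    finalTemperature: 0.3,
  },
  (parent) => (parent?.output ?? '') + 'x',
  (output) => Math.min(1, output.length / 4),
);
console.log(demo.reason, demo.history); // threshold [ 0.25, 0.5, 0.75, 1 ]
```

In the toy run the threshold is crossed in the fourth generation, so the remaining six generations never execute.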

Evolution is a good fit for:

  • Creative problem solving: When there are many wildly different valid approaches and you want to explore the landscape simultaneously.
  • Prompt optimization: Allowing an LLM to rewrite its own prompt instructions iteratively to find the highest-performing variant.
  • Out-of-the-box solutions: Finding non-obvious solutions where a single, sequential self-annealing agent might get stuck in a local maximum.

(Note: Evolution is resource-intensive. If you only need to iteratively refine a single output until it hits a quality bar, use Self-Annealing instead.)

The pattern requires you to pair a “candidate” generator agent with an “evaluator” agent within an evolution node.

Register the candidate agent that will generate variations, and the evaluator agent that will score their fitness.

import { InMemoryAgentRegistry } from '@cycgraph/orchestrator';

const registry = new InMemoryAgentRegistry();

const WRITER_ID = registry.register({
  name: 'Candidate Writer',
  model: 'claude-sonnet-4-20250514',
  provider: 'anthropic',
  system_prompt: [
    'You are a creative writer.',
    'Write a poem based on the prompt.',
    'If `_evolution_parent` is provided, use it as a starting point. The parent scored `_evolution_parent_fitness`—aim to do better.',
    'Current generation: `_evolution_generation`.',
  ].join(' '),
  // Temperature is overridden by the evolution node dynamically
  temperature: 1.0,
  tools: [],
  permissions: { read_keys: ['prompt'], write_keys: ['poem'] },
});

const EVALUATOR_ID = registry.register({
  name: 'Fitness Evaluator',
  model: 'claude-sonnet-4-20250514',
  provider: 'anthropic',
  system_prompt: [
    'Evaluate the poem strictly on its metrical structure and emotional impact.',
    'Return a single number between 0.0 and 1.0 representing the quality score.',
  ].join(' '),
  temperature: 0.1,
  tools: [],
  permissions: { read_keys: ['poem'], write_keys: ['score'] },
});

The evolution node type requires an evolution_config block that dictates the population size, selection strategy, and stopping conditions.

import { createGraph } from '@cycgraph/orchestrator';

const graph = createGraph({
  name: 'Poem Evolution',
  nodes: [
    {
      id: 'evolve-poem',
      type: 'evolution',
      read_keys: ['*'],
      write_keys: ['*'],
      evolution_config: {
        candidate_agent_id: WRITER_ID,
        evaluator_agent_id: EVALUATOR_ID,
        population_size: 5,          // Parallel candidates per generation
        max_generations: 10,         // Hard limit
        fitness_threshold: 0.9,      // Early exit score
        stagnation_generations: 3,   // Exit if no improvement
        selection_strategy: 'rank',  // Always select the top scorer
        initial_temperature: 1.0,    // Exploration (Generation 0)
        final_temperature: 0.3,      // Exploitation (Final Generation)
      },
    },
  ],
  edges: [],
  start_node: 'evolve-poem',
  end_nodes: ['evolve-poem'],
});
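To get a feel for the temperature settings in this config, the sketch below computes a linear schedule from `initial_temperature` to `final_temperature`. `temperatureAt` is a hypothetical helper, and it assumes the interpolation spans `max_generations - 1` steps so that the last possible generation lands exactly on the final temperature; the library's exact formula may differ.

```typescript
// Hypothetical helper: linear interpolation between the two configured temperatures.
function temperatureAt(
  generation: number,
  maxGenerations: number,
  initial: number,
  final: number,
): number {
  if (maxGenerations <= 1) return initial;
  // Clamp so out-of-range generations never extrapolate past the final temperature.
  const progress = Math.min(generation, maxGenerations - 1) / (maxGenerations - 1);
  return initial * (1 - progress) + final * progress;
}

// Schedule for the config above: 10 generations, 1.0 down to 0.3.
const schedule = Array.from({ length: 10 }, (_, g) => temperatureAt(g, 10, 1.0, 0.3));
console.log(schedule.map((t) => t.toFixed(2)).join(' '));
// 1.00 0.92 0.84 0.77 0.69 0.61 0.53 0.46 0.38 0.30
```

Under this assumption each generation cools by about 0.078, so a run that exits early on `fitness_threshold` never reaches the low-temperature end of the schedule.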

Each candidate receives the previous generation’s winner automatically in its state view. Your candidate agent’s system prompt must explicitly address these variables to “mutate” successfully:

“If _evolution_parent is provided, use it as a starting point. The parent scored _evolution_parent_fitness—aim to do better. Current generation: _evolution_generation.”

The evaluator’s critique of the parent is also injected as _evolution_parent_reasoning — feed it to your candidate prompt so each generation fixes the specific gaps the judge named, rather than mutating blindly.
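As an illustration, the writer's system prompt could be extended to reference the critique as well. The wording below is only a suggestion; the four `_evolution_*` variable names are the ones described above, and `candidateSystemPrompt` is a hypothetical constant you would pass as `system_prompt` when registering the candidate agent.

```typescript
// Suggested prompt wording (an assumption, not a required format).
const candidateSystemPrompt = [
  'You are a creative writer.',
  'Write a poem based on the prompt.',
  'If `_evolution_parent` is provided, use it as a starting point.',
  'The parent scored `_evolution_parent_fitness`; aim to do better.',
  'If `_evolution_parent_reasoning` is provided, treat it as the judge\'s critique of the parent and fix the specific weaknesses it names.',
  'Current generation: `_evolution_generation`.',
].join(' ');
```

Keeping each instruction on its own array entry makes it easy to A/B test the prompt with and without the critique line.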

elite_count (default 1) carries the top N candidates of each generation forward unchanged — not re-generated, not re-scored. This guarantees the best-so-far can never be lost to a noisy generation, so ${nodeId}_fitness_history is monotonic (it climbs or holds, never dips), and it saves the LLM calls those slots would have cost (each generation after the first issues population_size - elite_count candidate calls). Set elite_count: 0 to breed every candidate fresh instead.
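The effect of elitism can be shown with a small sketch. `nextGeneration` and `noisyBreed` are hypothetical names used only for illustration, and the fitness values are made up; the point is that carrying the top scorer forward keeps the best score from dropping even when every freshly bred candidate is worse.

```typescript
type Scored = { output: string; fitness: number };

// Carry the top `eliteCount` candidates forward unchanged and breed the remaining slots.
// Assumes `previous` is non-empty.
function nextGeneration(
  previous: Scored[],
  eliteCount: number,
  breed: (parent: Scored, slot: number) => Scored,
): Scored[] {
  const ranked = [...previous].sort((a, b) => b.fitness - a.fitness);
  const elites = ranked.slice(0, eliteCount);
  const parent = ranked[0];
  const fresh = Array.from({ length: previous.length - elites.length }, (_, slot) =>
    breed(parent, slot),
  );
  return [...elites, ...fresh];
}

const bestOf = (population: Scored[]) => Math.max(...population.map((c) => c.fitness));

// Made-up scores for generation 0, and a deliberately bad generation 1.
const gen0: Scored[] = [
  { output: 'A', fitness: 0.72 },
  { output: 'B', fitness: 0.4 },
  { output: 'C', fitness: 0.55 },
];
const noisyBreed = (parent: Scored): Scored => ({ output: parent.output + "'", fitness: 0.3 });

const withElite = nextGeneration(gen0, 1, noisyBreed);
const withoutElite = nextGeneration(gen0, 0, noisyBreed);
console.log(bestOf(withElite), bestOf(withoutElite)); // 0.72 0.3
```

With one elite only two of the three slots call `breed`, which is where the saved LLM calls come from.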

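Evolution executes many LLM calls. With a population size of 5 and max generations of 10, you trigger up to 50 candidate executions plus 50 evaluations, roughly 100x the cost of a single-shot generation if the run goes the full distance. Elitism lowers the count somewhat, because carried-over elites are neither re-generated nor re-scored.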
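A rough worst-case call count can be worked out before launching a run. The helper below is a back-of-the-envelope sketch: `estimateCalls` is a hypothetical function, and it assumes exactly one evaluator call per freshly generated candidate and no early exit.

```typescript
// Worst-case LLM call estimate, assuming the run uses every generation.
function estimateCalls(populationSize: number, generations: number, eliteCount = 1) {
  if (generations < 1) return { candidates: 0, evaluations: 0, total: 0 };
  // Generation 0 breeds the full population; later generations skip the elite slots.
  const candidates = populationSize + (generations - 1) * (populationSize - eliteCount);
  // Elites are not re-scored, so evaluations match the number of fresh candidates.
  return { candidates, evaluations: candidates, total: candidates * 2 };
}

console.log(estimateCalls(5, 10, 0)); // { candidates: 50, evaluations: 50, total: 100 }
console.log(estimateCalls(5, 10));    // { candidates: 41, evaluations: 41, total: 82 }
```

Call count is only part of the cost: each candidate prompt also carries the parent's output, so token usage per call tends to be higher than for a single-shot request.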

Both candidate generation and evaluator scoring run in parallel, bounded by max_concurrency. When max_concurrency is at least the population size, scoring a generation takes roughly the wall-clock time of one evaluation instead of the sum of all of them.
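Bounded parallelism of this kind is commonly implemented as a small worker pool. The sketch below shows the general technique, not the library's internals: `mapWithConcurrency` and `fakeEvaluate` are hypothetical names, and the evaluator is a stub that yields once instead of calling a model.

```typescript
// Run `worker` over `items` with at most `maxConcurrency` calls in flight at once.
async function mapWithConcurrency<T, R>(
  items: T[],
  maxConcurrency: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each runner repeatedly claims the next unprocessed index until none remain.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runnerCount = Math.max(1, Math.min(maxConcurrency, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}

// Stub evaluator that records how many calls overlap.
let inFlight = 0;
let peakInFlight = 0;
const fakeEvaluate = async (poem: string): Promise<number> => {
  inFlight++;
  peakInFlight = Math.max(peakInFlight, inFlight);
  await Promise.resolve(); // stand-in for the network round trip
  inFlight--;
  return Math.min(1, poem.length / 10);
};

const scoresPromise = mapWithConcurrency(['aa', 'bbbb', 'cccccc', 'd', 'ee'], 2, fakeEvaluate);
scoresPromise.then((scores) => console.log(scores, 'peak in flight:', peakInFlight));
```

Results come back in input order regardless of which call finishes first, which keeps candidate indices aligned with their scores.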

Two safeguards keep this manageable:

  • Set error_strategy: 'best_effort' so a single API failure within a generation doesn’t kill the entire run.
  • Set a conservative fitness_threshold and stagnation_generations so the loop exits as soon as quality plateaus.

The node writes ${nodeId}_winner (the best candidate’s full output), ${nodeId}_winner_fitness, ${nodeId}_winner_reasoning, ${nodeId}_fitness_history, and ${nodeId}_population. Note that _population holds per-candidate fitness summaries (index, fitness, reasoning, tokens_used) — not every candidate’s full output — to keep state and checkpoints small. The winning output is available in full under _winner.
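Reading these keys back from the final state might look like the sketch below. The key names and the summary fields come from the description above; `readEvolutionResult`, the interface names, the field types, and the sample state are assumptions made for illustration.

```typescript
// Per-candidate summary as described above (field types are assumed).
interface CandidateSummary {
  index: number;
  fitness: number;
  reasoning: string;
  tokens_used: number;
}

interface EvolutionResult {
  winner: unknown; // full output of the best candidate
  winnerFitness: number;
  winnerReasoning: string;
  fitnessHistory: number[];
  population: CandidateSummary[];
}

// Hypothetical helper: collect the node's outputs from a flat state object.
function readEvolutionResult(state: Record<string, unknown>, nodeId: string): EvolutionResult {
  return {
    winner: state[`${nodeId}_winner`],
    winnerFitness: state[`${nodeId}_winner_fitness`] as number,
    winnerReasoning: state[`${nodeId}_winner_reasoning`] as string,
    fitnessHistory: (state[`${nodeId}_fitness_history`] as number[]) ?? [],
    population: (state[`${nodeId}_population`] as CandidateSummary[]) ?? [],
  };
}

// Made-up final state for the 'evolve-poem' node.
const sampleState: Record<string, unknown> = {
  'evolve-poem_winner': { poem: 'placeholder poem text' },
  'evolve-poem_winner_fitness': 0.91,
  'evolve-poem_winner_reasoning': 'placeholder critique',
  'evolve-poem_fitness_history': [0.72, 0.85, 0.91],
  'evolve-poem_population': [
    { index: 0, fitness: 0.91, reasoning: 'placeholder critique', tokens_used: 0 },
  ],
};

const evolutionResult = readEvolutionResult(sampleState, 'evolve-poem');
console.log(evolutionResult.winnerFitness, evolutionResult.fitnessHistory.length); // 0.91 3
```

Because `_population` holds summaries only, anything that needs a losing candidate's full text has to capture it elsewhere during the run.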