AI Workflows
Auto-Optimized
Observe — Evaluate — Improve

Epigentic transforms your AI workflows with automatic feedback-driven refinement

From Feedback to Evolution

Epigentic doesn't just evaluate. It turns every run into a better system

1. Define your workflow

Define your workflow using your favorite AI framework. The program flow and configurations, such as prompt templates, are tracked for use in the optimization phase.

story_agent.py

import asyncio
from agents import Agent, Runner
from epigentic import track, TrackingContext, AgentTracker, expand_template

@track()
async def write_story(ctx: TrackingContext, input: str) -> str:
    """
    Writes a story with a robust plot and a main character with
    a rich background based on the input.
    """
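    background = await get_character_background(ctx, input)
    # NOTE: `plot` (used below) is not defined in this excerpt; it is assumed
    # to be produced by an earlier step that is not shown here.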

    story_writer = Agent(
        name="write_story",
        instructions=expand_template(
            ctx.get_template_config("template")["value"],
            {
                "input": input,
                "plot": plot,
                "background_description": background,
            },
        ),
        model=ctx.get_string_config("model"),
        hooks=AgentTracker(ctx),
    )

    result = await Runner.run(story_writer, input)
    return result.final_output


@track()
async def get_character_background(ctx: TrackingContext, input: str) -> str:
    """
    Creates the background information for the input character.
    """
    agent = Agent(
        name="get_character_background",
        instructions=expand_template(
            ctx.get_template_config("template")["value"], {"input": input}
        ),
        model=ctx.get_string_config("model"),
        hooks=AgentTracker(ctx),
    )
    result = await Runner.run(agent, input)
    return result.final_output
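
The templates on this page use double-bracket placeholders such as [[input]]. As a rough illustration of what filling such a template involves, the sketch below substitutes placeholders with a regular expression. `fill_placeholders` is a hypothetical stand-in written for this example; it is not Epigentic's `expand_template`, whose actual behavior may differ.

```python
import re
from typing import Mapping

# Matches placeholders of the form [[name]] (assumed syntax, based on the
# templates shown on this page).
PLACEHOLDER = re.compile(r"\[\[(\w+)\]\]")


def fill_placeholders(template: str, values: Mapping[str, str]) -> str:
    """Hypothetical helper: replace each [[name]] with values[name]."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            # Fail loudly rather than leave a raw placeholder in the prompt.
            raise KeyError(f"no value supplied for placeholder [[{name}]]")
        return values[name]

    return PLACEHOLDER.sub(substitute, template)


prompt = fill_placeholders(
    "Write a rich background description for a [[input]].",
    {"input": "corgi dog"},
)
print(prompt)  # Write a rich background description for a corgi dog.
```

Raising on a missing value is a design choice for this sketch: a prompt that still contains a literal placeholder tends to fail silently downstream.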

2. Write your Evaluation Suite

Define an evaluation suite for evaluating and optimizing your workflow. A single suite can evaluate multiple agents across multiple evaluators simultaneously

story_agent.py

from typing import Any, Dict, Optional
from epigentic import (Case, Criteria, DataRow, DataSet,
                       Suite, llm_as_judge_evaluator)

def is_of_correct_length_evaluator(num_paragraphs: int):
    return llm_as_judge_evaluator(
        id="is_of_correct_length_evaluator",
        description="Evaluate whether the output has the correct number of paragraphs",
        input_and_output_to_template_values=lambda input, output, params: {
            "output": output,
            "num_paragraphs": params["num_paragraphs"],
        },
        params={"num_paragraphs": num_paragraphs},
    )

Suite("write_story").add(
    Case[str, str, None](
        name="case_1",
        function=write_story,
        dataset=DataSet(
            data=[DataRow(input="a corgi dog", specs=None)]
        ),
        criteria=[
            Criteria(
                target_agent_name_regex=write_story.__name__,
                evaluator=is_of_correct_length_evaluator(5),
                threshold=1,
            ),
            Criteria(
                target_agent_name_regex=get_character_background.__name__,
                evaluator=is_of_correct_length_evaluator(3),
                threshold=1,
            ),
        ],
    )
).run()
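
To make the roles of `target_agent_name_regex` and `threshold` more concrete, here is a minimal sketch of how a case could be judged. Everything in it is an assumption made for illustration: `SimpleCriterion`, `paragraph_count_evaluator`, and `case_passes` are hypothetical names, the paragraph counter is a deterministic stand-in for the LLM judge, and the pass rule (every agent matched by a criterion must score at or above its threshold) is a guess rather than documented Epigentic behavior.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SimpleCriterion:
    """Hypothetical, simplified stand-in for a criterion."""
    target_agent_name_regex: str
    evaluator: Callable[[str], float]
    threshold: float


def paragraph_count_evaluator(num_paragraphs: int) -> Callable[[str], float]:
    """Deterministic stand-in for the LLM judge: 1.0 on an exact match, else 0.0."""
    def evaluate(output: str) -> float:
        paragraphs = [p for p in output.split("\n\n") if p.strip()]
        return 1.0 if len(paragraphs) == num_paragraphs else 0.0
    return evaluate


def case_passes(agent_outputs: Dict[str, str], criteria: List[SimpleCriterion]) -> bool:
    """Assumed rule: every agent matched by a criterion must meet its threshold."""
    for criterion in criteria:
        pattern = re.compile(criterion.target_agent_name_regex)
        for agent_name, output in agent_outputs.items():
            if not pattern.fullmatch(agent_name):
                continue  # this criterion does not target this agent
            if criterion.evaluator(output) < criterion.threshold:
                return False
    return True


criteria = [
    SimpleCriterion("write_story", paragraph_count_evaluator(5), threshold=1),
    SimpleCriterion("get_character_background", paragraph_count_evaluator(3), threshold=1),
]
outputs = {
    "write_story": "\n\n".join(["paragraph"] * 22),  # far too long
    "get_character_background": "\n\n".join(["paragraph"] * 3),
}
print(case_passes(outputs, criteria))  # False
```

In this toy run the background agent meets its criterion, but the 22-paragraph story does not, so the case as a whole fails.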

3. Evaluate your workflow

Evaluate your workflow to see which criteria pass and which fail

$ python example.py evaluate               
Hydration and evaluation:  [Elapsed time: 0:00:41] ████████████████████████████ ETA:  00:00:00)
...
Running evaluator: is_of_correct_length_evaluator
Target: write_story

Running evaluator: is_of_correct_length_evaluator
Target: get_character_background
...
Evaluation of: write_story
FAILED case_1

Passed: 0, Failed: 1, Skipped: 0

View the evaluation:
https://epigenticai.com/app/evaluations?evaluation=3f90f2cd-b877-4cf1-bd73-33effe2a6696
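
If the evaluation runs in a CI job, the summary line can be checked mechanically. The sketch below parses a line in the format printed above and reports whether anything failed. It assumes the summary format stays exactly as shown in this example; `parse_summary` is a hypothetical helper and not part of Epigentic.

```python
import re
from typing import Dict

# Pattern for the summary line as it appears in the example output.
SUMMARY = re.compile(r"Passed: (\d+), Failed: (\d+), Skipped: (\d+)")


def parse_summary(log_text: str) -> Dict[str, int]:
    """Hypothetical helper: extract the pass/fail/skip counts from a log."""
    match = SUMMARY.search(log_text)
    if match is None:
        raise ValueError("no summary line found")
    passed, failed, skipped = (int(group) for group in match.groups())
    return {"passed": passed, "failed": failed, "skipped": skipped}


log_text = """Evaluation of: write_story
FAILED case_1

Passed: 0, Failed: 1, Skipped: 0
"""
summary = parse_summary(log_text)
print(summary["failed"] > 0)  # True: a CI job could exit non-zero here
```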

4. View the Results

See which criteria passed or failed, and analyze the workflow's complete data flow

4. Add Feedback

Add manual feedback to critique any part of the workflow

5. Auto-Optimize

Run optimization to automatically improve the agent based on feedback

$ python example.py optimize --from 3f90f2cd-b877-4cf1-bd73-33effe2a6696
Optimizing case 1 of 1
Starting function write_story
...
Optimizing sub function: write_story
* Feedback: The main character's name needs to be Cheddar if it is a Corgi dog.
* Failed score reasons: The story contains 22 paragraphs. For a score of 1, the story needed to be 2 or 3 paragraphs long. Since 22 is 4 or more paragraphs, the score is 0.
Getting failed scores
Getting feedback items
...
Thinking: Here are the problems to address: (1) The story output is much too long (22 paragraphs, rather than 2-3)
and (2) the main character must be named Cheddar if it's a Corgi dog. These issues should be resolved by updating
the prompt template for the write_story function. I'll edit the write_story template so that it explicitly requests
a story of only 2-3 paragraphs and ensures the main character is named Cheddar if the story is about a Corgi. Then,
I'll re-evaluate to see if both issues are fixed.
...
Updating configuration for function write_story
...
Re-evaluating the function get_background_description
...
For get_character_background, updating the template configuration
For get_story, updating the template configuration
Optimization complete
View the optimizer tracks at https://agenticai.com/app/tracks?id=f652f995-635b-42f1-b68e-5bd0ad738ec4

6. Verify Success

Re-run evaluation to confirm the agent now passes all criteria

$ python example.py evaluate               
Hydration and evaluation:  [Elapsed time: 0:00:32] ████████████████████████████ ETA:  00:00:00)
...
Running evaluator: is_of_correct_length_evaluator
Target: write_story

Running evaluator: is_of_correct_length_evaluator
Target: get_character_background
...
Evaluation of: write_story
PASSED case_1

Passed: 1, Failed: 0, Skipped: 0

View the evaluation:
https://epigenticai.com/app/evaluations?evaluation=2e70f3ef-c562-1af2-ca72-a1bcde57bae2
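
Taken together, the evaluate, feedback, optimize, and verify steps form a loop: evaluate, collect what failed, revise the configuration, and evaluate again. The sketch below shows only that control flow, using toy stand-ins. `optimize_until_pass`, `toy_evaluate`, and `toy_improve` are hypothetical names for this example; in Epigentic the revision is performed by the optimizer, which this sketch does not model.

```python
from typing import Callable, List, Tuple

Config = dict


def optimize_until_pass(
    config: Config,
    evaluate: Callable[[Config], List[str]],
    improve: Callable[[Config, List[str]], Config],
    max_rounds: int = 3,
) -> Tuple[Config, bool]:
    """Repeat evaluate -> improve until no failures remain or rounds run out."""
    for _ in range(max_rounds):
        failures = evaluate(config)
        if not failures:
            return config, True
        config = improve(config, failures)
    # Final check after the last improvement.
    return config, not evaluate(config)


# Toy stand-ins: the "workflow" passes once the template states a length limit.
def toy_evaluate(config: Config) -> List[str]:
    if "paragraphs" in config["template"]:
        return []
    return ["output has the wrong number of paragraphs"]


def toy_improve(config: Config, failures: List[str]) -> Config:
    # Append an instruction addressing the failures; a real optimizer would
    # rewrite the template with far more care.
    return {**config, "template": config["template"] + " Use exactly 5 paragraphs."}


final_config, passed = optimize_until_pass(
    {"template": "Write a story about a [[input]]."}, toy_evaluate, toy_improve
)
print(passed)  # True
print(final_config["template"])
```

Capping the number of rounds keeps the loop from running indefinitely when a failure cannot be fixed by changing the configuration alone.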

Template Evolution

Templates auto-optimized to meet requirements and adapt to feedback

Before

write_story_initial.txt

Write a story about a [[input]].

↓

After

write_story_optimized.txt

Write a story about a [[input]].

Use the following plot:
<plot>
[[plot]]
</plot>
And background description of the main character:
<background_description>
[[background_description]]
</background_description>

IMPORTANT: The story must be exactly 2 or 3 paragraphs long—do NOT write more than 3
paragraphs total. If the main character is a Corgi dog, its name must be Cheddar. The
story should make this explicit and use this name for the Corgi main character.

Before

get_character_background_initial.txt

Write a rich background description for a [[input]].

↓

After

get_character_background_optimized.txt

Write a rich background description for a [[input]]. This description will
be used to write a story. Limit your output to 3 paragraphs. Each paragraph should be
detailed but concise, and the overall response should not exceed 4 paragraphs in total.

Planned Features

A glimpse at what we’re building next

Relive & Debug with Workflow Replays
Auto-generate fine-tuning datasets
Seamless Integrations Across Frameworks
Intelligent Model Matching at Every Stage
Effortless Golden Eval Set Creation
And even more...

Want something prioritized? Tell us on the waitlist and we’ll reach out.
