AI Workflows
Auto-Optimized
Observe, Evaluate, Improve
Epigentic transforms your AI workflows with automatic feedback-driven refinement
From Feedback to Evolution
Epigentic doesn't just evaluate; it uses the results of every run to improve the system
1. Define your workflow
Define your workflow using your favorite AI framework. The program flow and configurations, such as prompt templates, are tracked for the optimization phase.
import asyncio

from agents import Agent, Runner
from epigentic import track, TrackingContext, AgentTracker, expand_template


@track()
async def write_story(ctx: TrackingContext, input: str) -> str:
    """
    Writes a story with a robust plot and a main character with
    a rich background based on the input.
    """
    background = await get_character_background(ctx, input)
    # NOTE: the original snippet uses `plot` without defining it. A tracked
    # helper analogous to get_character_background is assumed here;
    # `get_plot` is a hypothetical name that is not defined on this page.
    plot = await get_plot(ctx, input)
    story_writer = Agent(
        name="write_story",
        # Fill the tracked prompt template with this run's values.
        instructions=expand_template(
            ctx.get_template_config("template")["value"],
            {
                "input": input,
                "plot": plot,
                "background_description": background,
            },
        ),
        model=ctx.get_string_config("model"),
        hooks=AgentTracker(ctx),
    )
    result = await Runner.run(story_writer, input)
    return result.final_output


@track()
async def get_character_background(ctx: TrackingContext, input: str) -> str:
    """
    Creates the background information for the input character.
    """
    agent = Agent(
        name="get_character_background",
        instructions=expand_template(
            ctx.get_template_config("template")["value"], {"input": input}
        ),
        model=ctx.get_string_config("model"),
        hooks=AgentTracker(ctx),
    )
    result = await Runner.run(agent, input)
    return result.final_output
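As a rough illustration of what tracking the program flow can mean, the sketch below records every call to a decorated async function together with its arguments and result. It is not Epigentic's implementation: `record_calls`, `CALL_LOG`, `describe`, and `tell` are hypothetical names invented for this example, and the real `@track()` decorator presumably captures more than this (configurations, templates, nesting).

```python
import asyncio
import functools
from typing import Any, Awaitable, Callable, Dict, List

# Hypothetical in-memory log; a real tracker would persist this somewhere.
CALL_LOG: List[Dict[str, Any]] = []


def record_calls(func: Callable[..., Awaitable[Any]]) -> Callable[..., Awaitable[Any]]:
    """Wrap an async function so every completed call is appended to CALL_LOG."""

    @functools.wraps(func)
    async def wrapper(*args: Any, **kwargs: Any) -> Any:
        result = await func(*args, **kwargs)
        CALL_LOG.append(
            {"function": func.__name__, "args": args, "kwargs": kwargs, "result": result}
        )
        return result

    return wrapper


@record_calls
async def describe(subject: str) -> str:
    # Stand-in for an agent call that produces a character background.
    return f"background for {subject}"


@record_calls
async def tell(subject: str) -> str:
    # Stand-in for the outer workflow step, which calls the inner step.
    background = await describe(subject)
    return f"story about {subject} using {background}"


story = asyncio.run(tell("a corgi dog"))
```

Because an entry is appended only after the wrapped call returns, the inner step is logged before the outer one. A tracker that needs the call tree would have to record entry as well as exit.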
2. Write your Evaluation Suite
Define an evaluation suite for evaluating and optimizing your workflow. A single suite can evaluate multiple agents against multiple evaluators simultaneously.
from typing import Any, Dict, Optional

from epigentic import (Case, Criteria, DataRow, DataSet,
                       Suite, llm_as_judge_evaluator)


def is_of_correct_length_evaluator(num_paragraphs: int):
    return llm_as_judge_evaluator(
        id="is_of_correct_length_evaluator",
        description="Evaluate whether the output has the correct number of paragraphs",
        input_and_output_to_template_values=lambda input, output, params: {
            "output": output,
            "num_paragraphs": params["num_paragraphs"],
        },
        params={"num_paragraphs": num_paragraphs},
    )


Suite("write_story").add(
    Case[str, str, None](
        name="case_1",
        function=write_story,
        dataset=DataSet(
            data=[DataRow(input="a corgi dog", specs=None)]
        ),
        criteria=[
            Criteria(
                target_agent_name_regex=write_story.__name__,
                evaluator=is_of_correct_length_evaluator(5),
                threshold=1,
            ),
            Criteria(
                target_agent_name_regex=get_character_background.__name__,
                evaluator=is_of_correct_length_evaluator(3),
                threshold=1,
            ),
        ],
    )
).run()
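The `threshold=1` on each `Criteria` suggests that an evaluator produces a numeric score and that a criterion passes once the score reaches the threshold. That reading is an assumption, not something the page states. The sketch below shows the idea with a deterministic paragraph counter in place of an LLM judge; `count_paragraphs`, `paragraph_score`, and `passes` are hypothetical helpers, not part of the `epigentic` package.

```python
import re


def count_paragraphs(text: str) -> int:
    """Count blocks of text separated by one or more blank lines."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return len([block for block in blocks if block.strip()])


def paragraph_score(output: str, num_paragraphs: int) -> int:
    """Return 1 when the output has exactly num_paragraphs paragraphs, else 0."""
    return 1 if count_paragraphs(output) == num_paragraphs else 0


def passes(score: float, threshold: float) -> bool:
    """Assumed pass rule: the score must reach the threshold."""
    return score >= threshold


sample = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
three_ok = passes(paragraph_score(sample, 3), threshold=1)
five_ok = passes(paragraph_score(sample, 5), threshold=1)
```

A rule-based check like this is cheaper and more repeatable than an LLM judge when the property is purely structural. A judge is the better fit for properties such as plot quality that cannot be counted.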
3. Evaluate your workflow
Evaluate your workflow to see which criteria pass and which fail
$ python example.py evaluate
Hydration and evaluation: [Elapsed time: 0:00:41] ████████████████████████████ ETA: 00:00:00)
...
Running evaluator: is_of_correct_length_evaluator
Target: write_story
Running evaluator: is_of_correct_length_evaluator
Target: get_character_background
...
Evaluation of: write_story
FAILED case_1
Passed: 0, Failed: 1, Skipped: 0
View the evaluation:
https://epigenticai.com/app/evaluations?evaluation=3f90f2cd-b877-4cf1-bd73-33effe2a6696
4. View the Results
See which criteria passed and failed, and analyze the workflow's complete data flow
5. Add Feedback
Add manual feedback to critique any part of the workflow
6. Auto-Optimize
Run optimization to automatically improve the agent based on feedback
$ python example.py optimize --from 3f90f2cd-b877-4cf1-bd73-33effe2a6696
Optimizing case 1 of 1
Starting function write_story
...
Optimizing sub function: write_story
* Feedback: The main character's name needs to be Cheddar if it is a Corgi dog.
* Failed score reasons: The story contains 22 paragraphs. For a score of 1, the story needed to be 2 or 3 paragraphs long. Since 22 is 4 or more paragraphs, the score is 0.
Getting failed scores
Getting feedback items
...
Thinking: Here are the problems to address: (1) The story output is much too long (22 paragraphs, rather than 2-3)
and (2) the main character must be named Cheddar if it's a Corgi dog. These issues should be resolved by updating
the prompt template for the write_story function. I'll edit the write_story template so that it explicitly requests
a story of only 2-3 paragraphs and ensures the main character is named Cheddar if the story is about a Corgi. Then,
I'll re-evaluate to see if both issues are fixed.
...
Updating configuration for function write_story
...
Re-evaluating the function get_background_description
...
For get_character_background, updating the template configuration
For get_story, updating the template configuration
Optimization complete
View the optimizer tracks at https://agenticai.com/app/tracks?id=f652f995-635b-42f1-b68e-5bd0ad738ec4
7. Verify Success
Re-run evaluation to confirm the agent now passes all criteria
$ python example.py evaluate
Hydration and evaluation: [Elapsed time: 0:00:32] ████████████████████████████ ETA: 00:00:00)
...
Running evaluator: is_of_correct_length_evaluator
Target: write_story
Running evaluator: is_of_correct_length_evaluator
Target: get_character_background
...
Evaluation of: write_story
PASSED case_1
Passed: 1, Failed: 0, Skipped: 0
View the evaluation:
https://epigenticai.com/app/evaluations?evaluation=2e70f3ef-c562-1af2-ca72-a1bcde57bae2
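The evaluate, optimize, and re-evaluate cycle shown above can be pictured as a simple loop. The toy below is only a sketch under strong assumptions: the optimizer on this page revises templates with an LLM, whereas this example appends fixed requirement strings, and `REQUIREMENTS`, `evaluate`, `revise`, and `improve` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Hypothetical requirements, paraphrased from the feedback in the log above.
REQUIREMENTS: Dict[str, str] = {
    "length": "The story must be 2 or 3 paragraphs long.",
    "name": "If the main character is a Corgi dog, its name must be Cheddar.",
}


def evaluate(template: str) -> List[str]:
    """Return the keys of the requirements the template does not yet state."""
    return [key for key, text in REQUIREMENTS.items() if text not in template]


def revise(template: str, failures: List[str]) -> str:
    """Append the missing requirement text for each failure."""
    additions = [REQUIREMENTS[key] for key in failures]
    return "\n".join([template, *additions])


def improve(template: str, max_rounds: int = 3) -> Tuple[str, List[List[str]]]:
    """Evaluate, revise on failure, and re-evaluate until passing or out of rounds."""
    history: List[List[str]] = []
    for _ in range(max_rounds):
        failures = evaluate(template)
        history.append(failures)
        if not failures:
            break
        template = revise(template, failures)
    return template, history


final_template, history = improve("Write a story about a [[input]].")
```

The `max_rounds` cap matters in practice: a revision is not guaranteed to fix every failure, so the loop needs a stopping condition other than success.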
Template Evolution
Templates auto-optimized to meet requirements and adapt to feedback
Before
Write a story about a [[input]].
After
Write a story about a [[input]].
Use the following plot:
<plot>
[[plot]]
</plot>
And background description of the main character:
<background_description>
[[background_description]]
</background_description>
IMPORTANT: The story must be exactly 2 or 3 paragraphs long—do NOT write more than 3
paragraphs total. If the main character is a Corgi dog, its name must be Cheddar. The
story should make this explicit and use this name for the Corgi main character.
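The templates above mark variables with double square brackets, such as `[[input]]`. A minimal sketch of how such placeholders might be filled follows. It is not the `expand_template` function from the `epigentic` package: `fill_placeholders` is a hypothetical stand-in, and leaving unknown placeholders untouched is an assumption made for this example.

```python
import re
from typing import Mapping

# Matches [[name]] where name is made of letters, digits, or underscores.
PLACEHOLDER = re.compile(r"\[\[(\w+)\]\]")


def fill_placeholders(template: str, values: Mapping[str, str]) -> str:
    """Replace each [[name]] with values[name]; unknown names are left as-is."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        return str(values.get(name, match.group(0)))

    return PLACEHOLDER.sub(substitute, template)


before = "Write a story about a [[input]]."
filled = fill_placeholders(before, {"input": "corgi dog"})
```

Leaving unknown placeholders in place makes a missing value easy to spot in the rendered prompt. Raising an error instead would be the stricter choice.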
Planned Features
A glimpse at what we’re building next
- Relive & Debug with Workflow Replays
- Auto-generate fine-tuning datasets
- Seamless Integrations Across Frameworks
- Intelligent Model Matching at Every Stage
- Effortless Golden Eval Set Creation
- And even more...