Usage Guide
This guide covers day-to-day use of Aegis Monitor from install to evaluation workflows.
1) Installation
Base install
pip install aegis-ai
Development install (from source)
git clone https://github.com/aegis-ai/aegis-ai
cd aegis-ai
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
Optional features
OpenAI adapter
pip install -e ".[openai]"
Anthropic adapter
pip install -e ".[anthropic]"
Semantic scoring
pip install -e ".[scoring]"
Everything
pip install -e ".[all]"
2) Environment Setup
Create .env from the template:
cp .env.example .env
Set keys as needed:
OPENAI_API_KEY=your-openai-key
ANTHROPIC_API_KEY=your-anthropic-key
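Before starting a run, it can be useful to check which provider keys are actually set. The snippet below is a minimal sketch and not part of Aegis: `missing_keys` is a hypothetical helper, and it only inspects the process environment. It assumes the values from .env have already been loaded by your shell or tooling.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Key names taken from the .env example above.
PROVIDER_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def missing_keys(
    required: Iterable[str] = PROVIDER_KEYS,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the names in `required` that are unset or blank in `env`."""
    # Hypothetical helper: falls back to the real process environment.
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


for name in missing_keys():
    print(f"warning: {name} is not set")
```

Only the keys for the adapters you actually use need to be present, so treat the output as a hint rather than an error.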
3) Dataset Format
Aegis uses YAML datasets.
name: qa_sample
description: basic QA checks
cases:
- input: "What is the capital of France?"
expected: "Paris"
tags: ["geography"]
- input: "What is 2 + 2?"
expected: "4"
tags: ["math"]
Required fields:
- name
- cases (array)
- each case needs input and expected
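The required-field rules can be checked before a dataset is handed to Aegis. The following is a rough sketch, not Aegis's own validation: `validate_dataset` is a hypothetical helper that works on an already-parsed mapping (for example the result of a YAML loader), and treating an empty cases array as invalid is an assumption.

```python
from typing import Any, List


def validate_dataset(data: Any) -> List[str]:
    """Return a list of problems; an empty list means the structure looks valid."""
    if not isinstance(data, dict):
        return ["top level must be a mapping"]
    problems = []
    if not data.get("name"):
        problems.append("missing required field: name")
    cases = data.get("cases")
    # Assumption: an empty cases array is treated as invalid.
    if not isinstance(cases, list) or not cases:
        problems.append("cases must be a non-empty array")
        return problems
    for i, case in enumerate(cases):
        if not isinstance(case, dict):
            problems.append(f"case {i}: must be a mapping")
            continue
        for field in ("input", "expected"):
            if field not in case:
                problems.append(f"case {i}: missing {field}")
    return problems


sample = {
    "name": "qa_sample",
    "cases": [
        {"input": "What is 2 + 2?", "expected": "4", "tags": ["math"]},
        {"input": "What is the capital of France?"},  # expected is missing
    ],
}
print(validate_dataset(sample))  # ['case 1: missing expected']
```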
4) CLI Usage
Evaluate one model
aegis eval run --dataset examples/datasets/qa_sample.yaml --model gpt-4
Options:
- --provider (auto, openai, mock)
- --output (text, json)
- --storage SQLite file path
- --baseline baseline dataset key for regression comparison
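When the CLI is driven from a script, building the argument list in one place keeps the options consistent. This is a sketch under assumptions: `build_eval_command` is a hypothetical helper, it uses only the flags listed above, and pairing `mock-model` with `--provider mock` is an illustrative choice.

```python
import shlex
from typing import List, Optional


def build_eval_command(
    dataset: str,
    model: str,
    provider: Optional[str] = None,
    output: Optional[str] = None,
    storage: Optional[str] = None,
    baseline: Optional[str] = None,
) -> List[str]:
    """Assemble an `aegis eval run` argument list from the documented options."""
    cmd = ["aegis", "eval", "run", "--dataset", dataset, "--model", model]
    optional = {
        "--provider": provider,
        "--output": output,
        "--storage": storage,
        "--baseline": baseline,
    }
    # Only emit flags that were explicitly given a value.
    for flag, value in optional.items():
        if value is not None:
            cmd += [flag, value]
    return cmd


cmd = build_eval_command(
    "examples/datasets/qa_sample.yaml",
    "mock-model",
    provider="mock",
    output="json",
)
print(shlex.join(cmd))
```

The resulting list can be passed as-is to `subprocess.run(cmd)`, which avoids shell quoting issues with paths that contain spaces.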
Compare multiple models
aegis compare \
--dataset examples/datasets/qa_sample.yaml \
--models gpt-4,gpt-3.5-turbo,claude-3-opus
Manage baselines
aegis baseline set --dataset qa_sample --run-id
aegis baseline show --dataset qa_sample
aegis baseline list
Cost analysis
aegis cost report --period week
aegis cost analyze --period month
aegis cost budget --limit 100 --mode warn --dataset qa_sample
5) Python API Usage
from aegis.core.dataset import Dataset
from aegis.adapters.mock_adapter import MockAdapter
from aegis.scoring.exact_match import ExactMatchScorer
from aegis.core.evaluator import Evaluator
# Load dataset
dataset = Dataset.from_yaml("examples/datasets/qa_sample.yaml")
# Build evaluator
adapter = MockAdapter("mock-model")
scorer = ExactMatchScorer()
evaluator = Evaluator(adapter, scorer)
# Run
result = evaluator.run_sync(dataset)
print(result.avg_score, result.total_cost)
6) Common Workflows
Regression gate in CI
1. Run an evaluation and save the approved run as the baseline.
2. Compare new runs against that baseline.
3. Fail the pipeline when the regression exceeds the threshold.
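The gate in the last step comes down to a single comparison. The sketch below is one possible way to express it and is not taken from Aegis: `is_regression` and `gate` are hypothetical helpers, the 0.02 default threshold is an arbitrary assumption, and the scores are expected to come from your own baseline and new runs (for example `result.avg_score`).

```python
def is_regression(baseline_score: float, new_score: float, threshold: float = 0.02) -> bool:
    """True when the new score falls more than `threshold` below the baseline."""
    return (baseline_score - new_score) > threshold


def gate(baseline_score: float, new_score: float, threshold: float = 0.02) -> int:
    """Return a process exit code: 0 to pass, 1 to fail the pipeline."""
    return 1 if is_regression(baseline_score, new_score, threshold) else 0


# Illustrative numbers only, not measured results.
exit_code = gate(baseline_score=0.90, new_score=0.85)
print("exit code:", exit_code)
```

In a CI script, the returned value would typically be passed to `sys.exit` so that a non-zero code fails the job.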
Cost-aware model selection
1. Run aegis compare on the candidate models.
2. Use the CPQ (cost-per-quality) ranking from the output.
3. Pick the model with the best quality-cost balance.
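To make the ranking idea concrete, here is a small sketch of a cost-per-quality calculation. It assumes CPQ is total cost divided by average score, with lower values being better; the exact formula Aegis uses is not stated in this guide, and `ModelResult`, `cpq`, and `rank_by_cpq` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class ModelResult:
    model: str
    avg_score: float   # mean quality score, assumed to lie in [0, 1]
    total_cost: float  # total spend for the run


def cpq(result: ModelResult) -> float:
    """Cost per unit of quality (assumed definition); infinite for a zero score."""
    if result.avg_score <= 0:
        return math.inf
    return result.total_cost / result.avg_score


def rank_by_cpq(results: List[ModelResult]) -> List[ModelResult]:
    """Sort ascending by CPQ, so the cheapest quality comes first."""
    return sorted(results, key=cpq)


# Illustrative numbers only, not measured results.
results = [
    ModelResult("model-a", avg_score=0.9, total_cost=1.8),
    ModelResult("model-b", avg_score=0.8, total_cost=0.4),
    ModelResult("model-c", avg_score=0.0, total_cost=0.1),
]
for r in rank_by_cpq(results):
    print(r.model, cpq(r))
```

A pure CPQ ranking can favor a cheap but weak model, so it may be worth filtering out candidates below a minimum score before ranking.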
7) Troubleshooting
Unknown model error
- Ensure model name matches supported adapter prefixes.
- Example: gpt-* (OpenAI), claude-* (Anthropic), mock-* via MockAdapter usage.
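The prefix rule can be sketched as a simple lookup, which is handy for checking a model name before starting a run. This is an illustration only and not Aegis's actual resolution logic: `guess_provider` is a hypothetical helper, and the provider label "anthropic" is an assumption, since only auto, openai, and mock appear in the --provider options.

```python
from typing import Optional

# Prefix-to-provider table based on the prefixes listed above.
PREFIXES = (
    ("gpt-", "openai"),
    ("claude-", "anthropic"),  # assumed provider label
    ("mock-", "mock"),
)


def guess_provider(model: str) -> Optional[str]:
    """Return the provider whose prefix matches `model`, or None if none does."""
    for prefix, provider in PREFIXES:
        if model.startswith(prefix):
            return provider
    return None


for name in ("gpt-4", "claude-3-opus", "mock-model", "llama-3"):
    print(name, "->", guess_provider(name))
```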
Missing dependency
- Install the optional dependency group (for example .[anthropic] or .[scoring]).
Empty or invalid dataset
- Validate the YAML structure (name, cases, each case with input and expected).