Usage Guide

This guide covers day-to-day use of Aegis Monitor, from installation through evaluation workflows.

1) Installation

Base install

pip install aegis-ai

Development install (from source)

git clone https://github.com/aegis-ai/aegis-ai

cd aegis-ai

python -m venv .venv

source .venv/bin/activate

pip install -e ".[dev]"

Optional features

OpenAI adapter

pip install -e ".[openai]"

Anthropic adapter

pip install -e ".[anthropic]"

Semantic scoring

pip install -e ".[scoring]"

Everything

pip install -e ".[all]"

2) Environment Setup

Create .env from the template:

cp .env.example .env

Set keys as needed:

OPENAI_API_KEY=your-openai-key

ANTHROPIC_API_KEY=your-anthropic-key
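Before running an evaluation it can help to confirm that the keys are actually visible to the process. The following is a minimal sketch, not part of Aegis: `missing_keys` and `PROVIDER_KEYS` are hypothetical names, and it assumes the variables from .env have already been exported into the environment (whether Aegis loads .env on its own is not stated here).

```python
import os

# Key names taken from the .env example above; which ones you need
# depends on the adapters you installed.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}

# The template values shown above start with "your-", so treat them as unset.
PLACEHOLDER_PREFIX = "your-"


def missing_keys(providers, env=None):
    """Return the env var names that are unset, empty, or still placeholders."""
    env = os.environ if env is None else env
    missing = []
    for provider in providers:
        name = PROVIDER_KEYS[provider]
        value = env.get(name, "").strip()
        if not value or value.startswith(PLACEHOLDER_PREFIX):
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in missing_keys(["openai", "anthropic"]):
        print(f"warning: {name} is not set")
```

Passing an explicit `env` mapping keeps the check easy to test without touching the real environment.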

3) Dataset Format

Aegis uses YAML datasets.

name: qa_sample
description: basic QA checks
cases:
  - input: "What is the capital of France?"
    expected: "Paris"
    tags: ["geography"]
  - input: "What is 2 + 2?"
    expected: "4"
    tags: ["math"]
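A quick shape check can catch malformed datasets before a run. The sketch below is an illustration rather than Aegis's own validation: `validate_dataset` is a hypothetical helper, and the set of required fields it enforces (`name`, `cases`, and per-case `input` and `expected`, with `tags` optional) is an assumption based on the example above. It works on the plain dictionary a YAML parser such as PyYAML's `yaml.safe_load` would return.

```python
def validate_dataset(data):
    """Return a list of problems; an empty list means the shape looks OK."""
    if not isinstance(data, dict):
        return ["top level must be a mapping"]

    problems = []
    name = data.get("name")
    if not isinstance(name, str) or not name:
        problems.append("missing or empty 'name'")

    cases = data.get("cases")
    if not isinstance(cases, list) or not cases:
        problems.append("'cases' must be a non-empty list")
        return problems

    for i, case in enumerate(cases):
        if not isinstance(case, dict):
            problems.append(f"case {i}: must be a mapping")
            continue
        # Assumed required per-case fields (not confirmed by the guide).
        for field in ("input", "expected"):
            if field not in case:
                problems.append(f"case {i}: missing '{field}'")
        # 'tags' is treated as optional, but must be a list when present.
        if not isinstance(case.get("tags", []), list):
            problems.append(f"case {i}: 'tags' must be a list")
    return problems


# The same data as the YAML example, as it would look after parsing.
sample = {
    "name": "qa_sample",
    "description": "basic QA checks",
    "cases": [
        {"input": "What is the capital of France?", "expected": "Paris", "tags": ["geography"]},
        {"input": "What is 2 + 2?", "expected": "4", "tags": ["math"]},
    ],
}

print(validate_dataset(sample))  # prints []
```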

Required fields:

4) CLI Usage

Evaluate one model

aegis eval run --dataset examples/datasets/qa_sample.yaml --model gpt-4

Options:

Compare multiple models

aegis compare \
  --dataset examples/datasets/qa_sample.yaml \
  --models gpt-4,gpt-3.5-turbo,claude-3-opus

Manage baselines

aegis baseline set --dataset qa_sample --run-id <run-id>

aegis baseline show --dataset qa_sample

aegis baseline list

Cost analysis

aegis cost report --period week

aegis cost analyze --period month

aegis cost budget --limit 100 --mode warn --dataset qa_sample
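To make the idea of a budget with a warn mode concrete, here is a rough sketch of how such a check might behave. It is not Aegis's implementation: `check_budget` is a hypothetical function, the "fail" mode is invented for contrast with the `warn` value shown above, and the limit is assumed to be in the same currency units as the reported spend.

```python
def check_budget(spend, limit, mode="warn"):
    """Return (ok, message) for a spend measured against a limit.

    "warn" mirrors the --mode value in the command above; "fail" is a
    hypothetical stricter mode added only for illustration.
    """
    if mode not in ("warn", "fail"):
        raise ValueError(f"unknown mode: {mode}")
    if spend <= limit:
        return True, f"within budget: {spend:.2f} of {limit:.2f}"
    message = f"over budget: {spend:.2f} exceeds {limit:.2f}"
    # In warn mode the overrun is reported but does not block the run;
    # in the assumed fail mode it does.
    return mode == "warn", message
```

For example, `check_budget(120, 100, "warn")` reports the overrun but still returns `True` as its first element, while the assumed "fail" mode returns `False`.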

5) Python API Usage

from aegis.core.dataset import Dataset
from aegis.adapters.mock_adapter import MockAdapter
from aegis.scoring.exact_match import ExactMatchScorer
from aegis.core.evaluator import Evaluator

# Load dataset
dataset = Dataset.from_yaml("examples/datasets/qa_sample.yaml")

# Build evaluator
adapter = MockAdapter("mock-model")
scorer = ExactMatchScorer()
evaluator = Evaluator(adapter, scorer)

# Run
result = evaluator.run_sync(dataset)
print(result.avg_score, result.total_cost)

6) Common Workflows

Regression gate in CI

1. Run an evaluation and save the approved run as the baseline.

2. Compare each new run against the baseline.

3. Fail the pipeline when the regression exceeds the threshold.
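The gating step above can be sketched in a few lines of Python. This is an illustration under assumptions, not Aegis's built-in behavior: `regression_detected` and `gate` are hypothetical helpers, the scores are assumed to be average scores where higher is better, and the default threshold of 0.02 is arbitrary.

```python
def regression_detected(baseline_score, new_score, threshold=0.02):
    """True when the new score falls more than `threshold` below the baseline."""
    return (baseline_score - new_score) > threshold


def gate(baseline_score, new_score, threshold=0.02):
    """Return a process exit code: 0 to pass the pipeline, 1 to fail it."""
    if regression_detected(baseline_score, new_score, threshold):
        print(f"FAIL: score dropped from {baseline_score:.3f} to {new_score:.3f}")
        return 1
    print(f"PASS: {new_score:.3f} vs baseline {baseline_score:.3f}")
    return 0
```

In a CI script, passing the return value of `gate(...)` to `sys.exit` makes the job fail on a regression. Improvements and drops within the threshold both pass.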

Cost-aware model selection

1. Run aegis compare on the candidate models.

2. Read the CPQ (cost-per-quality) ranking from the output.

3. Pick the model with the best balance of quality and cost.
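For intuition about the ranking step, the sketch below computes a cost-per-quality figure by hand. The exact CPQ formula Aegis uses is not given here, so this assumes CPQ = total cost divided by average score (lower is better). `cost_per_quality` and `rank_by_cpq` are hypothetical helpers, and the model names and numbers are made-up placeholders, not measured results.

```python
import math


def cost_per_quality(total_cost, avg_score):
    """Assumed definition: cost divided by score; a zero score ranks last."""
    if avg_score <= 0:
        return math.inf
    return total_cost / avg_score


def rank_by_cpq(results):
    """Rank models by ascending CPQ.

    `results` maps model name -> (avg_score, total_cost).
    Returns a list of (model, cpq) pairs, best value first.
    """
    scored = [
        (model, cost_per_quality(cost, score))
        for model, (score, cost) in results.items()
    ]
    return sorted(scored, key=lambda item: item[1])


# Illustrative inputs only: (avg_score, total_cost) per placeholder model.
candidates = {
    "model-a": (0.9, 0.45),
    "model-b": (0.8, 0.08),
    "model-c": (0.0, 0.01),
}

for model, cpq in rank_by_cpq(candidates):
    print(model, cpq)
```

A pure ratio rewards cheap models heavily, so in practice it may be worth filtering out candidates below a minimum acceptable score before ranking.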

7) Troubleshooting

Unknown model error

Missing dependency

Empty or invalid dataset

8) Next Reading