Full Documentation

This document is the complete high-level reference for Aegis Monitor.

1. What Aegis Monitor Solves

Aegis Monitor helps teams evaluate model output quality, monitor costs, compare models, and detect regressions in production-oriented workflows.

2. System Modules

aegis/core: datasets and the evaluator (Dataset, Evaluator)

aegis/adapters: model provider adapters (e.g. MockAdapter)

aegis/scoring: scorers such as ExactMatchScorer

aegis/cost: cost calculation (CostCalculator)

aegis/storage: run persistence (SQLiteBackend)

aegis/cli: the aegis command-line interface

3. End-to-End Evaluation Flow

1. Load dataset from YAML (a sketch of the file format follows this list).

2. Build adapter + scorer (+ optional storage/cost calculator).

3. Execute all test cases with the evaluator.

4. Compute average quality, cost, latency, and pass rate.

5. Save run for trend and baseline comparisons.
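
For step 1, a dataset file might look like the sketch below. The exact schema is not reproduced in this document, so the field names (name, cases, input, expected) are assumptions chosen to fit the exact-match scorer used later:

name: qa_sample
cases:
  - input: "What is the capital of France?"
    expected: "Paris"
  - input: "What is 2 + 2?"
    expected: "4"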

4. CLI Reference

Evaluate

aegis eval run \
  --dataset <dataset> \
  --model gpt-4 \
  --provider auto \
  --output text \
  --storage aegis.db
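
With --storage set, the run is persisted to the SQLite database so the baseline and cost commands below can read it later; --provider auto presumably infers the provider from the model name.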

Compare

aegis compare \
  --dataset <dataset> \
  --models gpt-4,gpt-3.5-turbo,claude-3-opus \
  --output text
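
aegis compare runs the same dataset against each listed model and reports the metrics side by side.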

Baseline

aegis baseline set --dataset <dataset> --run-id <run-id>

aegis baseline show --dataset <dataset>

aegis baseline list
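
Baselines pin a stored run as the reference point for regression detection: set records a run id for a dataset, show prints the current baseline, and list enumerates all of them. Setting a baseline requires a run id previously stored via aegis eval run --storage.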

Cost

aegis cost report --period week

aegis cost analyze --period month

aegis cost budget --limit 100 --mode warn --dataset <dataset>
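
cost report summarizes spend for the chosen period, and cost analyze presumably provides a deeper breakdown; cost budget sets a spending limit, where --mode warn reads as advisory (warn on overrun) rather than blocking, though other modes are not documented here.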

5. Python API Reference

Minimal API usage

from aegis.core.dataset import Dataset
from aegis.adapters.mock_adapter import MockAdapter
from aegis.scoring.exact_match import ExactMatchScorer
from aegis.core.evaluator import Evaluator

# Load test cases, wire up a mock adapter (no real API calls) and an
# exact-match scorer, then run every case synchronously.
dataset = Dataset.from_yaml("examples/datasets/qa_sample.yaml")
adapter = MockAdapter("mock-model")
scorer = ExactMatchScorer()
evaluator = Evaluator(adapter, scorer)
result = evaluator.run_sync(dataset)
print(result.avg_score, result.total_cost)
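
To evaluate a real model instead of the mock, swap the adapter. The module path and class name below are assumptions based on the aegis/adapters layout, not confirmed API, so they are left commented out:

# Hypothetical adapter swap; names are assumptions, not confirmed API.
# from aegis.adapters.openai_adapter import OpenAIAdapter
# adapter = OpenAIAdapter("gpt-4")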

Advanced API usage with storage and cost

from aegis.storage.sqlite_backend import SQLiteBackend
from aegis.cost.calculator import CostCalculator

# Persist runs to SQLite and track spend alongside quality scores.
storage = SQLiteBackend("aegis.db")
storage.initialize()
cost_calculator = CostCalculator()

evaluator = Evaluator(adapter, scorer, storage=storage, cost_calculator=cost_calculator)
result = evaluator.run_sync(dataset)
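
Because storage is attached, each run is saved automatically (step 5 of the evaluation flow), and the same aegis.db file is what the CLI's baseline and cost commands read.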

6. Configuration

Environment variables
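
The variable names Aegis itself reads are not listed in this section; as an assumption, provider credentials follow the providers' standard conventions:

export OPENAI_API_KEY=...      # assumed: used for gpt-* models
export ANTHROPIC_API_KEY=...   # assumed: used for claude-* models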

Dependency groups
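
Group names are likewise not enumerated here; a hypothetical install with provider extras might look like:

pip install "aegis[openai,anthropic]"  # hypothetical package and extra names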

7. Output and Metrics

Typical metrics in run results: average quality score (result.avg_score), total cost (result.total_cost), average latency, and pass rate.

Comparison adds: the same metrics reported per model, side by side.
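
As a quick sketch of reading these fields, only avg_score and total_cost are confirmed by the API example above; the commented names are assumptions:

print(f"quality: {result.avg_score:.3f}")
print(f"cost: ${result.total_cost:.4f}")
# Assumed, not confirmed, attribute names:
# print(result.avg_latency, result.pass_rate)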

8. Testing and Validation

Run all tests:

pytest -v

Run validation utility:

python validate_project.py

9. Extension Guide Summary
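
The main extension points follow the module layout: adapters for new providers, scorers for new quality checks, storage backends, and cost calculators. As an illustration, a custom scorer might look like the sketch below; the score() signature is an assumption and may not match the real interface, which could require a base class:

class KeywordScorer:
    """Hypothetical scorer: 1.0 if the expected text appears in the output."""

    def score(self, output: str, expected: str) -> float:
        # Case-insensitive containment check instead of strict equality.
        return 1.0 if expected.lower() in output.lower() else 0.0

# Assuming the evaluator duck-types on scorer.score(), usage would be:
# evaluator = Evaluator(adapter, KeywordScorer())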

10. Security and Operational Notes
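
Two practical notes: provider API keys should come from environment variables (see Configuration) rather than being committed alongside datasets, and the default SQLite store (aegis.db) is a plain local file containing evaluation runs, so handle its permissions and backups accordingly.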

11. Documentation Map