The Story of Olly
Autonomous Observability Agent

Powering Quality KPIs with AI

The observability paradox

0 dashboards
0 alerts/week
0 TB logs/day

More data. More dashboards to learn. More queries to master.
Same question: "What's broken and why?"

We looked around...

💬 AI query builders
📊 Prompt-to-dashboard
🔔 Smart alert tuning

Faster dashboards. Easier queries.
But still no answer to: "Why is it broken?"

We asked ourselves:

What can LLMs actually change
for observability?

The metrics that matter:

MTTR · MTTD · Change Failure Rate

Not faster dashboards, but faster root cause

The change we need to see

Reliability reimagined with LLMs

🧠

Knowledge relevant to
your environment

No more generic answers. Context that understands your stack, your history, your patterns.

💬

Just use
natural language

Ask questions like you talk. No query languages. No dashboard hunting. Just answers.

∞

Break the
human scale problem

When systems grow beyond human comprehension, LLMs bridge the gap between complexity and clarity.

Three hard problems

01

Context

What services exist? What broke before?

embeddings · knowledge graphs · incident memory

02

Scale

Petabytes of logs → limited token window

tool use · RAG · summarization

03

Correctness

Did it actually find the root cause?

evals · task scoring · rubrics
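To make the Scale problem concrete, here is a minimal sketch of one summarization step: collapsing repeated log lines into counted templates and keeping only what fits a token budget. It is not Olly's implementation. The function name `summarize_logs`, the digit-masking rule, and the 4-characters-per-token estimate are all assumptions made for illustration.

```python
import re
from collections import Counter


def summarize_logs(lines, token_budget, chars_per_token=4):
    """Collapse similar log lines and keep the most frequent ones that fit.

    Hypothetical helper: chars_per_token is a rough assumption, not a
    real tokenizer.
    """
    # Mask digit runs so lines differing only in numbers share a template.
    templates = Counter(
        re.sub(r"\d+", "<n>", line.strip()) for line in lines if line.strip()
    )
    budget_chars = token_budget * chars_per_token
    summary, used = [], 0
    for template, count in templates.most_common():
        entry = f"{count}x {template}"
        # +1 accounts for the newline that would join entries in a prompt.
        if used + len(entry) + 1 > budget_chars:
            break
        summary.append(entry)
        used += len(entry) + 1
    return summary


if __name__ == "__main__":
    logs = ["timeout after 30 ms", "timeout after 45 ms", "disk full"]
    print(summarize_logs(logs, token_budget=100))
```

A real pipeline would likely combine a step like this with retrieval and tool calls, so the model asks for more detail only where the summary looks suspicious.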

Olly Architecture

Observability Data

📊 APM
🖥️ Infrastructure
🔔 Alerting
📈 Metrics
🔍 Traces & Logs
→

Knowledge

Context of your environment

→

Specialized Agents

🔬 Logs Expert
🕸️ Trace Explorer
📊 Metrics Analyzer
🔐 Security Researcher
🔗 Correlation Agent
💡 Hypothesis Generator
...and more
→
✨

Invaluable
Production
Insights
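The flow above (data, then knowledge, then specialized agents, then insights) can be pictured as a coordinator that fans one question out to several agents and gathers their findings. The sketch below is purely illustrative: the agent functions, the `investigate` coordinator, and the shared `context` dictionary are hypothetical stand-ins, not Olly's actual interfaces.

```python
from typing import Callable, Dict, List

# Hypothetical agent signature: (question, environment context) -> findings.
Agent = Callable[[str, Dict[str, str]], List[str]]


def logs_expert(question: str, context: Dict[str, str]) -> List[str]:
    # Placeholder: a real agent would query a log store here.
    return [f"logs checked for service '{context.get('service', 'unknown')}'"]


def metrics_analyzer(question: str, context: Dict[str, str]) -> List[str]:
    # Placeholder: a real agent would inspect metric time series here.
    return [f"metrics checked for service '{context.get('service', 'unknown')}'"]


def investigate(
    question: str, context: Dict[str, str], agents: Dict[str, Agent]
) -> Dict[str, List[str]]:
    """Send the same question and context to every agent, keyed by agent name."""
    return {name: agent(question, context) for name, agent in agents.items()}


if __name__ == "__main__":
    agents = {"logs": logs_expert, "metrics": metrics_analyzer}
    findings = investigate(
        "What caused the checkout spike?", {"service": "checkout"}, agents
    )
    for name, notes in findings.items():
        print(name, notes)
```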

Measuring correctness: Evals

"What caused the checkout spike last night?"
+100 Retrieved relevant logs
+200 Identified correct service
+500 Correlated to deployment
+1000 Correct root cause
−50 Irrelevant metrics
−200 Wrong service blamed

Not "did it sound smart?" but "did it find the bug?"
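The point values on this slide can be read as a simple additive rubric. Here is a minimal sketch of how such a score might be computed. The point values come from the slide, while the event names, the `score_investigation` function, and the rule that each event counts once are assumptions for illustration.

```python
# Point values taken from the slide; the key names are hypothetical labels.
RUBRIC = {
    "retrieved_relevant_logs": 100,
    "identified_correct_service": 200,
    "correlated_to_deployment": 500,
    "correct_root_cause": 1000,
    "irrelevant_metrics": -50,
    "wrong_service_blamed": -200,
}


def score_investigation(observed_events):
    """Sum rubric points for the events a grader observed in one run.

    Assumption: each event counts at most once per investigation.
    """
    events = set(observed_events)
    unknown = events - RUBRIC.keys()
    if unknown:
        raise ValueError(f"events not in rubric: {sorted(unknown)}")
    return sum(RUBRIC[event] for event in events)


if __name__ == "__main__":
    run = [
        "retrieved_relevant_logs",
        "identified_correct_service",
        "irrelevant_metrics",
    ]
    print(score_investigation(run))
```

Weighting the root cause far above the intermediate steps keeps the score tied to the outcome rather than to how plausible the investigation sounded.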

Demo

"We had checkout errors spike last night.
What changed and what's the likely cause?"

What we're seeing

60 min → 5 min
Daily health checks

"Not just faster β€” smarter. Complex investigations become clear answers."

MTTR ↓ MTTD ↓ Alert fatigue ↓ Fewer handoffs

Where this goes

Today

You ask, Olly investigates

Next

Olly triages before you wake

Vision

Prevent before customers notice

Reactive firefighting → Proactive reliability

Olly

Questions?