Foxhound is an observability platform purpose-built for AI agent architectures. It provides full decision traceability for autonomous AI systems — traces, session replay, run diffing, and structured auditability for production workflows.

What is AI agent observability?

AI agent observability is the ability to understand what autonomous AI agents are doing by tracing every decision, tool call, LLM invocation, and branching point. Unlike generic APM tools that treat agent workflows as black boxes, AI agent observability platforms like Foxhound trace the full decision tree so you can see exactly what happened, why, and where something went wrong.

What AI agent frameworks does Foxhound support?

Foxhound auto-instruments LangGraph, CrewAI, the OpenAI Agents SDK, and the Claude Agent SDK. It also works with agent frameworks that emit OpenTelemetry (OTLP) data. SDKs are available for TypeScript and Python.

Is Foxhound's source publicly viewable, and can it be self-hosted?

Yes. Foxhound's source is published in a public repository, and the platform is designed for self-hosted deployment. It ships as a Docker Compose stack that you run on infrastructure you control, with licensing handled directly.

Can I self-host Foxhound?

Yes. Foxhound is designed to be self-hosted. It deploys via Docker Compose, Kubernetes, or bare metal. Your traces and data stay on infrastructure you control, with support for tenant-aware workflows and structured auditability.

LLM observability built for agents

Try the sandbox View on GitHub

Turn agent failures into safer releases.

Trace production runs. Replay what the agent saw. Diff good runs against bad. Derive evaluation datasets from real failures, validate fixes, and govern cost and SLA risk before rollout.

Session Replay reconstructs what the agent knew at each decision point
Run Diff pinpoints where a bad run diverged from a known good one
Cost Budgets kill runaway loops at the SDK level before the bill arrives
SLA Monitoring alerts on latency drift and success-rate drops in real time

$ pip install foxhound-ai

Explore capabilities

No install required
Audit-friendly traces
Built for production incidents

Trace Explorer: run_7f3a2c1b (timeline 2.14s)

Span	Duration
WORKFLOW support-agent-workflow	2.14s
TOOL retrieve_knowledge_base	420ms
LLM llm:gpt-4	1.24s
TOOL send_email	180ms
TOOL update_ticket
CUSTOM pii_filter

Auto-instruments: LangGraph, CrewAI, OpenTelemetry, OpenAI Agents, Claude SDK, Pydantic AI, Mastra, Bedrock, Google ADK

Investigate

Replay agent state at any decision point. See exactly what data was available when things went wrong.

Compare

Diff two runs side-by-side. Find every divergence in prompts, tool calls, and model outputs.

Evaluate

Turn bad production runs into reusable evaluation datasets. Score with LLM-as-a-Judge.

Govern

Enforce per-agent cost budgets and SLA thresholds at the SDK level. Alert before incidents spread.

Capabilities

From broken run to safer release.

Foxhound starts with traces, but the real value is what comes next: investigate regressions, validate fixes against real failures, and govern cost and reliability before rollout.

Investigate

Understand what happened, where behavior changed, and why a run failed

Trace Explorer

Complete span tree of every agent run. Every tool call, LLM invocation, and branch.
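
To make the span tree concrete, here is a minimal sketch of how a run like the sample trace above could be modeled and walked. The `Span` class and both helper functions are hypothetical illustrations, not Foxhound's data model or API; the span names and durations are borrowed from the sample trace.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical span record: one node in an agent run's decision tree."""
    kind: str
    name: str
    duration_ms: int
    children: list["Span"] = field(default_factory=list)


def render(span: Span, depth: int = 0) -> list[str]:
    """Flatten the tree into indented lines, parents before children."""
    lines = [f"{'  ' * depth}{span.kind} {span.name} ({span.duration_ms}ms)"]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines


def slowest_leaf(span: Span) -> Span:
    """Return the leaf span with the longest duration."""
    if not span.children:
        return span
    return max((slowest_leaf(c) for c in span.children), key=lambda s: s.duration_ms)


run = Span("WORKFLOW", "support-agent-workflow", 2140, [
    Span("TOOL", "retrieve_knowledge_base", 420),
    Span("LLM", "llm:gpt-4", 1240),
    Span("TOOL", "send_email", 180),
])

print("\n".join(render(run)))
print("slowest step:", slowest_leaf(run).name)
```

Keeping children nested under their parent is what lets a viewer answer "which step inside this workflow was slow?" without scanning a flat log.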

Session Replay

Reconstruct agent state at any point. See exactly what data was available when a decision was made.
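
One way to picture replay: if every step records the state keys it wrote, the agent's state at any point can be rebuilt by replaying those writes in order. The sketch below assumes that model; `Event`, `state_at`, and the field names are hypothetical and are not Foxhound's API.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class Event:
    """Hypothetical record of one step: which span ran and what it wrote."""
    step: int
    span: str
    writes: dict[str, Any] = field(default_factory=dict)


def state_at(events: list[Event], step: int) -> dict[str, Any]:
    """Rebuild the agent state as it was just after `step`."""
    state: dict[str, Any] = {}
    for event in sorted(events, key=lambda e: e.step):
        if event.step > step:
            break
        state.update(event.writes)
    return state


events = [
    Event(1, "retrieve_knowledge_base", {"docs": ["refund-policy"]}),
    Event(2, "llm:gpt-4", {"draft": "You are eligible for a refund."}),
    Event(3, "send_email", {"email_sent": True}),
]

# What did the agent know once the LLM call had finished?
snapshot = state_at(events, step=2)
print(snapshot)
```

Because the snapshot is derived from the recorded events rather than stored separately, any decision point can be inspected after the fact.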

Run Diff

Compare two runs side-by-side. Spot every divergence. Find where behavior changed.
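
As a rough illustration of the idea, two runs can be reduced to ordered lists of steps and compared with Python's standard `difflib`. This is a sketch under that simplifying assumption, not Foxhound's diff algorithm; `diff_runs` and `first_divergence` are hypothetical helper names.

```python
import difflib
from typing import Optional


def diff_runs(good: list[str], bad: list[str]) -> list[str]:
    """Unified diff between the step sequences of two runs."""
    return list(
        difflib.unified_diff(good, bad, fromfile="good_run", tofile="bad_run", lineterm="")
    )


def first_divergence(good: list[str], bad: list[str]) -> Optional[int]:
    """Index of the first step where the runs differ, or None if identical."""
    for i, (g, b) in enumerate(zip(good, bad)):
        if g != b:
            return i
    if len(good) != len(bad):
        return min(len(good), len(bad))
    return None


good_run = ["TOOL retrieve_knowledge_base", "LLM llm:gpt-4", "TOOL send_email"]
bad_run = [
    "TOOL retrieve_knowledge_base",
    "LLM llm:gpt-4",
    "TOOL update_ticket",
    "TOOL send_email",
]

print("\n".join(diff_runs(good_run, bad_run)))
print("first divergence at step index:", first_divergence(good_run, bad_run))
```

A real diff would also compare prompts and model outputs inside each step; the step-level comparison here only shows where to start looking.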

Improve

Turn bad runs into evaluation inputs and validate fixes before promotion

LLM-as-a-Judge

Automated evaluation with GPT-4 and Claude. Score every trace.

Datasets from Traces

Turn production failures into reusable evaluation datasets. Filter by score, time range, or agent.
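
The filtering step can be sketched in a few lines. Everything below is an assumption made for illustration: `TraceRecord`, `curate`, and the convention that lower scores are worse are hypothetical, not Foxhound's schema or API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class TraceRecord:
    """Hypothetical summary of one scored production trace."""
    trace_id: str
    agent_id: str
    score: float  # evaluation score in [0, 1]; lower is worse
    started_at: datetime


def curate(
    traces: Iterable[TraceRecord],
    *,
    max_score: Optional[float] = None,
    agent_id: Optional[str] = None,
    since: Optional[datetime] = None,
    until: Optional[datetime] = None,
) -> list[TraceRecord]:
    """Keep the traces that match every filter that was provided."""
    selected = []
    for t in traces:
        if max_score is not None and t.score > max_score:
            continue
        if agent_id is not None and t.agent_id != agent_id:
            continue
        if since is not None and t.started_at < since:
            continue
        if until is not None and t.started_at > until:
            continue
        selected.append(t)
    return selected


traces = [
    TraceRecord("t1", "customer-support", 0.92, datetime(2026, 4, 1, 9, 0)),
    TraceRecord("t2", "customer-support", 0.31, datetime(2026, 4, 2, 14, 30)),
    TraceRecord("t3", "billing", 0.20, datetime(2026, 4, 2, 15, 0)),
    TraceRecord("t4", "customer-support", 0.45, datetime(2026, 3, 15, 8, 0)),
]

# Low-scoring customer-support runs since April 1 become dataset candidates.
failures = curate(
    traces, max_score=0.5, agent_id="customer-support", since=datetime(2026, 4, 1)
)
print([t.trace_id for t in failures])
```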

Experiments

Run datasets through agent versions. Compare scores. Catch regressions before deploy.

GitHub Actions

Block PRs that degrade quality. Scores in every PR comment.
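
The gating logic behind an experiment or PR check can be as simple as comparing mean scores for two agent versions on the same dataset. The sketch below assumes that approach; `quality_gate` and the 0.02 tolerance are hypothetical choices, not Foxhound defaults.

```python
from statistics import mean


def quality_gate(baseline: list[float], candidate: list[float], max_drop: float = 0.02) -> bool:
    """Pass only if the candidate's mean score is within `max_drop` of the baseline's."""
    return mean(candidate) >= mean(baseline) - max_drop


baseline_scores = [0.90, 0.80, 0.85]   # scores for the current agent version
candidate_scores = [0.70, 0.80, 0.75]  # scores for the version under review

passed = quality_gate(baseline_scores, candidate_scores)
print("gate passed" if passed else "gate failed: quality regressed")
# In CI, a failed gate would typically end the job with a non-zero exit code.
```

A small tolerance keeps ordinary score noise from blocking every pull request while still catching real drops.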

Govern

Control cost, latency, and behavior drift before small issues become incidents

Cost Budgets

Per-agent spend limits. SDK callback kills runaway loops before the bill arrives.
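
Conceptually, SDK-level enforcement means the check runs in the same process as the agent, so a loop can be interrupted on the call that crosses the limit. The following is a minimal sketch of that pattern; `CostBudget`, `BudgetExceeded`, and the per-call cost are hypothetical and do not represent Foxhound's implementation.

```python
class BudgetExceeded(RuntimeError):
    """Raised when accumulated spend passes the configured limit."""


class CostBudget:
    """Hypothetical in-process spend tracker for a single agent."""

    def __init__(self, limit_usd: float) -> None:
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        """Record the cost of one call and stop the caller once over budget."""
        self.spent_usd += cost_usd
        if self.spent_usd > self.limit_usd:
            raise BudgetExceeded(
                f"spent ${self.spent_usd:.2f} of ${self.limit_usd:.2f} budget"
            )


budget = CostBudget(limit_usd=0.05)
completed_calls = 0

try:
    while True:  # stands in for an agent loop that never terminates on its own
        budget.charge(0.02)  # assume each LLM call costs $0.02
        completed_calls += 1
except BudgetExceeded as exc:
    print(f"stopped after {completed_calls} calls: {exc}")
```

Raising an exception, rather than only emitting an alert, is what actually halts the loop before more spend accrues.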

SLA Monitoring

P95 latency and success rate thresholds. Auto-alert on breach.
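
For readers unfamiliar with the metrics, here is a small sketch of how a P95 latency and a success rate might be checked against thresholds. It uses the nearest-rank percentile method, which is an assumption; `p95` and `sla_breaches` are hypothetical helpers, and the default limits simply echo the example values used elsewhere on this page.

```python
def p95(latencies_ms: list[float]) -> float:
    """95th percentile using the nearest-rank method."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]


def sla_breaches(
    latencies_ms: list[float],
    outcomes: list[bool],
    *,
    p95_limit_ms: float = 5000,
    min_success_rate: float = 0.95,
) -> list[str]:
    """Return the names of the thresholds that this window of runs violates."""
    breaches = []
    if p95(latencies_ms) > p95_limit_ms:
        breaches.append("p95_latency")
    if sum(outcomes) / len(outcomes) < min_success_rate:
        breaches.append("success_rate")
    return breaches


# A window of 20 runs: two were slow and failed.
latencies = [800.0] * 18 + [7200.0, 9100.0]
outcomes = [True] * 18 + [False] * 2

print(sla_breaches(latencies, outcomes))
```

An alerting layer would evaluate a check like this over a sliding window of recent runs and notify when the returned list is non-empty.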

Regression Detection

Track behavior drift across versions and catch regressions before they spread.

Slack Alerts

Route alerts by type and severity. Cost spikes, SLA breaches, regressions.

Comparison

Built for agents, not chatbots

Most LLM observability tools stop at traces and prompt logs. Foxhound adds the debugging, evaluation, and governance workflows teams need when agents run in production.

Feature	Foxhound	Langfuse	LangSmith	Braintrust
Session Replay
Run Diff
Cost Budgets (SDK enforcement)
SLA Monitoring
Behavior Regression Detection
LLM-as-a-Judge Evaluation
Dataset Auto-Curation
GitHub Actions Integration
MCP Server (31 tools)
Self-Hosted
Infrastructure Control

Comparison based on publicly available documentation and product testing as of April 2026. If something has changed, open an issue.

Developer tools

Built for your workflow

IDE-native debugging and evaluation workflows, plus CI quality gates

MCP Server

31 tools for debugging, trace inspection, scoring, and evaluation workflows via Model Context Protocol. Set budgets, monitor SLAs, query traces, and trigger evaluations — all from your agent runtime.

Cost budget enforcement
SLA threshold monitoring
Trace search & retrieval
LLM-as-a-Judge scoring
Dataset auto-curation
Real-time alerting

foxhound.mcp.ts

// MCP Server - 31 tools for debugging and evaluation workflows
import { FoxhoundMCP } from 'foxhound-ai/mcp';
 
const mcp = new FoxhoundMCP({
  apiKey: process.env.FOXHOUND_API_KEY,
  org: 'acme-corp',
});
 
// Budget enforcement tool
await mcp.setCostBudget({
  agentId: 'customer-support',
  budget: 50,
  period: 'daily',
});
 
// SLA monitoring
await mcp.setSLAThreshold({
  agentId: 'customer-support',
  p95Latency: 5000,
  successRate: 0.95,
});

Security & data sovereignty

Your data, your infrastructure.

Developer-first observability for teams that need control over where traces, prompts, and evaluation workflows live

Tenant-scoped by design

Foxhound is built around org-scoped data access so teams can keep tenant boundaries explicit throughout the platform.

Org-scoped API keys

API keys are scoped per organization to support safer multi-tenant workflows and cleaner operational boundaries.

Self-hosted

Run Foxhound on infrastructure you control and keep trace, replay, and evaluation workflows inside your own environment.

Bring your own model keys

Use your own provider credentials for evaluation workflows instead of routing that data through a managed shared layer.

Structured auditability

Capture who accessed what and when so production debugging and review workflows stay easier to inspect.

Built for security-sensitive deployments

Foxhound is designed for teams that care about tenant boundaries, infrastructure control, and operational visibility without forcing a hosted-only model.

Deploy in your VPC

Self-host Foxhound where it fits your stack — from simple Docker deployments to more controlled infrastructure footprints.

View Deployment Guide

SDK integration

One decorator, full observability

Auto-instruments LangGraph, CrewAI, OpenAI, Claude, and OpenTelemetry-compatible systems

agent.py

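from typing import TypedDict

from foxhound import trace
from langgraph.graph import END, START, StateGraph

# Assumed to be defined elsewhere in your code: the node functions
# route_query, support_response, escalate_to_human, and a pick_route
# function that returns "support" or "escalation".

class SupportState(TypedDict):
    query: str

# Automatic LangGraph instrumentation
@trace(org="acme-corp")
def customer_support_agent(input: str):
    workflow = StateGraph(SupportState)

    # Every tool call and LLM invocation is traced automatically
    workflow.add_node("router", route_query)
    workflow.add_node("support", support_response)
    workflow.add_node("escalation", escalate_to_human)

    # Wire the graph so it compiles: the router picks support or escalation
    workflow.add_edge(START, "router")
    workflow.add_conditional_edges("router", pick_route, {"support": "support", "escalation": "escalation"})
    workflow.add_edge("support", END)
    workflow.add_edge("escalation", END)

    return workflow.compile().invoke({"query": input})

# Cost budget enforced at SDK level
customer_support_agent.set_budget(50, period="daily")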

Auto-instrumentation

Zero manual spans

Cost budgets

SDK-enforced limits

Multi-framework

10+ frameworks

Type-safe

Full TypeScript/Python types

Pricing

Self-host free. Managed cloud coming.

Deploy on your infrastructure with a direct license. No hosted lock-in, no usage caps.

Self-Hosted

$0 / forever

Deploy with a direct license. No hosted lock-in.

Unlimited traces & sessions
All observability features
Cost budgets & SLA monitoring
LLM-as-a-Judge evaluation
GitHub Actions integration
MCP Server (31 tools)
Run on your infrastructure
Direct-license deployment

Get Started Free

Managed Cloud

Coming Soon

Fully managed hosting is planned. Details for hosted security and compliance posture will be announced with the service.

All self-hosted features
Managed infrastructure
Automatic updates
Dedicated support
Data residency options
SSO & advanced auth (planned)
Uptime SLA (details at launch)
Hosted compliance posture (details at launch)

Join Waitlist

Managed cloud pricing will be announced when the hosted service launches. Self-hosted will always be free.

Use cases

Start with the problem you have today.

Start with the part of the platform you care about most — framework support, debugging workflows, replay, or run comparison.

LangGraph observability

Trace node execution, tool calls, model invocations, and graph branching with replay and run diff built for LangGraph production systems.

CrewAI observability

Inspect multi-agent handoffs, crew execution state, failures, and performance regressions across CrewAI workloads.

OpenAI Agents observability

Monitor agent runs, budget drift, latency spikes, and behavior changes for OpenAI Agents SDK deployments.

Claude agent observability

Debug Claude-powered agents with session replay, structured traces, evaluation pipelines, and auditability for production workflows.

AI agent session replay

Reconstruct what an agent knew at each decision point so you can explain failures instead of guessing at them.

AI agent run diff

Compare two agent runs side by side to see exactly where prompts, tools, model outputs, or control flow diverged.

Source available

Public repo, direct licensing

Foxhound is published in a public repository for reference and evaluation. You can inspect the code and run licensed deployments on infrastructure your team controls.

31 MCP Tools
10+ Frameworks
Self-Hosted
OTLP Compatible

Star on GitHub Read the Docs
