LLM observability built for agents

Turn agent failures into safer releases.

Trace production runs. Replay what the agent saw. Diff good runs against bad. Derive evaluation datasets from real failures, validate fixes, and govern cost and SLA risk before rollout.

  • Session Replay reconstructs what the agent knew at each decision point
  • Run Diff pinpoints where a bad run diverged from a known good one
  • Cost Budgets kill runaway loops at the SDK level before the bill arrives
  • SLA Monitoring alerts on latency drift and success-rate drops in real time
$ pip install foxhound-ai
No install required. Audit-friendly traces. Built for production incidents.
Trace Explorer: run_7f3a2c1b (timeline 2.14s)

Span                               Duration
WORKFLOW  support-agent-workflow   2.14s
TOOL      retrieve_knowledge_base  420ms
LLM       llm:gpt-4                1.24s
TOOL      send_email               180ms
TOOL      update_ticket            95ms
CUSTOM    pii_filter               28ms
Auto-instruments: LangGraph, CrewAI, OpenTelemetry, OpenAI Agents, Claude SDK, Pydantic AI, Mastra, Bedrock, Google ADK
Investigate

Replay agent state at any decision point. See exactly what data was available when things went wrong.

Compare

Diff two runs side-by-side. Find every divergence in prompts, tool calls, and model outputs.

Evaluate

Turn bad production runs into reusable evaluation datasets. Score with LLM-as-a-Judge.

Govern

Enforce per-agent cost budgets and SLA thresholds at the SDK level. Alert before incidents spread.

Capabilities

From broken run to safer release.

Foxhound starts with traces, but the real value is what comes next: investigate regressions, validate fixes against real failures, and govern cost and reliability before rollout.

Investigate

Understand what happened, where behavior changed, and why a run failed

Trace Explorer

Complete span tree of every agent run. Every tool call, LLM invocation, and branch.
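
To make the idea of a span tree concrete, here is a minimal sketch that nests a flat list of spans by parent. It is not Foxhound's data model: the `Span` class, the `parent_id` field, and the `build_tree` and `render` helpers are assumptions made for illustration, and the sample values are borrowed from the example trace near the top of this page.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # hypothetical field linking a span to its parent
    kind: str
    name: str
    duration_ms: int
    children: list["Span"] = field(default_factory=list)


def build_tree(spans: list[Span]) -> list[Span]:
    """Attach each span to its parent and return the root spans."""
    by_id = {s.span_id: s for s in spans}
    roots = []
    for s in spans:
        parent = by_id.get(s.parent_id) if s.parent_id else None
        if parent is None:
            roots.append(s)
        else:
            parent.children.append(s)
    return roots


def render(span: Span, depth: int = 0) -> list[str]:
    """Return one indented text line per span, depth-first."""
    lines = [f"{'  ' * depth}{span.kind} {span.name} ({span.duration_ms}ms)"]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines


spans = [
    Span("s1", None, "WORKFLOW", "support-agent-workflow", 2140),
    Span("s2", "s1", "TOOL", "retrieve_knowledge_base", 420),
    Span("s3", "s1", "LLM", "llm:gpt-4", 1240),
    Span("s4", "s1", "TOOL", "send_email", 180),
]
roots = build_tree(spans)
tree_lines = render(roots[0])
```

Printing `"\n".join(tree_lines)` gives an indented outline with the workflow span at the top and its tool and LLM calls nested beneath it.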

Session Replay

Reconstruct agent state at any point. See exactly what data was available when a decision was made.
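
One common way to reconstruct state at a decision point is to replay recorded state updates in order and stop at the step of interest. The sketch below shows that general technique only; the event shape (`step`, `updates`) and the `state_at` helper are hypothetical and do not describe how Foxhound stores or replays sessions.

```python
from typing import Any


def state_at(events: list[dict[str, Any]], step: int) -> dict[str, Any]:
    """Replay recorded state updates up to and including `step`."""
    state: dict[str, Any] = {}
    for event in sorted(events, key=lambda e: e["step"]):
        if event["step"] > step:
            break
        state.update(event["updates"])
    return state


# Hypothetical recorded events for one support-agent run.
events = [
    {"step": 1, "updates": {"query": "refund status"}},
    {"step": 2, "updates": {"kb_hits": 0}},
    {"step": 3, "updates": {"draft": "Your refund was issued."}},
]

# What the agent knew just before it drafted a reply at step 3.
known_at_step_2 = state_at(events, 2)
```

Here the replayed state shows that the knowledge-base lookup returned nothing before the agent wrote its draft, which is the kind of context a replay is meant to surface.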

Run Diff

Compare two runs side-by-side. Spot every divergence. Find where behavior changed.
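
A simple way to think about finding where behavior changed is to walk two runs step by step and report the first position where they differ. The sketch below assumes each run has been reduced to a list of `(span name, output)` pairs; that representation and the `first_divergence` helper are illustrative assumptions, not Foxhound's diff algorithm.

```python
from itertools import zip_longest
from typing import Optional

Step = tuple[str, str]  # (span name, recorded output), a hypothetical shape


def first_divergence(good: list[Step], bad: list[Step]) -> Optional[int]:
    """Return the index of the first differing step, or None if the runs match."""
    for index, (g, b) in enumerate(zip_longest(good, bad)):
        if g != b:
            return index
    return None


good_run = [
    ("retrieve_knowledge_base", "3 articles"),
    ("llm:gpt-4", "draft reply"),
    ("send_email", "sent"),
]
bad_run = [
    ("retrieve_knowledge_base", "0 articles"),
    ("llm:gpt-4", "draft reply"),
    ("send_email", "sent"),
]
diverged_at = first_divergence(good_run, bad_run)
```

Using `zip_longest` means a run that stops early or takes extra steps is also reported as a divergence at the first missing or extra step.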

Improve

Turn bad runs into evaluation inputs and validate fixes before promotion

LLM-as-a-Judge

Automated evaluation with GPT-4 and Claude. Score every trace.

Datasets from Traces

Turn production failures into reusable evaluation datasets. Filter by score, time range, or agent.
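
The filtering described above can be pictured as a plain selection over trace records. In this sketch the `TraceRecord` fields and the `select_failures` helper are hypothetical names chosen for the example; the real trace schema and dataset API may look different.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TraceRecord:
    trace_id: str
    agent_id: str
    score: float          # evaluation score, assumed here to be in [0, 1]
    started_at: datetime
    input: str


def select_failures(
    traces: list[TraceRecord],
    agent_id: str,
    max_score: float,
    since: datetime,
    until: datetime,
) -> list[dict[str, str]]:
    """Keep low-scoring traces for one agent inside a time window."""
    return [
        {"trace_id": t.trace_id, "input": t.input}
        for t in traces
        if t.agent_id == agent_id
        and t.score <= max_score
        and since <= t.started_at < until
    ]


traces = [
    TraceRecord("t1", "customer-support", 0.2, datetime(2026, 4, 1, 9), "Where is my refund?"),
    TraceRecord("t2", "customer-support", 0.9, datetime(2026, 4, 1, 10), "Reset my password"),
    TraceRecord("t3", "billing", 0.1, datetime(2026, 4, 1, 11), "Invoice is wrong"),
]
dataset = select_failures(
    traces, "customer-support", 0.5, datetime(2026, 4, 1), datetime(2026, 4, 2)
)
```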

Experiments

Run datasets through agent versions. Compare scores. Catch regressions before deploy.
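
As an illustration of comparing agent versions on the same dataset, the sketch below runs two stand-in agents over a tiny dataset and computes a mean score for each. The agents, the exact-match scorer, and the `run_experiment` helper are all assumptions for the example; a real experiment would call your agent and a scorer such as an LLM judge.

```python
from statistics import mean
from typing import Callable

Agent = Callable[[str], str]


def exact_match(expected: str, actual: str) -> float:
    """Toy scorer: 1.0 for an exact match, otherwise 0.0."""
    return 1.0 if expected.strip() == actual.strip() else 0.0


def run_experiment(agent: Agent, dataset: list[dict[str, str]]) -> float:
    """Mean score of one agent version over the whole dataset."""
    return mean(exact_match(item["expected"], agent(item["input"])) for item in dataset)


# Stand-in agents: v2 regresses on the second case.
def agent_v1(query: str) -> str:
    return {"2+2": "4", "3+3": "6"}[query]


def agent_v2(query: str) -> str:
    return {"2+2": "4", "3+3": "7"}[query]


dataset = [
    {"input": "2+2", "expected": "4"},
    {"input": "3+3", "expected": "6"},
]
score_v1 = run_experiment(agent_v1, dataset)
score_v2 = run_experiment(agent_v2, dataset)
regressed = score_v2 < score_v1
```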

GitHub Actions

Block PRs that degrade quality. Scores in every PR comment.
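
A quality gate in CI usually comes down to comparing a candidate score with a baseline and failing the job when the drop is too large. The sketch below shows that logic in isolation; `quality_gate`, `pr_comment`, and the `max_drop` tolerance are hypothetical and are not the interface of Foxhound's GitHub Actions integration.

```python
def quality_gate(baseline: float, candidate: float, max_drop: float = 0.05) -> int:
    """Return a process exit code: 0 passes the check, 1 fails it."""
    return 1 if (baseline - candidate) > max_drop else 0


def pr_comment(baseline: float, candidate: float, max_drop: float = 0.05) -> str:
    """Build a one-line summary suitable for posting on a pull request."""
    verdict = "FAIL" if quality_gate(baseline, candidate, max_drop) else "PASS"
    return f"{verdict}: baseline {baseline:.2f}, candidate {candidate:.2f}"


passing = quality_gate(baseline=0.90, candidate=0.89)
failing = quality_gate(baseline=0.90, candidate=0.70)
comment = pr_comment(0.90, 0.70)
```

In a CI job, passing the returned code to `sys.exit` is enough for the runner to mark the check as failed and block the merge if the check is required.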

Govern

Control cost, latency, and behavior drift before small issues become incidents

Cost Budgets

Per-agent spend limits. SDK callback kills runaway loops before the bill arrives.
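
The general pattern behind an SDK-level budget is a guard that accumulates cost after every call and raises once the limit is crossed, which stops a loop from inside the process. The sketch below illustrates that pattern with hypothetical `BudgetGuard` and `BudgetExceeded` names and an assumed USD unit; it is not the Foxhound SDK's implementation.

```python
class BudgetExceeded(RuntimeError):
    """Raised when accumulated spend crosses the configured limit."""


class BudgetGuard:
    def __init__(self, limit_usd: float) -> None:
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def record(self, cost_usd: float) -> None:
        """Call after each LLM or tool call; raises once the limit is crossed."""
        self.spent_usd += cost_usd
        if self.spent_usd > self.limit_usd:
            raise BudgetExceeded(
                f"spent {self.spent_usd:.2f} USD, limit {self.limit_usd:.2f} USD"
            )


def run_loop(guard: BudgetGuard, cost_per_step: float, max_steps: int) -> int:
    """Simulate an agent loop and return how many steps completed."""
    completed = 0
    try:
        for _ in range(max_steps):
            guard.record(cost_per_step)
            completed += 1
    except BudgetExceeded:
        pass  # the guard stopped the loop before max_steps
    return completed


steps_completed = run_loop(BudgetGuard(limit_usd=1.0), cost_per_step=0.25, max_steps=100)
```

With a 1.00 limit and 0.25 per step, the loop completes four steps and is cut off on the fifth instead of running all 100.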

SLA Monitoring

P95 latency and success rate thresholds. Auto-alert on breach.
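
For readers who want to see what a P95 and success-rate check involves, the sketch below computes a nearest-rank 95th percentile and compares both metrics with thresholds. The run record shape and the `sla_breaches` helper are assumptions for illustration, and the thresholds reuse the values from the MCP example on this page, with latency assumed to be in milliseconds.

```python
import math


def p95(latencies_ms: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(95 * len(ordered) / 100)
    return ordered[rank - 1]


def sla_breaches(
    runs: list[dict], max_p95_ms: float, min_success_rate: float
) -> list[str]:
    """Return the names of the thresholds that a non-empty window of runs violates."""
    latencies = [r["latency_ms"] for r in runs]
    success_rate = sum(1 for r in runs if r["ok"]) / len(runs)
    breaches = []
    if p95(latencies) > max_p95_ms:
        breaches.append("p95_latency")
    if success_rate < min_success_rate:
        breaches.append("success_rate")
    return breaches


# Twenty hypothetical runs: latencies of 100ms to 2000ms, two of them failed.
runs = [{"latency_ms": 100 * i, "ok": i > 2} for i in range(1, 21)]
window_p95 = p95([r["latency_ms"] for r in runs])
breaches = sla_breaches(runs, max_p95_ms=5000, min_success_rate=0.95)
```

In this window latency is within bounds, but 18 of 20 successful runs is a 0.90 success rate, so only the success-rate threshold is reported as breached.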

Regression Detection

Track behavior drift across versions and catch regressions before they spread.

Slack Alerts

Route alerts by type and severity. Cost spikes, SLA breaches, regressions.
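
Routing by type and severity can be pictured as a lookup table with a fallback channel. The channel names, alert fields, and `route` helper below are hypothetical and only illustrate the idea; they are not Foxhound's alert configuration format.

```python
# Hypothetical routing table keyed by (alert type, severity).
ROUTES = {
    ("cost_spike", "critical"): "#finops-oncall",
    ("sla_breach", "critical"): "#agents-oncall",
    ("regression", "warning"): "#agents-quality",
}
DEFAULT_CHANNEL = "#agents-alerts"


def route(alert: dict[str, str]) -> str:
    """Pick a Slack channel for an alert, falling back to a default."""
    return ROUTES.get((alert["type"], alert["severity"]), DEFAULT_CHANNEL)


channel = route({"type": "sla_breach", "severity": "critical"})
fallback = route({"type": "sla_breach", "severity": "info"})
```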

Comparison

Built for agents, not chatbots

Most LLM observability tools stop at traces and prompt logs. Foxhound adds the debugging, evaluation, and governance workflows teams need when agents run in production.

Feature    Foxhound    Langfuse    LangSmith    Braintrust
Session Replay
Run Diff
Cost Budgets (SDK enforcement)
SLA Monitoring
Behavior Regression Detection
LLM-as-a-Judge Evaluation
Dataset Auto-Curation
GitHub Actions Integration
MCP Server (31 tools)
Self-Hosted
Infrastructure Control

Comparison based on publicly available documentation and product testing as of April 2026. If something has changed, open an issue.

Developer tools

Built for your workflow

IDE-native debugging and evaluation workflows, plus CI quality gates

MCP Server

31 tools for debugging, trace inspection, scoring, and evaluation workflows via Model Context Protocol. Set budgets, monitor SLAs, query traces, and trigger evaluations — all from your agent runtime.

  • Cost budget enforcement
  • SLA threshold monitoring
  • Trace search & retrieval
  • LLM-as-a-Judge scoring
  • Dataset auto-curation
  • Real-time alerting
foxhound.mcp.ts
// MCP Server - 31 tools for debugging and evaluation workflows
import { FoxhoundMCP } from 'foxhound-ai/mcp';

const mcp = new FoxhoundMCP({
  apiKey: process.env.FOXHOUND_API_KEY,
  org: 'acme-corp',
});

// Budget enforcement tool
await mcp.setCostBudget({
  agentId: 'customer-support',
  budget: 50,
  period: 'daily',
});

// SLA monitoring
await mcp.setSLAThreshold({
  agentId: 'customer-support',
  p95Latency: 5000,
  successRate: 0.95,
});
Security & data sovereignty

Your data, your infrastructure.

Developer-first observability for teams that need control over where traces, prompts, and evaluation workflows live

Tenant-scoped by design

Foxhound is built around org-scoped data access so teams can keep tenant boundaries explicit throughout the platform.

Org-scoped API keys

API keys are scoped per organization to support safer multi-tenant workflows and cleaner operational boundaries.

Self-hosted

Run Foxhound on infrastructure you control and keep trace, replay, and evaluation workflows inside your own environment.

Bring your own model keys

Use your own provider credentials for evaluation workflows instead of routing that data through a managed shared layer.

Structured auditability

Capture who accessed what and when, so production debugging and review workflows are easier to inspect.

Built for security-sensitive deployments

Foxhound is designed for teams that care about tenant boundaries, infrastructure control, and operational visibility without forcing a hosted-only model.

Deploy in your VPC

Self-host Foxhound where it fits your stack — from simple Docker deployments to more controlled infrastructure footprints.

SDK integration

One decorator, full observability

Auto-instruments LangGraph, CrewAI, OpenAI, Claude, and OpenTelemetry-compatible systems

agent.py
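from typing import TypedDict

from foxhound import trace
from langgraph.graph import START, StateGraph


class SupportState(TypedDict):
    query: str


# Automatic LangGraph instrumentation
@trace(org="acme-corp")
def customer_support_agent(query: str):
    # LangGraph's StateGraph takes a state schema
    workflow = StateGraph(SupportState)
    # Every tool call and LLM invocation is traced automatically.
    # route_query, support_response, and escalate_to_human are your own node functions.
    workflow.add_node("router", route_query)
    workflow.add_node("support", support_response)
    workflow.add_node("escalation", escalate_to_human)
    workflow.add_edge(START, "router")  # entry point; routing edges omitted here
    return workflow.compile().invoke({"query": query})


# Cost budget enforced at SDK level
customer_support_agent.set_budget(50, period="daily")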
Auto-instrumentation: Zero manual spans
Cost budgets: SDK-enforced limits
Multi-framework: 10+ frameworks
Type-safe: Full TypeScript/Python types
Pricing

Self-host free. Managed cloud coming.

Deploy on your infrastructure with a direct license. No hosted lock-in, no usage caps.

Most Popular

Self-Hosted

$0 / forever

Deploy with a direct license. No hosted lock-in.

  • Unlimited traces & sessions
  • All observability features
  • Cost budgets & SLA monitoring
  • LLM-as-a-Judge evaluation
  • GitHub Actions integration
  • MCP Server (31 tools)
  • Run on your infrastructure
  • Direct-license deployment

Managed Cloud

Coming Soon

Fully managed hosting is planned. Details for hosted security and compliance posture will be announced with the service.

  • All self-hosted features
  • Managed infrastructure
  • Automatic updates
  • Dedicated support
  • Data residency options
  • SSO & advanced auth (planned)
  • Uptime SLA (details at launch)
  • Hosted compliance posture (details at launch)

Managed cloud pricing will be announced when the hosted service launches. Self-hosted will always be free.

Source available

Public repo, direct licensing

Foxhound is published in a public repository for reference and evaluation. You can inspect the code and run licensed deployments on infrastructure your team controls.

31 MCP Tools
10+ Frameworks
$0 Self-Hosted
OTLP Compatible