Keryx
Private Beta

Your AI Teammate for Kubernetes Operations

Multi-agent orchestration for Kubernetes. Keryx deploys specialized AI agents that inspect clusters, learn from past incidents, and resolve issues from Slack or the web dashboard.

#sre-alerts (Slack)
[P1] Alert · Cluster · 10:34 AM

CrashLoopBackOff on checkout-service in namespace prod

Keryx (AI Agent)

🔍 Investigating checkout-service...

📋 Found: 2/3 pods failing (missing secret stripe-prod-key)

💡 Suggested fix: Create secret from AWS Secrets Manager

✓ Approve Fix · View Details
Keryx is monitoring recovery...
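A finding like "2/3 pods failing" in the Slack example can be computed directly from pod status. The sketch below is minimal and rests on one assumption: pods arrive as plain dicts shaped like Kubernetes Pod objects, for example the `items` of `kubectl get pods -o json`. `summarize_pods`, `Finding` and `FAILING_REASONS` are hypothetical names and are not part of Keryx's API.

```python
from dataclasses import dataclass, field

# Hypothetical set of waiting reasons treated as "failing". A missing Secret
# referenced via env.valueFrom usually surfaces as CreateContainerConfigError,
# while an app that exits on bad config loops through CrashLoopBackOff.
FAILING_REASONS = {"CrashLoopBackOff", "CreateContainerConfigError"}


@dataclass
class Finding:
    total: int
    failing: dict = field(default_factory=dict)  # pod name -> waiting reason

    def summary(self) -> str:
        return f"{len(self.failing)}/{self.total} pods failing"


def summarize_pods(pods: list) -> Finding:
    """Flag pods that have at least one container waiting for a failing reason."""
    finding = Finding(total=len(pods))
    for pod in pods:
        statuses = pod.get("status", {}).get("containerStatuses", [])
        for cs in statuses:
            waiting = cs.get("state", {}).get("waiting") or {}
            if waiting.get("reason") in FAILING_REASONS:
                finding.failing[pod["metadata"]["name"]] = waiting["reason"]
                break  # one failing container is enough to flag the pod
    return finding


# Example input shaped like the "items" of `kubectl get pods -o json`.
pods = [
    {"metadata": {"name": "checkout-1"},
     "status": {"containerStatuses": [{"state": {"waiting": {"reason": "CrashLoopBackOff"}}}]}},
    {"metadata": {"name": "checkout-2"},
     "status": {"containerStatuses": [{"state": {"waiting": {"reason": "CrashLoopBackOff"}}}]}},
    {"metadata": {"name": "checkout-3"},
     "status": {"containerStatuses": [{"state": {"running": {}}}]}},
]
finding = summarize_pods(pods)
print(finding.summary())  # 2/3 pods failing
```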

Operational Toll

SRE teams are drowning

Manual toil, endless alert noise, and massive cognitive load are burning out your best engineers.

Alert Fatigue

Thousands of duplicate alerts and false positives obscure root causes, burning out responders.

Fragmented Context

Engineers lose critical hours correlating data across disjointed logs, metrics, and terminal screens.

Knowledge Silos

Undocumented fixes and tribal knowledge force teams to repeat the same troubleshooting steps.

Capabilities

AI-native reliability engineering

Everything your SRE team needs — root cause analysis, incident memory, alert intelligence, and an operational dashboard.

keryx dashboard (operational overview)
Active Alerts: 34
Open Investigations: 7
Active Incidents: 2
Runbooks: 48
System Health: Healthy
Sections: Alerts, Investigations, Runbooks, Escalation, On-Call, Severity Levels
On-Call (Primary): Sarah M.
Agent Architecture

Multi-agent orchestration

Specialized agents communicate via MCP and A2A protocols. Each agent has its own tools, memory, and AI model. Add new agents as your operations grow.

Inspector Agent (Active, MCP)

Live cluster inspection, pod diagnostics, and infrastructure analysis

Memory Agent (Active, A2A)

Stores & retrieves past solutions from previous investigations

Coder Agent (Active, MCP)

Creates PRs, analyzes code, fixes Terraform & K8s manifests
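The Memory Agent matches a new incident against past ones. That can be illustrated with a deliberately simple stand-in: word-overlap (Jaccard) similarity between symptom descriptions. This is an assumption for illustration only. The page does not say how Keryx retrieves solutions, and a production system might use embeddings instead. `MemoryStore`, `remember` and `recall` are hypothetical names.

```python
import re
from dataclasses import dataclass, field


def tokens(text: str) -> set:
    """Lowercased word tokens; hyphens are kept so names like checkout-service stay whole."""
    return set(re.findall(r"[a-z0-9][a-z0-9-]*", text.lower()))


@dataclass
class MemoryStore:
    # Each entry: (symptom tokens, fix that resolved it)
    entries: list = field(default_factory=list)

    def remember(self, symptom: str, fix: str) -> None:
        self.entries.append((tokens(symptom), fix))

    def recall(self, symptom: str, min_score: float = 0.2) -> list:
        """Return (score, fix) pairs scoring at least min_score, best match first."""
        query = tokens(symptom)
        matches = []
        for past, fix in self.entries:
            union = query | past
            score = len(query & past) / len(union) if union else 0.0  # Jaccard index
            if score >= min_score:
                matches.append((score, fix))
        return sorted(matches, reverse=True)


memory = MemoryStore()
memory.remember("CrashLoopBackOff checkout-service missing secret stripe-prod-key",
                "Create secret from AWS Secrets Manager")
memory.remember("OOMKilled payments-worker memory limit exceeded",
                "Raise the container memory limit")
matches = memory.recall("checkout-service CrashLoopBackOff in prod")
```

Here the query shares two of seven distinct tokens with the first incident, a score of about 0.29. It shares nothing with the second, so only the secret fix is returned.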

Keryx Orchestrator
Routes • Classifies • Coordinates
Powered by AWS Bedrock LLM models

How agents communicate

1. Message arrives: via Slack or the web dashboard
2. Orchestrator classifies: relevant or noise?
3. Delegates to agents: Inspector and Memory search in parallel
4. Returns diagnosis + fix: solution stored for next time
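The four steps can be sketched as an async pipeline. This is a minimal illustration under stated assumptions, not Keryx's implementation:

- Agents are modeled as hypothetical async callables.
- The relevance check is a keyword placeholder where the real Orchestrator presumably uses an LLM.
- The solution store is an in-memory list.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional

# Hypothetical agent interface: message in, finding out.
Agent = Callable[[str], Awaitable[str]]

# Placeholder noise markers standing in for the Orchestrator's classifier.
NOISE_MARKERS = ("heartbeat", "test alert", "resolved")


def is_relevant(message: str) -> bool:
    """Step 2: decide whether a message is worth investigating."""
    text = message.lower()
    return not any(marker in text for marker in NOISE_MARKERS)


async def handle(message: str, inspector: Agent, memory: Agent,
                 solutions: List[str]) -> Optional[str]:
    if not is_relevant(message):
        return None  # dropped as noise
    # Step 3: Inspector and Memory run concurrently.
    live, past = await asyncio.gather(inspector(message), memory(message))
    diagnosis = f"{live}; previously: {past}"
    solutions.append(diagnosis)  # Step 4: keep it for next time
    return diagnosis


async def fake_inspector(message: str) -> str:
    return "2/3 pods failing, missing secret stripe-prod-key"


async def fake_memory(message: str) -> str:
    return "secret restored from AWS Secrets Manager"


solutions: List[str] = []
diagnosis = asyncio.run(handle("CrashLoopBackOff on checkout-service",
                               fake_inspector, fake_memory, solutions))
noise = asyncio.run(handle("heartbeat OK", fake_inspector, fake_memory, solutions))
```

`asyncio.gather` returns results in argument order, so `live` and `past` line up with the Inspector and Memory agents regardless of which finishes first.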

Extensible by design. Add new agents with their own tools and models via MCP or A2A protocols.
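One way this extensibility could look is a registry the Orchestrator dispatches through, where each agent declares its name and transport. This is a minimal sketch. `AgentLike`, `Registry` and `CostAgent` are hypothetical. The MCP/A2A field is only a label here and does not implement either protocol.

```python
from typing import Dict, Protocol

SUPPORTED_PROTOCOLS = {"MCP", "A2A"}


class AgentLike(Protocol):
    name: str
    protocol: str  # label only: "MCP" or "A2A"

    def handle(self, task: str) -> str: ...


class Registry:
    def __init__(self) -> None:
        self._agents: Dict[str, AgentLike] = {}

    def register(self, agent: AgentLike) -> None:
        if agent.protocol not in SUPPORTED_PROTOCOLS:
            raise ValueError(f"unsupported protocol: {agent.protocol}")
        self._agents[agent.name] = agent

    def dispatch(self, name: str, task: str) -> str:
        return self._agents[name].handle(task)


class CostAgent:
    """A hypothetical new agent, added without modifying Registry."""
    name = "cost"
    protocol = "A2A"

    def handle(self, task: str) -> str:
        return f"cost report for {task}"


registry = Registry()
registry.register(CostAgent())
report = registry.dispatch("cost", "namespace prod")
```

`AgentLike` is a structural `Protocol`, so new agents need no shared base class. They only have to expose the same attributes and method.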

Workflow

Alert to resolution in minutes

A single Slack conversation takes you from incident detection to verified recovery.

01

Alert Detected

A cluster event, error spike, or deployment failure surfaces. The Orchestrator classifies the message.

02

Agents Collaborate

Inspector Agent examines the cluster. Memory Agent checks past solutions. Coder Agent prepares fixes. The Orchestrator correlates the results.

03

Diagnosis Delivered

A structured root cause analysis with suggested actions is posted to Slack or the web dashboard.

04

Remediate & Learn

Approve the fix. Keryx executes, monitors recovery, and stores the solution for next time.
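The "monitors recovery" part of step 04 amounts to polling until the workload is healthy again or a deadline passes. This is a minimal sketch that assumes a hypothetical `get_status` callable returning a Deployment's ready and desired replica counts. In Kubernetes these live in `status.readyReplicas` and `spec.replicas`. The sleep and clock functions are injectable so the example runs instantly.

```python
import time
from typing import Callable, Tuple


def wait_for_recovery(get_status: Callable[[], Tuple[int, int]],
                      timeout_s: float = 300.0,
                      interval_s: float = 5.0,
                      sleep: Callable[[float], None] = time.sleep,
                      clock: Callable[[], float] = time.monotonic) -> bool:
    """Return True once ready >= desired (> 0), False if the deadline passes first."""
    deadline = clock() + timeout_s
    while True:
        ready, desired = get_status()
        if desired > 0 and ready >= desired:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Simulated rollout: one more pod becomes ready on each poll.
statuses = iter([(1, 3), (2, 3), (3, 3)])
recovered = wait_for_recovery(lambda: next(statuses), sleep=lambda s: None)

# A workload that never recovers, with a zero timeout, gives up immediately.
stuck = wait_for_recovery(lambda: (1, 3), timeout_s=0, sleep=lambda s: None)
```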

Use Cases

Real workflows, every week

High-frequency scenarios where Keryx removes friction from Kubernetes incident response.

Noisy Alert Triage

Platform Team
Without Keryx

Dozens of duplicate alerts fire across namespaces, each demanding manual investigation.

With Keryx

Keryx groups correlated alerts into a single Slack thread with root cause and blast radius.

Outcome
70% less alert noise
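Grouping correlated alerts can be as simple as keying them on a shared fingerprint. This is a minimal sketch built on two hypothetical assumptions:

- Alerts with the same alert name and workload belong together.
- Blast radius is approximated by the set of affected namespaces.

Keryx's actual correlation logic is not described on this page.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Alert:
    name: str       # e.g. "KubePodCrashLooping"
    namespace: str
    workload: str   # owning Deployment or StatefulSet


def group_alerts(alerts: List[Alert]) -> Dict[Tuple[str, str], List[Alert]]:
    """One group (and one Slack thread) per (alert name, workload) fingerprint."""
    groups: Dict[Tuple[str, str], List[Alert]] = defaultdict(list)
    for alert in alerts:
        groups[(alert.name, alert.workload)].append(alert)
    return dict(groups)


def thread_summaries(groups: Dict[Tuple[str, str], List[Alert]]) -> List[str]:
    lines = []
    for (name, workload), items in groups.items():
        namespaces = sorted({a.namespace for a in items})  # rough blast radius
        lines.append(f"{name} on {workload}: {len(items)} alert(s), "
                     f"namespaces {', '.join(namespaces)}")
    return lines


alerts = [
    Alert("KubePodCrashLooping", "prod", "checkout-service"),
    Alert("KubePodCrashLooping", "prod", "checkout-service"),
    Alert("KubePodCrashLooping", "staging", "checkout-service"),
    Alert("HighLatency", "prod", "cart"),
]
summaries = thread_summaries(group_alerts(alerts))
```

Four raw alerts collapse into two threads. Groups keep first-seen order because Python dicts preserve insertion order.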
Integrations & Security

Your stack. Your boundaries.

Connects to the tools you use, with security controls designed for enterprise operations.

Slack
Real-time ChatOps via WebSocket
Kubernetes
RBAC-scoped cluster access
OpenTelemetry
Distributed tracing instrumentation
AWS Bedrock
AI models via OIDC auth
Datadog
Monitors & APM traces
PagerDuty
Incident sync & ack
GitHub
PRs & deployment linking

Security Controls

Least privilege by default

Kubernetes RBAC scoped to incident actions. Tightly defined permission boundaries you control.
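To illustrate "scoped to incident actions", here is a namespaced Role built as a Python dict and printed as JSON, which `kubectl apply -f` accepts. The role name and exact verbs are assumptions, not Keryx's shipped permissions. This Role can read pods, logs and events and patch Deployments, which is enough for `kubectl rollout restart`. It has no access to Secrets, so a fix like the secret creation in the Slack example would need an extra rule you choose to grant.

```python
import json


def incident_role(namespace: str) -> dict:
    """A namespaced RBAC Role limited to illustrative incident-response actions."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": "keryx-incident-responder", "namespace": namespace},
        "rules": [
            # Read-only diagnostics.
            {"apiGroups": [""],
             "resources": ["pods", "pods/log", "events"],
             "verbs": ["get", "list", "watch"]},
            # Inspect and restart Deployments (a rollout restart is a patch).
            {"apiGroups": ["apps"],
             "resources": ["deployments"],
             "verbs": ["get", "list", "watch", "patch"]},
        ],
    }


print(json.dumps(incident_role("prod"), indent=2))
```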

Audit-first execution trail

Every decision, command, and remediation event is recorded with full context for compliance.
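An audit-first trail writes the record before the action runs, so even a failed or interrupted command leaves evidence. This is a minimal sketch using JSON Lines. `AuditEvent`, `run_audited`, the field names and the Slack user ID `U_SARAH` are hypothetical, chosen to capture who acted, what they did, to which target, and why.

```python
import io
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Callable, Optional, TextIO


@dataclass
class AuditEvent:
    actor: str                     # agent name or Slack user ID
    action: str                    # e.g. "rollout restart"
    target: str                    # e.g. "prod/deployment/checkout-service"
    reason: str                    # context that justified the action
    approved_by: Optional[str] = None
    ts: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


def run_audited(event: AuditEvent, execute: Callable[[], str], log: TextIO) -> str:
    """Append the event as one JSON line, flush it, then perform the action."""
    log.write(json.dumps(asdict(event)) + "\n")
    log.flush()  # record is pushed out before anything changes
    return execute()


log = io.StringIO()  # stands in for an append-only file or log sink
result = run_audited(
    AuditEvent(actor="keryx", action="rollout restart",
               target="prod/deployment/checkout-service",
               reason="CrashLoopBackOff after secret restored",
               approved_by="U_SARAH"),
    execute=lambda: "restarted",
    log=log,
)
```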

Human-in-the-loop

High-impact changes require explicit Slack approval before automation proceeds.
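A human-in-the-loop gate comes down to one check before execution: is the change high-impact, and if so, has an authorized person approved it? This is a minimal sketch. The following are all assumptions:

- how impact is classified;
- who counts as an approver;
- letting low-impact changes proceed without approval.

In practice the approvals set would presumably be filled from Slack button clicks. `U_SARAH` is a hypothetical Slack user ID.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Set


class Impact(Enum):
    LOW = "low"
    HIGH = "high"


@dataclass
class ProposedFix:
    description: str
    impact: Impact


def may_execute(fix: ProposedFix, approvals: Set[str], approvers: Set[str]) -> bool:
    """High-impact fixes need at least one approval from an authorized approver."""
    if fix.impact is Impact.LOW:
        return True  # assumption: low-impact changes proceed automatically
    return bool(approvals & approvers)


fix = ProposedFix("Create secret stripe-prod-key from AWS Secrets Manager", Impact.HIGH)
before_approval = may_execute(fix, approvals=set(), approvers={"U_SARAH"})
after_approval = may_execute(fix, approvals={"U_SARAH"}, approvers={"U_SARAH"})
```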

Self-hosted deployment

OIDC authentication to AWS — no stored credentials. Data never leaves your infrastructure.

FAQ

Common questions

Everything you need to know about deploying and operating Keryx.


Get Started

Ready to upgrade your on-call experience?

Deploy Keryx in your own environment. Your AI teammate is ready to join your Slack in minutes.

✓ Deploy in 30 minutes
✓ Local isolated data
✓ Existing RBAC
✓ Open source