A live model of production. Enforcement for every action.
The Context & Control Model captures how your services, deploys, dependencies, and incidents connect across time. Every action by a human or agent is evaluated against it before it touches production.
NOFire.ai · Investigations
why is checkout failing? payment errors spiking
Ran 19 queries across 4 tools
PR #4471 raised Kafka consumer batch_size from 500 to 2000, pushing memory to 597 MB (95% of the 600 MB limit). Checkout RPC p99 spiked 6x exactly 18 min after deploy.
Approve the rollback
Add memory limit
Ask anything to steer…
Root cause · Evidence · Causal chain
Hypotheses
Confirmed · hypothesis #1
PR #4471 (Kafka consumer batch_size 500 to 2000) pushed memory to the edge of its limit, degrading checkout RPC
Deploy raised batch_size 4x. Kafka process reached 597 MB steady-state 18 min before RPC p99 spiked to 485 ms.
92% confidence
Supporting evidence
Kafka memory at 597 MB, 95% of 600 MB limit, sustained 38 min
RPC p99 spiked to 485 ms (6x baseline) starting 04:17:42
4 errors: rpc error: Payment request failed. Invalid token
Contradicting evidence
CPU stable, rules out compute saturation
No pod crashes. OOM threshold not reached
Consistent with memory pressure without full OOM
REJECTED · 6%
Checkout service CPU saturation under traffic spike
REJECTED · 2%
Upstream payment provider latency regression
Recommended actions · 1 ranked
checkout-svc · config/kafka.yaml · -2 +2
  consumer:
    group_id: checkout-orders
-   batch_size: 2000
-   fetch_max_bytes: 8388608
+   batch_size: 500
+   fetch_max_bytes: 2097152
    session_timeout_ms: 30000
kustomize/overlays/prod/checkout-svc/kafka-config.yaml
How it works.
Agents that investigate and prevent. Runbooks that fire on schedule, on event, or on demand. One execution path for every actor: human, CI, or agent.
The platform.
Use our agents or bring your own.
01
Models
Works with the frontier models and cloud providers you already run. No new contracts, no new vendor risk. Your cloud, your keys, your compliance posture.
Cloud providers
AWS Bedrock
Azure OpenAI
Google Vertex
Frontier models
02
Causal
Your causal production graph maps every service, dependency, deploy, configuration change, and failure pattern by how they actually cause one another. Time-versioned and continuously updated.
Live · updated 14m ago
checkout · kube_deployment
api-gateway · app_service
payment-svc · kube_deployment
postgres · rds_instance
otel-collector · app_service
4 incidents
5 engineer notes
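As a rough illustration of what "time-versioned" could mean for a graph like the one above, here is a minimal Python sketch. The node names and kinds come from the example above. The edges, relation labels, timestamps, and every class and function name are assumptions made for illustration, not NOFire.ai's actual data model or API.

```python
from collections import deque
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Edge:
    """A directed relation that is only valid during a time interval."""
    source: str
    target: str
    relation: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None means the edge is still current

    def active_at(self, ts: datetime) -> bool:
        return self.valid_from <= ts and (self.valid_to is None or ts < self.valid_to)


@dataclass
class CausalGraph:
    nodes: Dict[str, str] = field(default_factory=dict)  # name -> kind
    edges: List[Edge] = field(default_factory=list)

    def add_node(self, name: str, kind: str) -> None:
        self.nodes[name] = kind

    def add_edge(self, edge: Edge) -> None:
        self.edges.append(edge)

    def downstream(self, start: str, ts: datetime) -> Set[str]:
        """Breadth-first walk over the edges that were active at time ts."""
        seen: Set[str] = set()
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for edge in self.edges:
                if edge.source == current and edge.active_at(ts) and edge.target not in seen:
                    seen.add(edge.target)
                    queue.append(edge.target)
        seen.discard(start)  # a cycle must not report the start node itself
        return seen


def example_graph() -> CausalGraph:
    """Build a small graph; the edges and dates below are hypothetical."""
    g = CausalGraph()
    for name, kind in [
        ("checkout", "kube_deployment"),
        ("api-gateway", "app_service"),
        ("payment-svc", "kube_deployment"),
        ("postgres", "rds_instance"),
        ("otel-collector", "app_service"),
    ]:
        g.add_node(name, kind)
    g.add_edge(Edge("api-gateway", "checkout", "calls", datetime(2024, 1, 1)))
    g.add_edge(Edge("checkout", "payment-svc", "calls", datetime(2024, 1, 1)))
    # This dependency only exists from March onwards.
    g.add_edge(Edge("payment-svc", "postgres", "reads_writes", datetime(2024, 3, 1)))
    return g
```

Querying the same node at two timestamps returns different downstream sets, which is the property a time-versioned graph would need in order to answer "what depended on what when the incident started?".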
03
Your Stack
Integrations across code, domain knowledge, infrastructure, observability, incident management, and CI/CD, plus custom MCPs and custom tools.
Code
GitHub
GitLab
Bitbucket
Infrastructure
AWS
GCP
Azure
Kubernetes
Telemetry
Datadog
Grafana
Prometheus
Elasticsearch
OpenSearch
Honeycomb
Loki
Tempo
Databases
PostgreSQL
MongoDB
Providers
AWS
GCP
Azure
Collab
Slack
PagerDuty
Atlassian
Linear
Live reads · INV-1380
Queried Prometheus
rate(http_5xx{service="checkout"}[5m])
+540% error rate · 1,243 errors/min
1.2s
Inspected pods
kubectl get pods -l app=checkout -n payments-prod
4/12 CrashLoopBackOff · OOMKilled
0.4s
Searched commits
path:services/cart --since=24h → abc1234
Memory limit 512M → 256M in last deploy
0.8s
04
Security & control
Define exactly what agents can do autonomously vs. what needs human approval. SOC 2 Type II, GDPR, and HIPAA aligned.
Read-only · always autonomous
Write · pending approval
Read logs & query metrics · Auto
Revert a commit · Pending approval
Search code & docs · Auto
Restart a deployment · Pending approval
Analyse change events · Auto
Silence an alert · Pending approval
Query traces & spans · Auto
Trigger a workflow · Pending approval
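The split between autonomous reads and approval-gated writes can be sketched as a small policy check. This is a minimal Python sketch under stated assumptions: the action identifiers are hypothetical names derived from the examples above, and the rule that unknown actions also require approval is an assumption, not documented product behaviour.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    AUTO = "auto"
    PENDING_APPROVAL = "pending_approval"


# Hypothetical identifiers for the read-only examples listed above.
READ_ONLY_ACTIONS = frozenset({
    "read_logs",
    "query_metrics",
    "search_code",
    "search_docs",
    "analyse_change_events",
    "query_traces",
})

# Hypothetical identifiers for the write examples listed above.
WRITE_ACTIONS = frozenset({
    "revert_commit",
    "restart_deployment",
    "silence_alert",
    "trigger_workflow",
})


@dataclass(frozen=True)
class Action:
    name: str
    actor: str  # "human", "ci", or "agent"; recorded but not used in the decision


def evaluate(action: Action) -> Decision:
    """Return AUTO for known read-only actions, PENDING_APPROVAL otherwise."""
    if action.name in READ_ONLY_ACTIONS:
        return Decision.AUTO
    # Known writes need approval. Unknown actions are treated the same way
    # (fail closed), which is an assumption of this sketch.
    return Decision.PENDING_APPROVAL
```

In this sketch the decision depends only on the action, so a human, a CI job, and an agent all go through the same check.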
See your production through a causal lens.
A 30-minute call with a founder. We map your stack to the Context & Control Model, live.