Your service catalog is already wrong.

Your AI agents are acting on it right now. NOFire AI derives your catalog from live production, so it reflects what is really running, not what someone declared six months ago.

Book a demo

NOFire.ai · Service Wiki

Service wiki

Generated from your environment · observed continuously

1 incident · 3 warnings


frontend-proxy · warning · TIER 2 · production · no owner · 1 gap · calls 6 · called by 19 · deploy 1h ago · readiness 88%
checkout · warning · TIER 2 · production · backend · 2 gaps · calls 9 · called by 19 · deploy 6h ago · readiness 44%
fraud-detection · incident · TIER 2 · production · no owner · 3 gaps · calls 3 · called by 11 · deploy 5d ago · readiness 31%
product-catalog · healthy · TIER 3 · production · backend · calls 3 · called by 14 · deploy 1d ago · readiness 72%
payment · healthy · TIER 3 · production · backend · calls 1 · called by 7 · deploy 3h ago · readiness 90%
cart · warning · TIER 3 · qa · backend · 2 gaps · calls 2 · called by 8 · deploy 12h ago · readiness 27%

18 services · sorted by criticality · last scan: 4s ago


Service catalog automation: why manual maintenance fails

Your catalog is always a step behind.

Backstage, Cortex, Compass, Roadie. Every catalog tool shares the same assumption: humans will keep it accurate. That assumption breaks by week two.

Stale by design

Catalog entries go out of date the moment a team restructures, a dependency shifts, or a new service ships. Nobody updates the YAML.

Incidents expose it

Your on-call rotation discovers the real service topology during the incident. The catalog was wrong. It always is.

Maintenance tax

Two to three engineers spend significant time keeping Backstage running: plugins, integrations, authentication, the catalog entries themselves.

Comparison

Why teams replace Backstage with NOFire AI.

Every tool in this table is a good product. The difference is the architecture: NOFire AI catalogs from observation. Every other tool catalogs from declaration.

Capability	NOFire AI	Backstage	Cortex	Compass / Roadie
YAML required	None	Yes, for every entity	Yes, service definitions	Yes, via Backstage entities
Dependency provenance	Every dependency labeled: runtime (observed) or inferred. Confidence score included.	Declared in YAML, no confidence model	Declared or integration-based, no provenance	Declared, no provenance
Service ownership	Observed from deploys, on-call, and contributor history	Manual YAML declaration	Manual + team policies	Jira-linked teams, manual
Repository analysis	Agents read README, CHANGELOG, CI/CD pipelines, contributor history	TechDocs, manual authoring	GitHub integration, partial	TechDocs via plugins, manual
Readiness scorecard	Scored from actual SLOs, alerts, and incident data	Plugin-based, manual data input	Policy rules, manual input	Jira-based metrics, manual
Change timeline	Auto-populated from CI/CD and incident events	Not built in	Via integrations, partial	Via Jira or plugins, partial
Dependency graph	Observed from live telemetry	YAML declaration	Partial via integrations	Partial via plugins
Blast radius	Calculated from observed call graph	Not available	Not available	Not available
Maintenance required	Near zero: agents observe continuously	2 to 3 engineers, ongoing	Moderate, ongoing	Low (hosted), catalog still goes stale

Book a demo

Agent-powered

One panel. Every layer of service knowledge.

The service detail page in NOFire AI is populated entirely from what agents observe. Nothing is declared manually. Nothing goes stale.

Unlike Backstage, Cortex, and Compass, NOFire AI requires zero YAML, zero manual catalog entries, and zero plugin maintenance.

NOFire.ai · checkout · service detail

checkout

v24 · production

No SLO: No SLO / recording rule defined

Important · 1 gap

Overview

The checkout service orchestrates the end-to-end purchase flow, coordinating payment processing, inventory validation, and shipping arrangements. It acts as the central transaction coordinator, calling payment, product-catalog, cart, item validation, shipping, currency, email, kafka, and flagd.

Change timeline

deploy · a3f91c · feat: add circuit breaker for payment retries · m.chen · 2h ago
hotfix · 7d204b · fix: OOM crash at peak load (memory cap 20MB) · p.moustafellos · 4d ago
config · c9e812 · chore: tune GC percent to reduce goroutine bloat · a.sapranidis · 9d ago
deploy · f10a3d · refactor: optimize product-catalog query batching · m.chen · 19d ago

Observability

Scope: {service_name="checkout", k8s_namespace_name="otel-demo"}

▼ Latency

rpc_server_duration_milliseconds_bucket (histogram): server-side RPC request duration

histogram_quantile(0.99, rate(rpc_server_duration_milliseconds_bucket{service_name="checkout",k8s_namespace_name="otel-demo"}[5m]))

rpc_client_duration_milliseconds_count (counter): client-side RPC call count

rate(rpc_client_duration_milliseconds_count{service_name="checkout",k8s_namespace_name="otel-demo"}[5m])

▼ Throughput

traces_span_metrics_calls_total (counter): total span calls

rate(traces_span_metrics_calls_total{service_name="checkout",k8s_namespace_name="otel-demo"}[5m])

rpc_server_responses_per_rpc_count (counter): RPC responses per call

rate(rpc_server_responses_per_rpc_count{service_name="checkout",k8s_namespace_name="otel-demo"}[5m])

▼ Custom

go_goroutine_count (counter): number of active goroutines

rate(go_goroutine_count{service_name="checkout",k8s_namespace_name="otel-demo"}[5m])

go_config_gogc_percent (gauge): Go GC target percentage

go_config_gogc_percent{service_name="checkout",k8s_namespace_name="otel-demo"}

▼ Alerts

Service high error rate: warning, for 90s

Service high latency: warning, for 1m

Service traffic spike: warning, for 30s
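
As an illustration, the p99 latency query listed in the panel above could be issued outside the UI through the Prometheus HTTP API instant-query endpoint (`/api/v1/query`). The sketch below is an assumption about how a reader might reproduce the number, not a description of how NOFire AI fetches it; the function names and the `http://localhost:9090` address are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def p99_latency_query(service, namespace, window="5m"):
    """Build the p99 server-side RPC latency query shown in the panel."""
    selector = f'service_name="{service}",k8s_namespace_name="{namespace}"'
    return (
        "histogram_quantile(0.99, "
        f"rate(rpc_server_duration_milliseconds_bucket{{{selector}}}[{window}]))"
    )


def instant_query_url(base_url, promql):
    """Return the URL for a Prometheus instant query (GET /api/v1/query)."""
    params = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


def run_instant_query(base_url, promql, timeout=10):
    """Execute the query and return the list of result samples."""
    url = instant_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)["data"]["result"]


query = p99_latency_query("checkout", "otel-demo")
url = instant_query_url("http://localhost:9090", query)  # hypothetical address
# samples = run_instant_query("http://localhost:9090", query)  # needs a live server
```

Only the URL is built at import time; the network call is left commented out so the sketch runs without a Prometheus server.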

When this breaks

All purchase flows halt. Checkout is the sole transaction coordinator. [INV-27]

frontend-proxy p99 latency spikes as retries queue; circuit breaker trips within 90s. [INV-26]

19 downstream services lose checkout context: fraud-detection, payment, shipping go idle. [INV-26]

Runbooks & Learnings

Checkout investigation: diagnose latency spikes, payment retries, and OOM events via p99 trend + goroutine count · Runbook

Checkout service lacks Prometheus metrics instrumentation or scraping configuration, preventing o... · Learning

Memory limit of 20MB insufficient for checkout service workload requiring 18-19MB, causing OOMKil... · Learning

Ontology

Owner · backend · ● observed

Lifecycle · production

Criticality · Important · TIER 2 · ● inferred · 44%

Readiness · Ready · 100%

owner ✓ · metrics ✓ · alerts ✓ · resilient ✓

Health · No signal yet · ● unknown

Live health (SLO / error rate / saturation) arrives with the state engine.

Depends On

shipping · ● observed · 312/min · p99 18ms
● observed · 198/min · p99 42ms
cart · ● observed · 1,840/min · p99 9ms
product-catalog · ● observed · 2,103/min · p99 11ms
otel-collector · ● observed · async · p99 n/a
currency · ● observed · 876/min · p99 7ms
payment · ● observed · 420/min · p99 134ms
kafka · ● observed · async · p99 n/a
flagd · ● observed · 654/min · p99 3ms

Structure

owned_by · deployment:checkout · 100% · ● observed

Blast Radius

accounting, ad, cart, currency, email, flagd, fraud-detection, frontend, frontend-proxy, image-provider, kafka, load-generator, otel-collector, payment, product-catalog, product-reviews, quote, recommendation, shipping

● observed

Past Incidents

INV-27 · P1 · Checkout failing under payment load spike · resolved in 23 min
INV-26 · P1 · Checkout unresponsive after OOM kill · resolved in 41 min
INV-22 · P2 · ProductCatalogService intermittent UNAVAILABLE · resolved in 1h 12m
INV-20 · P2 · Checkout missing Prometheus scrape target · resolved in 55 min
INV-18 · P2 · Checkout latency p99 spike on EU traffic · resolved in 38 min
INV-12 · P3 · Checkout lacks alerting rule on error rate · resolved in 2h 4m
INV-10 · P3 · No SLO defined for checkout success rate · ● open
INV-9 · P3 · Ownership unset: no team assigned to checkout · ● open

Source: Production signals + repos

Manual input: None

Update frequency: Continuous

Maintenance required: Near zero

Deterministic facts. LLM-narrated prose.

The catalog structure, dependencies, readiness, and blast radius come from your system, not from an LLM. The LLM only narrates: it writes prose about what the facts mean, and it cannot invent the facts themselves.

Every claim cited.

Known mitigations in the wiki cite actual investigation IDs and change event records. If there is no evidence, the section says so. NOFire AI does not fill in gaps.

Provenance on every dependency.

Each dependency in your catalog carries a provenance label: runtime (observed from DNS/L7 call graphs), synthesized (inferred from patterns), or intent (declared). You see exactly how confident the catalog is.
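
To make the three labels concrete, here is a minimal sketch of what a provenance-tagged dependency record could look like. The class names, the 0.0 to 1.0 confidence scale, the `trusted` helper, and the sample edges are all assumptions for illustration; the page does not describe NOFire AI's actual data model.

```python
from dataclasses import dataclass
from enum import Enum


class Provenance(Enum):
    RUNTIME = "runtime"          # observed from DNS/L7 call graphs
    SYNTHESIZED = "synthesized"  # inferred from patterns
    INTENT = "intent"            # declared


@dataclass(frozen=True)
class Dependency:
    source: str
    target: str
    provenance: Provenance
    confidence: float  # hypothetical scale: 0.0 (no evidence) to 1.0 (certain)

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")


def trusted(deps, min_confidence=0.8):
    """Keep only runtime-observed edges at or above the confidence threshold."""
    return [
        d for d in deps
        if d.provenance is Provenance.RUNTIME and d.confidence >= min_confidence
    ]


# Hypothetical sample edges, not taken from a real catalog.
deps = [
    Dependency("service-a", "service-b", Provenance.RUNTIME, 0.9),
    Dependency("service-a", "service-c", Provenance.SYNTHESIZED, 0.6),
    Dependency("service-a", "service-d", Provenance.INTENT, 1.0),
]
print([d.target for d in trusted(deps)])  # ['service-b']
```

An agent consuming the catalog could use a filter like `trusted` to act only on edges that were actually seen in production, and treat synthesized or declared edges as hints.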

Agentic era

The developer portal AI agents can actually use.

When a human engineer hits a stale catalog entry, they lose 20 minutes. When a coding agent, deployment agent, or incident response agent hits one, it acts on it. The catalog nobody maintains is now the context layer your entire AI stack runs on.

The stale catalog problem just got a lot more expensive.

NOFire AI reads your GitHub repositories directly, including workflow definitions, contributor history, and release tags. Combined with live production signals, the result is a catalog that is accurate enough for both humans and AI systems to rely on.

README analysis · CHANGELOG parsing · GitHub integration · CI/CD pipeline reading · Contributors.md · Release tagging

See a live demo

checkout-service · agent scan · live · 2026-06-30
entity graph · calls: product-catalog, inventory, payments [runtime, 0.9]
change events · 14 rollouts in 90d, 2 scaling events
prometheus rules · 3 SLOs, 7 alerting rules [live fetch]
metric catalog · 23 linked metrics, selector defined
investigations · 4 incidents, avg resolution 22 min
resolutions · 2 known mitigations [cited: INV-12, INV-23]
blast radius · failure affects: frontend, cart, accounting [pagerank]
README.md · gRPC service, 3 downstream consumers

Catalog coverage

Every layer of service knowledge, observed automatically.

OBSERVED · DEPLOYS + CONTRIBUTORS

Service ownership

NOFire AI agents trace deploy history, on-call patterns, and contributor activity to assign ownership from evidence, not declarations.
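
One way to picture ownership assigned from evidence is a weighted tally over observed events. The sketch below is purely illustrative: the event kinds, the weights, and the `infer_owner` function are hypothetical, and the page does not say how NOFire AI actually combines these signals.

```python
from collections import Counter

# Hypothetical weights: a deploy counts for more than a single commit.
WEIGHTS = {"deploy": 3, "on_call": 2, "commit": 1}


def infer_owner(events):
    """Return (team, share_of_evidence) from (kind, team) events.

    Returns (None, 0.0) when there is no usable evidence, so the caller
    can show "no owner" instead of guessing.
    """
    scores = Counter()
    for kind, team in events:
        scores[team] += WEIGHTS.get(kind, 0)
    total = sum(scores.values())
    if total == 0:
        return None, 0.0
    team, top = scores.most_common(1)[0]
    return team, top / total


# Hypothetical evidence for one service.
events = [
    ("deploy", "backend"),
    ("deploy", "backend"),
    ("on_call", "backend"),
    ("commit", "backend"),
    ("commit", "platform"),
]
owner, share = infer_owner(events)
print(owner, share)  # backend 0.9
```

Returning the evidence share alongside the team lets the catalog report how strongly the observations support the assignment.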

OBSERVED · SLOS + ALERTS + ONCALL

Readiness scorecard

Four binary checks: has owner, has metrics, has alerts, is not a single point of failure. Not a score you enter. A score derived from yes/no facts about your actual system.
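
A minimal sketch of a score derived from the four yes/no facts, assuming each check carries equal weight. That weighting is an assumption: the page does not state how the checks map to a percentage, and the field and method names here are hypothetical.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ReadinessChecks:
    has_owner: bool
    has_metrics: bool
    has_alerts: bool
    not_single_point_of_failure: bool

    def score(self) -> int:
        """Percentage of passing checks, assuming equal weights."""
        results = list(asdict(self).values())
        return round(100 * sum(results) / len(results))

    def gaps(self) -> list:
        """Names of the checks that currently fail."""
        return [name for name, ok in asdict(self).items() if not ok]


ready = ReadinessChecks(True, True, True, True)
partial = ReadinessChecks(False, True, True, False)
print(ready.score(), ready.gaps())      # 100 []
print(partial.score(), partial.gaps())  # 50 ['has_owner', 'not_single_point_of_failure']
```

Because every input is a boolean fact about the system, the score can be recomputed on each scan and the failing checks can be listed as gaps.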

OBSERVED · CI/CD + INCIDENTS

Change timeline

Every deploy, rollback, and incident appears on the service timeline as it happens. No one logs it manually.

OBSERVED · LIVE TELEMETRY

Blast radius

When a service has a problem, you see exactly which downstream services are at risk, calculated from the observed call graph.

OBSERVED · TRACES + METRICS

Application map

Service dependencies traced from live telemetry. The graph reflects what production is doing right now, not a YAML file from 2022.

OBSERVED · REPOS + PIPELINES

Repository knowledge

NOFire AI agents read README files, CHANGELOG entries, contributor graphs, and CI/CD pipeline definitions to distill architectural context automatically.

OBSERVED · INCIDENT HISTORY

Runbooks and learnings

Past incident resolutions are captured and surfaced on the service page. The catalog gets smarter after every incident.

INFERRED · GRAPH + TRAFFIC

Criticality inference

NOFire AI infers service criticality from dependency depth, traffic patterns, and incident blast radius. No tier spreadsheet required.

See all capabilities in a demo

How it works

Connect once. Observe continuously.

01

Connect your production signals

Prometheus, distributed traces, deploy events, GitHub or GitLab repositories, and on-call integrations. Most teams are connected in under two hours.

02

Agents observe and synthesize

NOFire AI agents continuously read repos, trace ownership, measure readiness, and calculate blast radius from live production behavior.

03

Your catalog stays current

You read the catalog. You don't write it. As production changes and code evolves, the catalog changes with them. Automatically.

Works with: Prometheus, Datadog, Grafana, GitHub, GitLab, PagerDuty, AWS, OpenTelemetry

Blast radius

See what breaks before the incident becomes a crisis.

The blast radius panel shows exactly which services are downstream of the affected service, calculated from the observed call graph. No topology diagrams to maintain. No Slack thread to trace dependencies.

When checkout-service fails, you see that frontend, cart, and accounting are at risk, in seconds, because NOFire AI already traced the call paths from production telemetry.
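
As a rough illustration of deriving impact from a call graph, the sketch below walks observed (caller, callee) edges backwards from a failed service and collects every transitive caller. This is a plain breadth-first traversal chosen for clarity; the agent scan output on this page tags blast radius with [pagerank], so the real calculation evidently differs. The function name, the edge direction, and the sample edges are assumptions.

```python
from collections import defaultdict, deque


def blast_radius(calls, failed):
    """Return every service that directly or transitively calls `failed`.

    `calls` is an iterable of (caller, callee) pairs, e.g. taken from traces.
    """
    callers = defaultdict(set)
    for caller, callee in calls:
        callers[callee].add(caller)

    affected = set()
    queue = deque([failed])
    while queue:
        service = queue.popleft()
        for caller in callers.get(service, ()):
            if caller != failed and caller not in affected:
                affected.add(caller)
                queue.append(caller)
    return affected


# Hypothetical edges, not taken from a real environment.
edges = [("web", "api"), ("api", "db"), ("worker", "db"), ("api", "cache")]
print(sorted(blast_radius(edges, "db")))  # ['api', 'web', 'worker']
```

Because the edges come from observed calls rather than a diagram, the result changes as soon as the traffic does.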

See blast radius in a demo

Blast radius panel showing downstream service impact calculated from observed call graphs in NOFire AI

Detailed comparisons

See how NOFire AI compares to each tool.

NOFire AI vs Backstage →
NOFire AI vs Cortex →
NOFire AI vs Compass →
NOFire AI vs Roadie →

Your engineers are maintaining a catalog that is already wrong.

Book a 30-minute session and see your actual service topology, built from what is running in production right now, with no YAML required.

No commitment. Works with Prometheus, Datadog, GitHub, and GitLab.
