Cortex Knows What You Told It.
NOFire AI Knows What Is Real.

Cortex re-evaluates scorecards against metadata your engineers declared. When that metadata drifts, scorecards pass services with no real owner and unknown dependencies. NOFire AI builds the catalog from live signals: what you see is what production is actually doing.


NOFire.ai · Service Wiki

Service wiki

Generated from your environment · observed continuously

1 incident · 3 warning

Service	Health	Tier	Environment	Owner	Gaps	Calls	Called by	Last deploy	Readiness
frontend-proxy	warning	TIER 2	production	no owner	1	6	19	1h ago	88%
checkout	warning	TIER 2	production	backend	2	9	19	6h ago	44%
fraud-detection	incident	TIER 2	production	no owner	3	3	11	5d ago	31%
product-catalog	healthy	TIER 3	production	backend	-	3	14	1d ago	72%
payment	healthy	TIER 3	production	backend	-	1	7	3h ago	90%
cart	warning	TIER 3	qa	backend	2	2	8	12h ago	27%

18 services · sorted by criticality · last scan: 4s ago


Cortex pain points

Why Cortex customers hit a wall after the first 90 days

Catalog accuracy depends on engineers remembering to update YAML

Every service, team, and domain in Cortex requires a cortex.yaml descriptor. Ownership transfers, dependency changes, and deprecations only appear in the catalog when someone edits that file. YAML goes stale within weeks. Scorecards pass on services with phantom owners and undocumented dependencies.

4-hour scorecard refresh means you are always acting on old data

Cortex re-evaluates scorecards on a 4-hour polling cycle. A service that loses its on-call assignment, drops a Prometheus alert, or gains a new critical downstream dependency does not surface as a readiness failure until the next cycle completes. For on-call engineers investigating active incidents, that lag matters.
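To put a number on that lag, here is a back-of-the-envelope sketch. It assumes a change lands at a uniformly random moment within a polling cycle and that evaluation itself is instantaneous; both are simplifying assumptions, and `polling_lag_hours` is a hypothetical helper, not part of either product.

```python
def polling_lag_hours(interval_hours):
    """Return (mean, worst-case) detection lag for a fixed polling interval.

    Assumes the change arrives uniformly at random within a cycle, so the
    expected wait until the next evaluation is half the interval.
    """
    mean_lag = interval_hours / 2
    worst_case = interval_hours
    return mean_lag, worst_case


mean_lag, worst_case = polling_lag_hours(4.0)
print(f"mean lag: {mean_lag}h, worst case: {worst_case}h")
```

Under those assumptions, a 4-hour cycle leaves a readiness regression unnoticed for 2 hours on average and up to 4 hours in the worst case.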

The total cost of Cortex developer portal ownership compounds quickly

Cortex licensing is priced per seat. Add implementation and professional services, plus a platform engineer spending 20 to 30 percent of their time on catalog hygiene. Teams under 200 engineers routinely report that the all-in cost of the Cortex developer portal exceeds efficiency gains before the platform reaches critical adoption.

NOFire AI vs Cortex

NOFire AI vs Cortex: what the catalog actually knows

Cortex catalogs from declaration, with a 4-hour lag. NOFire AI catalogs from observation, continuously. One approach has a freshness problem built in.

Capability	NOFire AI	Cortex
Catalog data source	Observed from live production: DNS, L7 call graphs, Prometheus rules, CI/CD pipelines, incident history	Declared in cortex.yaml descriptor files authored and maintained by engineers
Scorecard refresh frequency	Continuous: readiness checks update as production signals change	Every 4 hours; some integration data refreshes hourly or weekly
Dependency mapping	Inferred from observed L7 traffic with provenance labels: runtime, synthesized, or intent	Declared explicitly in YAML; undocumented dependencies are invisible
Blast radius calculation	PageRank on the live observed call graph, including dependencies from recent service changes	Derived from explicitly declared YAML dependencies; only as accurate as what engineers wrote
Ownership assignment	Inferred from deploy history, contributor activity, and on-call patterns	Set manually in cortex.yaml; drifts silently when teams change without a file edit
Readiness checks	4 binary checks from live evidence: has_owner, has_metrics, has_alerts, is_spof	Scorecard rules evaluated against declared metadata and polled integration data
Time to first value	Connect your stack, catalog appears; no YAML to write, no migration project	3-week bootcamp minimum; full catalog population and scorecard configuration takes 3 to 6 months
Ongoing maintenance cost	Near zero: agents observe continuously; no YAML files to keep current	Estimated 0.25 FTE dedicated to catalog hygiene to prevent descriptor rot
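As an illustration of the blast radius row, the sketch below runs a plain power-iteration PageRank over a small caller-to-callee graph. The edge list is invented for the example, and the damping factor and iteration count are conventional defaults; how NOFire AI actually weights or post-processes the ranks is not described here, so treat this as one possible reading.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over directed (caller, callee) edges."""
    nodes = sorted({name for edge in edges for name in edge})
    out_links = {name: [] for name in nodes}
    for caller, callee in edges:
        out_links[caller].append(callee)

    rank = {name: 1.0 / len(nodes) for name in nodes}
    for _ in range(iterations):
        # Rank held by services with no outgoing calls is spread evenly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping) / len(nodes) + damping * dangling / len(nodes)
        new_rank = {name: base for name in nodes}
        for caller, callees in out_links.items():
            for callee in callees:
                new_rank[callee] += damping * rank[caller] / len(callees)
        rank = new_rank
    return rank


# Hypothetical call graph, not observed data.
edges = [
    ("frontend", "checkout"),
    ("frontend", "cart"),
    ("frontend", "product-catalog"),
    ("checkout", "payment"),
    ("checkout", "cart"),
    ("checkout", "product-catalog"),
]
ranks = pagerank(edges)
for service, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{service:16s} {score:.3f}")
```

Services that many callers depend on accumulate rank, so a newly observed edge shifts the ordering on the next run without anyone editing a descriptor file.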

Book a demo

How it works

One panel. Every layer of service knowledge.

The service detail page in NOFire AI is populated entirely from what agents observe: entity graph, change events, Prometheus rules, incident history, and repository analysis. Nothing is declared. Nothing goes stale.

NOFire.ai · checkout · service detail

checkout

v24 · production

No SLO: No SLO / recording rule defined

Important · 1 gap

Overview

The checkout service orchestrates the end-to-end purchase flow, coordinating payment processing, inventory validation, and shipping arrangements. It acts as the central transaction coordinator, calling payment, product-catalog, cart, item validation, shipping, currency, email, kafka, and flagd.

Change timeline

Type	Commit	Change	Author	When
deploy	a3f91c	feat: add circuit breaker for payment retries	m.chen	2h ago
hotfix	7d204b	fix: OOM crash at peak load (memory cap 20MB)	p.moustafellos	4d ago
config	c9e812	chore: tune GC percent to reduce goroutine bloat	a.sapranidis	9d ago
deploy	f10a3d	refactor: optimize product-catalog query batching	m.chen	19d ago

Observability

Scope: {service_name="checkout", k8s_namespace_name="otel-demo"}

▼ Latency

rpc_server_duration_milliseconds_bucket (histogram): server-side RPC request duration

histogram_quantile(0.99, rate(rpc_server_duration_milliseconds_bucket{service_name="checkout",k8s_namespace_name="otel-demo"}[5m]))

rpc_client_duration_milliseconds_count (counter): client-side RPC call count

rate(rpc_client_duration_milliseconds_count{service_name="checkout",k8s_namespace_name="otel-demo"}[5m])

▼ Throughput

traces_span_metrics_calls_total (counter): total span calls

rate(traces_span_metrics_calls_total{service_name="checkout",k8s_namespace_name="otel-demo"}[5m])

rpc_server_responses_per_rpc_count (counter): RPC responses per call

rate(rpc_server_responses_per_rpc_count{service_name="checkout",k8s_namespace_name="otel-demo"}[5m])

▼ Custom

go_goroutine_count (gauge): number of active goroutines

go_goroutine_count{service_name="checkout",k8s_namespace_name="otel-demo"}

go_config_gogc_percent (gauge): Go GC target percentage

go_config_gogc_percent{service_name="checkout",k8s_namespace_name="otel-demo"}

▼ Alerts

Service high error rate: warning, for 90s

Service high latency: warning, for 1m

Service traffic spike: warning, for 30s
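The queries in the Observability panel above all share one shape: a metric name, the service scope as label matchers, and a rate window. A minimal sketch of composing them follows; `scoped_selector` and `p99_latency_query` are hypothetical helper names, and the code only builds PromQL strings without contacting a Prometheus server.

```python
def scoped_selector(metric, scope):
    """Render metric{label="value",...} from a dict of scope labels."""
    labels = ",".join(f'{key}="{value}"' for key, value in scope.items())
    return f"{metric}{{{labels}}}"


def rate_query(metric, scope, window="5m"):
    """Per-second rate of a counter over the given window."""
    return f"rate({scoped_selector(metric, scope)}[{window}])"


def p99_latency_query(bucket_metric, scope, window="5m"):
    """99th percentile estimated from a histogram's _bucket series."""
    return f"histogram_quantile(0.99, {rate_query(bucket_metric, scope, window)})"


scope = {"service_name": "checkout", "k8s_namespace_name": "otel-demo"}
print(p99_latency_query("rpc_server_duration_milliseconds_bucket", scope))
print(rate_query("traces_span_metrics_calls_total", scope))
```

Keeping the scope in one dictionary means every panel query for a service is filtered by the same labels.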

When this breaks

All purchase flows halt. Checkout is the sole transaction coordinator. [INV-27]

frontend-proxy p99 latency spikes as retries queue; circuit breaker trips within 90s. [INV-26]

19 downstream services lose checkout context: fraud-detection, payment, shipping go idle. [INV-26]

Runbooks & Learnings

Runbook: Checkout investigation: diagnose latency spikes, payment retries, and OOM events via p99 trend + goroutine count

Learning: Checkout service lacks Prometheus metrics instrumentation or scraping configuration, preventing o...

Learning: Memory limit of 20MB insufficient for checkout service workload requiring 18-19MB, causing OOMKil...

Ontology

Owner: backend (observed)

Lifecycle: production

Criticality: Important · TIER 2 (inferred, 44%)

Readiness: Ready · 100%

owner ✓ · metrics ✓ · alerts ✓ · resilient ✓

Health: No signal yet (unknown)

Live health (SLO / error rate / saturation) arrives with the state engine.

Depends On

Dependency	Provenance	Rate	p99
shipping	observed	312/min	18ms
(name not shown)	observed	198/min	42ms
cart	observed	1,840/min	9ms
product-catalog	observed	2,103/min	11ms
otel-collector	observed	async	n/a
currency	observed	876/min	7ms
payment	observed	420/min	134ms
kafka	observed	async	n/a
flagd	observed	654/min	3ms

Structure

owned_by: deployment:checkout · 100% · observed

Blast Radius

accounting, ad, cart, currency, email, flagd, fraud-detection, frontend, frontend-proxy, image-provider, kafka, load-generator, otel-collector, payment, product-catalog, product-reviews, quote, recommendation, shipping

observed

Past Incidents

ID	Severity	Title	Status
INV-27	P1	Checkout failing under payment load spike	resolved in 23 min
INV-26	P1	Checkout unresponsive after OOM kill	resolved in 41 min
INV-22	P2	ProductCatalogService intermittent UNAVAILABLE	resolved in 1h 12m
INV-20	P2	Checkout missing Prometheus scrape target	resolved in 55 min
INV-18	P2	Checkout latency p99 spike on EU traffic	resolved in 38 min
INV-12	P3	Checkout lacks alerting rule on error rate	resolved in 2h 4m
INV-10	P3	No SLO defined for checkout success rate	open
INV-9	P3	Ownership unset: no team assigned to checkout	open

Source: Production signals + repos

Manual input: None

Update frequency: Continuous

Maintenance required: Near zero

Deterministic facts. LLM-narrated prose.

The catalog structure, dependencies, readiness, and blast radius come from your system, not from an LLM. The LLM only narrates: it writes prose about what the facts mean and cannot invent the facts themselves.

Every claim cited.

Known mitigations cite actual investigation IDs and change event records. If there is no evidence, the section says so. NOFire AI does not fill in gaps.

Provenance on every dependency.

Each dependency carries a label: runtime (observed from DNS/L7 call graphs), synthesized (inferred), or intent (declared). You see exactly how confident the catalog is.
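One way to picture the three labels is as a small data model. The sketch below is illustrative only: the `Dependency` record, the `strongest_evidence` helper, and the preference order runtime > synthesized > intent are assumptions made for the example, not a documented NOFire AI schema.

```python
from dataclasses import dataclass
from enum import Enum


class Provenance(Enum):
    RUNTIME = "runtime"          # observed from DNS / L7 call graphs
    SYNTHESIZED = "synthesized"  # inferred
    INTENT = "intent"            # declared


# Assumed ordering: lower number means stronger evidence.
_PRIORITY = {Provenance.RUNTIME: 0, Provenance.SYNTHESIZED: 1, Provenance.INTENT: 2}


@dataclass(frozen=True)
class Dependency:
    source: str
    target: str
    provenance: Provenance


def strongest_evidence(deps):
    """Keep one edge per (source, target), preferring the strongest label."""
    best = {}
    for dep in deps:
        key = (dep.source, dep.target)
        current = best.get(key)
        if current is None or _PRIORITY[dep.provenance] < _PRIORITY[current.provenance]:
            best[key] = dep
    return list(best.values())


deps = [
    Dependency("checkout", "payment", Provenance.INTENT),
    Dependency("checkout", "payment", Provenance.RUNTIME),
    Dependency("checkout", "email", Provenance.SYNTHESIZED),
]
for dep in strongest_evidence(deps):
    print(dep.source, "->", dep.target, f"({dep.provenance.value})")
```

Carrying the label on the edge itself lets a reader tell a dependency seen in traffic from one that was only declared.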

Setup

Connect your stack. Your catalog appears.

No migration project. No catalog entries to write. No plugins to configure.

01 · Connect your signals

Link your observability stack, Kubernetes, CI/CD, and incident tooling. NOFire AI starts reading your entity graph and change history immediately.

02 · Agents distill knowledge

Deterministic extractors build a structured skeleton: ownership, dependencies with provenance, readiness checks, blast radius. No LLM invents facts.

03 · Catalog stays current

Every deploy, incident, rollback, and ownership change is reflected automatically. Engineers read the catalog instead of maintaining it.

Integrates with: Prometheus, Grafana, Datadog, Kubernetes, GitHub, GitLab, PagerDuty, Loki, Tempo


FAQ

Switching from Cortex

How does NOFire AI compare to Cortex for production readiness scorecards?

Cortex scorecards re-evaluate every 4 hours from YAML-declared metadata. NOFire AI readiness checks run continuously against observed production facts: has_owner (inferred from deploy history and on-call), has_metrics (live Prometheus rule check), has_alerts (live alerting rule check), is_spof (inferred from dependency graph PageRank). No declarations required.
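The four checks can be read as simple predicates over observed facts. The sketch below assumes a hypothetical `ServiceFacts` record whose fields have already been filled in by upstream observation; the field names are invented for the example, and how each fact is actually gathered is outside its scope.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServiceFacts:
    owner: Optional[str]   # inferred owner, or None when no evidence exists
    metric_rules: int      # Prometheus rules found for the service
    alert_rules: int       # alerting rules found for the service
    is_spof: bool          # flagged as a single point of failure


def readiness_gaps(facts):
    """Return the names of the binary checks that currently fail."""
    gaps = []
    if facts.owner is None:
        gaps.append("has_owner")
    if facts.metric_rules == 0:
        gaps.append("has_metrics")
    if facts.alert_rules == 0:
        gaps.append("has_alerts")
    if facts.is_spof:
        gaps.append("is_spof")
    return gaps


facts = ServiceFacts(owner=None, metric_rules=2, alert_rules=0, is_spof=True)
print(readiness_gaps(facts))
```

Because each check is binary and reads only observed facts, re-running it whenever a fact changes is cheap, which is what makes continuous evaluation practical.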

Does NOFire AI replace Cortex for service ownership tracking?

Yes. Cortex assigns ownership via the cortex.yaml owner field, which drifts when teams change. NOFire AI infers ownership from deploy history, contributor activity, and on-call rotation patterns with a provenance label (runtime, synthesized, or intent). Ownership stays current without anyone editing a file.
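A rough sketch of what inferring ownership from several signals could look like: each signal votes for a team, and the votes are weighted. The weights, the `infer_owner` name, and the sample inputs are all assumptions chosen for illustration; the actual inference logic is not described on this page.

```python
from collections import Counter


def infer_owner(deploys, commits, on_call, weights=(3.0, 1.0, 2.0)):
    """Return (team, share of weighted evidence), or None without evidence.

    Each argument is a list of team names, one entry per observed event.
    The weights for deploys, commits, and on-call shifts are hypothetical.
    """
    score = Counter()
    for signal, weight in zip((deploys, commits, on_call), weights):
        for team in signal:
            score[team] += weight
    if not score:
        return None
    team, top = score.most_common(1)[0]
    return team, top / sum(score.values())


print(infer_owner(
    deploys=["backend", "backend", "platform"],
    commits=["backend"],
    on_call=["backend"],
))
```

Returning the share of evidence alongside the team name mirrors the confidence shown next to an observed owner, and returning None keeps a service with no evidence visibly unowned instead of guessing.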

How do I replace Cortex without losing our existing scorecard configuration?

NOFire AI builds readiness checks from observed production facts rather than declared rules. Connecting your observability stack takes under 30 minutes. Existing catalog data does not need to migrate: NOFire AI discovers services and infers their state from live signals.

Is NOFire AI more affordable than the Cortex developer portal?

Cortex is priced per seat with additional professional services and implementation costs. NOFire AI requires no dedicated catalog-hygiene headcount; measured against an estimated 0.25 FTE maintenance burden, that is a significant cost reduction for most teams. Contact us for pricing details.

Replace Cortex with a catalog that reflects production, not declarations.

No cortex.yaml to maintain. No 4-hour scorecard delay. NOFire AI reads your production signals and builds a catalog your on-call can trust.
