
The Observability Industry Has the Wrong Customer

By: Necco Ceresani

The observability industry is worth tens of billions of dollars. It is built on a single assumption that nobody talks about: a human is the one looking at the data.

Every architectural decision in the modern observability stack follows from that assumption. You instrument ahead of time because a human can't ask for data that wasn't collected. You ship everything to a central platform because humans need a single pane of glass. You build dashboards in advance because humans can't construct views in the middle of an incident. You write alert rules in advance because humans can't watch everything. You separate metrics, logs, and traces into three pillars because humans need different views for different cognitive tasks.

This assumption was so obvious it was invisible. For fifteen years, it was correct. Datadog, Grafana, Splunk, New Relic — these are good companies that built good products around a reasonable constraint. They compete on how well they present data to humans. Better dashboards. Better query languages. Better alert builders. Better visualizations. Billions of dollars in market cap, all downstream of one design decision: optimize for a person staring at a screen.

That design decision is about to become the wrong one.


Two shifts, one inflection

Two things have happened in the same three-year window, and it is their intersection, not either one alone, that matters.

The first is that production systems got faster than humans. The shift to microservices, Kubernetes, and continuous deployment means that the thing you're observing now changes faster than you can observe it. A deploy cadence measured in minutes. A failure cascade that unfolds in seconds. A scrape interval of thirty seconds was partly a cost constraint (cardinality explosion, storage overhead, network pressure), but it was also a philosophical one, built on the assumption that the consumer of the data could only process information at human speed.

The second is that AI agents learned to reason about structured data. An agent consuming a typed data stream doesn't need any of the things the current stack was built to provide. It doesn't need pre-built dashboards; it constructs the investigation when the problem appears. It doesn't need threshold-based alerts; it reasons about what's normal. It doesn't need the three pillars kept separate; it consumes structured data across all of them simultaneously. It doesn't need a centralized platform to correlate across subsystems; it navigates the graph.

Either shift alone would be manageable. Systems getting faster just means you need faster tooling, shorter scrape intervals, better streaming, more real-time dashboards. Agents that can reason about data just means you bolt an AI assistant onto your existing platform and call it "AI-powered observability." Both of those are incremental moves, and the incumbents are already making them.

But the two shifts together are not incremental. They are compounding. Faster systems generate more data at higher frequency, which makes the human-as-consumer bottleneck worse, which makes the agent-as-consumer shift more urgent, which in turn reveals how deeply the current architecture assumed a human at the other end. The pre-aggregation that saved cost now destroys the granularity agents need. The dashboards that organized information for human cognition now stand between agents and the structured data they reason about natively. The static instrumentation that let humans plan ahead now prevents agents from investigating what couldn't be planned for.

Together, the two shifts don't just require faster tooling. They make the current architecture the wrong abstraction entirely.


The wrong abstraction

The modern observability stack was optimized for presentation to humans, not legibility to machines. That optimization runs deep, much deeper than the UI layer.

Dashboards are visual, not structured. They're designed to be glanced at by a person, not parsed by a program. An agent doesn't need a beautiful time-series graph. It needs typed, queryable access to the underlying data.

Alert rules are static thresholds, not contextual reasoning. "CPU above 80% for five minutes" is a rule that a human had to write in advance, covering a scenario they had to imagine in advance. An agent watching a structured data stream can notice that a Postgres container's memory has been climbing two megabytes per hour for six hours, not because someone set a threshold, but because monotonic growth is anomalous and the agent understands that.
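That kind of trend reasoning can be made concrete with a toy sketch. The sample data and the heuristic are invented for illustration; this is not a Yeet API, just the shape of the inference:

```javascript
// Sketch: flag a metric series as anomalous when it grows monotonically.
// `samples` are hypothetical hourly memory readings in MB.
function isMonotonicGrowth(samples, minSlopePerSample = 1) {
  if (samples.length < 4) return false; // too little data to judge
  let rising = 0;
  for (let i = 1; i < samples.length; i++) {
    if (samples[i] > samples[i - 1]) rising++;
  }
  const slope = (samples.at(-1) - samples[0]) / (samples.length - 1);
  // "monotonic enough": nearly every step rises, and the trend is material
  return rising >= samples.length - 2 && slope >= minSlopePerSample;
}

const postgresMemMB = [512, 514, 516, 518, 521, 523, 525]; // climbing ~2 MB/h
const nginxMemMB    = [300, 305, 298, 302, 299, 304, 301]; // noisy but flat

console.log(isMonotonicGrowth(postgresMemMB)); // true: sustained climb
console.log(isMonotonicGrowth(nginxMemMB));    // false: no trend
```

No one had to set a threshold at 520 or 530; the shape of the series is the signal.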

Data is pre-aggregated to save cost, destroying the granularity that matters. When you're shipping everything to a vendor backend and paying per gigabyte, you aggregate. You sample. You roll up. You lose the per-process, per-thread, per-syscall detail that would actually tell you what's going on. The entire cost model of centralized observability incentivizes throwing away the data an agent would need most.

Instrumentation is configured in advance. This is the deepest problem. You decide what to monitor before you know what will break. You configure logging, tracing, and metrics for every service you ship. You build dashboards that attempt to anticipate every failure mode. And then when something actually goes wrong, something you didn't predict, in a way you didn't configure for, you're blind. The dashboard shows you everything you thought to ask about. It tells you nothing about the thing that's actually happening.

The value in observability is about to shift from presentation to structure. The competitive moat moves from "best UI" to "best schema." And that changes everything about how the stack should work.


First principles for a reasoning consumer

What does observability look like when the consumer can reason?

Start from first principles. If you were building an observability system today, knowing that both humans and AI agents would consume it, and that the agents would increasingly be the primary consumers, you would make five fundamental choices differently.

1. Instrumentation would be dynamic, not static. You wouldn't decide what to monitor in advance. You'd build the tool when the problem appears: a developer describes what they want to see, or an agent constructs an investigation on the fly, deploying a probe that didn't exist five minutes ago because five minutes ago nobody needed it. When the problem is resolved, the probe goes away. No residual overhead. No data accumulating in a backend. The tool exists for exactly as long as the problem does.

2. The data layer would be structured and self-describing. Not logs that have to be parsed. Not metrics that have to be interpreted. Not traces that have to be stitched together. A typed, queryable graph of system state (processes, containers, network sockets, hardware sensors, file descriptors) all interconnected and all available through a schema that serves as its own documentation. An agent doesn't need to parse ps aux output and hope the columns didn't shift. It queries the graph and gets clean, typed data back.

3. Observation would happen at the kernel, not the application. The kernel already knows what every process is doing. It knows every syscall, every network packet, every file write, every context switch. Instead of asking applications to emit data about themselves (which requires SDKs, code changes, and language-specific agents), you'd observe them from below. eBPF makes this possible: attach probes to running processes and observe their behavior directly, without modifying them, with negligible overhead, without cooperation from the application.

4. Tools would be bespoke, not generic. Every application has different failure modes, different queries that matter, different performance characteristics. The industry's answer has been generic dashboards designed for everyone, which means they're optimized for no one. The alternative is purpose-built instrumentation: a Python profiler that parses your specific HTTP traffic, a GPU monitor that tracks your specific training workload, an agent that correlates your deploy cadence with your performance regressions. Not features on a vendor's roadmap. Tools you describe and generate.

5. The system that observes would also be able to act. Observation without actuation is a report. Observation with actuation is a control plane. If your runtime already sits in the kernel, already sees every packet and every syscall, it can do more than watch — it can make decisions. Drop a malicious request before the application sees it. Shape traffic during a cascade. Enforce a policy at the point of access, not after the fact. The same probe that detects the anomaly can be the one that responds to it. Not a separate tool. Not a separate pipeline. The same substrate, the same language, the same deploy.
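The fifth principle can be sketched as a detect-and-act hook. The request shape and the DROP/PASS verdicts below are illustrative assumptions, not a real Yeet interface:

```javascript
// Sketch: the probe that detects is the probe that responds.
// The `req` shape and DROP/PASS verdicts are hypothetical, for illustration.
function onIngressRequest(req) {
  // detect: a path-traversal pattern the policy has flagged as malicious
  if (req.path.includes("/../")) {
    return "DROP"; // act: the application never sees the request
  }
  return "PASS";
}

console.log(onIngressRequest({ path: "/files/../../etc/passwd" })); // "DROP"
console.log(onIngressRequest({ path: "/healthz" }));                // "PASS"
```

The point is not the specific rule; it's that detection and enforcement live in the same hook, so there is no second pipeline to build or keep in sync.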

These are the principles that led us to build Yeet.


What this makes possible

Yeet is a programmable kernel runtime. It's a single daemon that runs on each Linux machine, combining eBPF's kernel-depth visibility with a V8 JavaScript engine, a typed GraphQL system graph, and an agent framework. Write JavaScript. Instrument the kernel. See everything.
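As a sketch of what "typed, queryable" means in practice (the query shape and field names below are hypothetical, modeled on GraphQL conventions rather than taken from Yeet's actual schema):

```javascript
// Sketch: querying a typed system graph instead of parsing tool output.
// The query shape and the `processes` field are hypothetical illustrations.
const query = `{
  processes(name: "postgres") {
    pid
    rssBytes
    sockets { remoteAddr state }
  }
}`;

// A typed result is structured data, not columns to split and hope over:
const result = {
  processes: [
    { pid: 4242, rssBytes: 537919488,
      sockets: [{ remoteAddr: "10.0.0.7:5432", state: "ESTABLISHED" }] },
  ],
};

// No `ps aux | awk`, no guessing column order: fields are named and typed.
const totalRssMB = result.processes.reduce((s, p) => s + p.rssBytes, 0) / 2 ** 20;
console.log(totalRssMB); // 513
```

A program (or an agent) consuming this never parses text; it walks named, typed fields.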

What matters more than the architecture is what it makes possible. Here are three scenarios that are impossible in the current model and natural in the new one.

The 3am page

It's 3am. PagerDuty fires. In the old world, an engineer wakes up, opens a laptop, stares at a dashboard, tries to remember which Grafana board has the relevant metrics, notices something looks off, starts running manual queries, and forty-five minutes later has a hypothesis about what went wrong. Maybe.

In the new model, the alert fires and simultaneously triggers an agent that's already connected to the live system graph. The agent snapshots the relevant state: CPU, memory pressure, recent deploys, process trees, network conditions. It traces the causal chain: high latency on nginx, caused by container CPU throttling, caused by a host iowait spike, caused by NVMe thermal throttling, caused by an ambient temperature rise visible in the hardware sensor data. It follows that chain across subsystem boundaries that no pre-built dashboard was designed to cross. By the time the engineer wakes up and checks their phone, there's a structured investigation summary waiting in Slack alongside the page. Not a bare threshold notification. A situation report with root cause analysis and a proposed remediation.

No human is correlating across containers, kernel stats, hardware sensors, and network interfaces simultaneously at 3am. No dashboard was pre-built for that specific causal chain. But an agent navigating a structured, typed, streaming graph of system state can follow it in seconds.
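The chain-following step itself is mechanical once the data is a connected graph. A toy sketch, with edges mirroring the scenario above (mocked, not real telemetry):

```javascript
// Sketch: root-cause walk over a mocked "caused by" graph.
const causedBy = {
  "nginx: high latency":       "container: CPU throttling",
  "container: CPU throttling": "host: iowait spike",
  "host: iowait spike":        "nvme: thermal throttling",
  "nvme: thermal throttling":  "sensor: ambient temp rise",
};

function traceRootCause(symptom) {
  const chain = [symptom];
  let cur = symptom;
  while (causedBy[cur]) {
    cur = causedBy[cur];
    chain.push(cur);
  }
  return chain; // last element is the deepest known cause
}

console.log(traceRootCause("nginx: high latency").at(-1));
// "sensor: ambient temp rise"
```

The hard part is producing those edges, which is the agent's reasoning job; once they exist, crossing subsystem boundaries is a graph walk, not a dashboard hunt.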

The bespoke profiler

You suspect a performance regression buried somewhere in a service that gets deployed ten times a week. In the old world, you set up a profiler, configure it for your language, deploy it alongside your application, wait for data to accumulate, pay for the storage, and hope that the profiling agent's overhead doesn't distort the results.

With Yeet, you describe what you need and AI writes it: a bespoke production profiler with function-level profiling, endpoint latency tracking, slow query analysis, and TCP retransmit correlation — built as a Yeet Script in under two hours. Not a generic APM dashboard. A tool purpose-built for that specific service, running at the kernel level across any language, compiled or interpreted. When you're done, you stop the script. No residual overhead. No data bleeding into a vendor backend. The tool existed for exactly as long as you needed it.

This points to something deeper than speed-to-value. Every improvement to AI code generation makes the platform more valuable without anyone shipping a new feature. The incumbents ship fixed products to every customer. A programmable runtime ships the abstraction that AI targets — and gets better every time AI gets better.

The agent that understands

This one points toward where it all leads: an agent that doesn't just respond to incidents but understands your infrastructure the way you understand your own house. It knows what's normal on Tuesdays. It knows that Postgres gets busy after the nightly batch job. It knows that the network sensor runs hot when the office is crowded. It doesn't alert on a threshold. It alerts on a deviation from understood behavior. "Your Postgres container's memory has been climbing two megabytes per hour for the last six hours, which is new. Want me to investigate?" That's not a threshold. That's understanding.
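One concrete form "deviation from understood behavior" can take is baseline-relative alerting. A toy sketch with invented numbers, standing in for whatever model an agent actually builds:

```javascript
// Sketch: alert on deviation from a learned baseline, not a fixed threshold.
// Baseline: mean/stddev of a metric for this hour of day (invented numbers).
const baseline = { mean: 512, stddev: 4 }; // MB, a "normal Tuesday" memory level

function deviates(observed, { mean, stddev }, k = 3) {
  // flag values more than k standard deviations away from what is usual
  return Math.abs(observed - mean) > k * stddev;
}

console.log(deviates(514, baseline)); // false: within normal variation
console.log(deviates(540, baseline)); // true: 7 sigma above the baseline
```

A fixed "memory above 600 MB" rule would have missed the 540 MB reading entirely; a baseline-relative one flags it because it's unusual for this system at this time.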


Where the line is

I want to be honest about where the line is. This is not a story about replacing Datadog tomorrow. Humans still want to look at things. Teams still need shared dashboards. Compliance still needs audit trails. Historical storage still matters. The incumbents are good at what they do, and what they do remains valuable.

But the center of gravity is shifting. The value is moving from presentation to structure. The practice of understanding your systems is the same thing it always was; it just has a much more capable reader.

We think infrastructure tools should be written, not bought. We think every application deserves its own observability. And we think the first platform where AI agents can read the kernel is the foundation that everything else gets built on.

That's what we're building at Yeet.

Get early access to Yeet

Join the waitlist and be first to know when we launch.