# Real-Time Streaming Usage Metering for SaaS and AI Products

> Build a real-time streaming usage metering pipeline that supports live dashboards, hard caps, and instant overage. Architecture and pitfalls.
- **Author**: Ayush Agarwal
- **Published**: 2026-05-19
- **Category**: Architecture, Usage-based Billing, Streaming
- **URL**: https://dodopayments.com/blogs/real-time-streaming-usage-metering

---

Most usage-based billing pipelines are designed for batch processing. Events are collected during the day, aggregated overnight, and surfaced in dashboards and invoices the next morning. This works fine for many SaaS products. It does not work for products where the customer expects to see consumption instantly, where hard caps need to engage within seconds, or where overage notifications need to fire in real time rather than at the end of the day.

Real-time streaming usage metering is the architecture that handles those cases. The events flow through the pipeline continuously rather than in nightly batches. Aggregations update within seconds. Caps and notifications engage as the customer crosses thresholds. The cost is more complex infrastructure. The benefit is a usage experience that feels live rather than reconciled.

This article walks through what real-time streaming metering actually requires, where it earns its keep, and how to build it without painting yourself into a corner. The framing assumes a SaaS or AI product, not ecommerce.

## What real-time metering actually means

The phrase real-time gets used loosely in billing. For a useful definition, three things have to be true.

The aggregation latency is in seconds, not hours. From the moment an event is generated to the moment it shows up in the customer-facing usage total, the gap is short enough that customers feel the system is keeping pace with their activity. Five seconds is real-time. Five hours is not.

The cap and quota enforcement happens in the same window. If a customer is approaching a hard cap, the pipeline knows about the latest usage and can deny further requests within seconds. Caps that fire the next morning are not real-time; they are overnight enforcement.

The notification path is also real-time. When a customer crosses a quota threshold or exceeds a budget, the email or webhook fires immediately rather than at the next batch run.

If any of those three is batch, the system is not really streaming. You can have a streaming aggregation with batch enforcement, or vice versa, but the customer will feel the slower path.

## Where real-time metering earns its keep

Most products do not need real-time. Daily aggregation is enough for invoicing and most usage dashboards. Real-time metering is worth the extra infrastructure when one of the following applies.

### Hard caps protect against runaway costs

API-as-a-service products with per-call pricing, AI products with per-token pricing, and infrastructure products with per-resource pricing all face the runaway cost problem. A misbehaving customer client or a bug in their code can rack up thousands of dollars in usage in minutes. Without real-time enforcement, the customer is liable for the entire spike. With real-time enforcement, the cap engages before the spike does real damage.

### Live usage dashboards are a product feature

Some products differentiate on visibility. Customers see their usage update as they consume the product. The dashboard is a real-time view, not a delayed report. For developer tools, infrastructure, and AI products this is increasingly table stakes.

### Quota-based access control

If access to features is gated by a usage quota, the gating has to be real-time or the customer experiences false rejections and false acceptances. Either the quota engages too late and lets in usage that should have been blocked, or it engages too early because the cached number is stale.

### Per-event business logic

Some products run business logic on each event as it arrives: anti-abuse checks, anomaly detection, automatic plan upgrades, real-time fraud signals. None of these work in batch.

### Multi-region products

For products serving customers across regions, batch reconciliation can introduce day-long discrepancies between what one region shows and what another shows. Streaming pipelines with regional fanout keep all the regions consistent within a small window.

If your product fits one of these patterns, real-time is worth the cost. If not, batch is fine and the extra complexity does not pay for itself.

## The architecture, layer by layer

A streaming metering pipeline has the same conceptual layers as a batch pipeline, but each layer has a different shape.

### Instrumentation that emits structured events

The instrumentation point is the same as for batch. The application or the infrastructure layer captures usage events and emits them with the relevant fields. The difference is that the events are emitted to a streaming transport rather than a buffered batch. Each event leaves the application as soon as it is generated.
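
As a rough illustration, a usage event might carry the fields below. The class name and field names are assumptions made for this sketch, not the Dodo Payments event schema; the unique ID and the source-side timestamp are included because later layers (deduplication, late-event handling) tend to depend on them.

```python
import json
import uuid
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class UsageEvent:
    """Hypothetical event shape; adapt the fields to your own meters."""
    customer_id: str
    meter: str       # e.g. "api_calls" or "tokens"
    quantity: int
    # A unique ID lets downstream consumers detect redelivered events.
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    # Event time is recorded at the source, in UTC, as an ISO 8601 string.
    occurred_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        # Serialize for whatever streaming transport is in use.
        return json.dumps(asdict(self), sort_keys=True)
```

The application would build one `UsageEvent` per billable action and hand `to_json()` to the transport client immediately, rather than buffering it for a later batch.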

### A streaming transport with strong delivery semantics

The transport for streaming is a queue or a log designed for low latency. Cloudflare Queues, AWS Kinesis, Google Cloud Pub/Sub, Kafka, and NATS are all candidates. The choice depends on operational preferences and existing infrastructure. The key requirements are low end-to-end latency, ordered delivery within a partition, and durability, so check that the option you pick can be configured to meet all three.

Ordering matters because aggregations that depend on the sequence of events get confused if they arrive out of order. Most pipelines partition by customer identifier so events for a single customer are ordered relative to each other, while different customers can stream in parallel.
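
Partitioning by customer can be sketched as a stable hash of the customer identifier. Many transports do this for you when you supply a partition key; the helper below (a hypothetical `partition_for`) only shows the idea.

```python
import hashlib


def partition_for(customer_id: str, num_partitions: int) -> int:
    """Map a customer to a partition so their events stay in order.

    A cryptographic digest is used instead of the built-in hash(), which
    is randomized per process for strings and so is not stable across
    producers.
    """
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```

Every producer that computes the partition this way sends a given customer's events to the same partition, while different customers spread across partitions and stream in parallel.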

Durability matters because losing events is losing revenue and credibility. The transport should commit events before acknowledging them, with replication across nodes or zones.

### A stream processor that maintains aggregates

The stream processor reads from the transport and maintains running aggregates per customer per meter. It is the layer that turns a stream of events into a current usage value.

Implementations vary: stateful workers running in process, Apache Flink jobs, Dataflow pipelines, or custom Cloudflare Workers backed by Durable Objects. The pattern is consistent. Each event arrives, the processor updates the relevant aggregate, the new aggregate is persisted, and downstream consumers see the new value within seconds.

The processor also handles late events and out-of-order arrivals. Events that arrive after their logical window are still applied, with the aggregate updated retroactively. The cap and notification logic accounts for this so a delayed spike does not slip past enforcement.
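
A minimal in-memory sketch of that update loop follows. It assumes events are dictionaries with `event_id`, `customer_id`, `meter`, and `quantity` keys, and it assumes the transport may redeliver events, so it skips IDs it has already seen. A real processor would persist both the totals and the seen-ID set rather than hold them in memory.

```python
from collections import defaultdict


class UsageAggregator:
    """Running totals per (customer, meter); illustrative only."""

    def __init__(self):
        self.totals = defaultdict(int)
        self.seen_event_ids = set()

    def apply(self, event: dict) -> int:
        """Apply one event and return the current total for its key."""
        key = (event["customer_id"], event["meter"])
        # Redelivered events must not be counted twice.
        if event["event_id"] in self.seen_event_ids:
            return self.totals[key]
        self.seen_event_ids.add(event["event_id"])
        self.totals[key] += event["quantity"]
        return self.totals[key]
```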
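
One way to make late arrivals land in the right place is to key aggregates by the period derived from the event's own timestamp rather than its arrival time. The sketch below assumes calendar-month periods in UTC and a hypothetical `corrections` list that downstream code could use to issue adjustments for periods that have already closed.

```python
from collections import defaultdict
from datetime import datetime, timezone


def period_key(occurred_at: datetime) -> str:
    # Assumption: billing periods are calendar months in UTC.
    return occurred_at.astimezone(timezone.utc).strftime("%Y-%m")


class PeriodAggregator:
    def __init__(self):
        self.totals = defaultdict(int)
        self.closed_periods = set()
        self.corrections = []  # (key, quantity) applied after close

    def close_period(self, period: str) -> None:
        self.closed_periods.add(period)

    def apply(self, customer_id: str, meter: str, quantity: int,
              occurred_at: datetime) -> int:
        period = period_key(occurred_at)
        key = (customer_id, meter, period)
        # The event always updates the period it belongs to, even if late.
        self.totals[key] += quantity
        if period in self.closed_periods:
            # Record the retroactive change so it can be reviewed or billed.
            self.corrections.append((key, quantity))
        return self.totals[key]
```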

### A real-time store that exposes current state

The aggregates need to be readable by the application and by enforcement logic on the hot path. A real-time key-value store backed by something like Redis, DynamoDB, or Cloudflare Durable Objects holds the current state per customer per meter.

The store is read by the application when it needs to know if a request should be allowed. Cap enforcement reads the store, compares against the configured cap, and either allows or denies the request. The enforcement decision happens in milliseconds.
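
The enforcement read can be sketched as a single lookup and comparison. Plain dictionaries stand in for the real-time store and the cap configuration here, and treating a missing cap as unlimited is an assumption of this example, not a recommendation.

```python
def allow_request(usage_store: dict, caps: dict, customer_id: str,
                  meter: str, cost: int = 1) -> bool:
    """Return True if the request fits under the customer's cap."""
    key = (customer_id, meter)
    used = usage_store.get(key, 0)
    cap = caps.get(key)
    if cap is None:
        # Assumption: no configured cap means the meter is uncapped.
        return True
    # Deny if serving this request would push usage past the cap.
    return used + cost <= cap
```

Because the decision is recomputed on every request from the current cap value, raising the cap takes effect on the very next call.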

The store is also read by the customer dashboard. When the customer loads their usage page, the numbers come from this store rather than from the batch aggregate, so they reflect the latest activity.

### A reconciliation layer that aligns streaming with the source of truth

Even with streaming, there is still a source of truth for billing: the platform that issues the invoices. The streaming aggregate is a fast cache that the application uses for enforcement and dashboards. The reconciliation layer aligns the streaming view with the platform totals so the two do not drift.

For products built on Dodo Payments, the reconciliation reads platform meter totals and the streaming aggregate, compares them, and corrects any drift. In steady state the two should match. Drift is a signal that something in the streaming pipeline needs investigation.
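
A reconciliation pass can be sketched as a comparison of two mappings. How the platform totals are fetched is deliberately left out; both inputs are assumed to be dictionaries keyed by (customer, meter), and the function names are hypothetical.

```python
def find_drift(platform_totals: dict, streaming_totals: dict,
               tolerance: int = 0) -> dict:
    """Return {key: platform - streaming} for keys that disagree."""
    drift = {}
    for key in platform_totals.keys() | streaming_totals.keys():
        platform = platform_totals.get(key, 0)
        streaming = streaming_totals.get(key, 0)
        if abs(platform - streaming) > tolerance:
            drift[key] = platform - streaming
    return drift


def reconcile(platform_totals: dict, streaming_totals: dict,
              tolerance: int = 0) -> dict:
    """Overwrite drifted streaming values with the platform's values."""
    drift = find_drift(platform_totals, streaming_totals, tolerance)
    for key in drift:
        # The platform is the source of truth, so the fast cache yields.
        streaming_totals[key] = platform_totals.get(key, 0)
    return drift
```

Logging or alerting on a non-empty result is what turns drift into a signal. A small `tolerance` can absorb events that are still in flight when the job runs.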

## Implementation patterns that work

Several patterns make streaming metering reliable in practice.

Use a single source of truth for cap configuration. The cap value, the customer subscription state, and the entitlement rules should live in one place that the streaming layer reads. Spreading this across multiple systems creates drift.

Cache aggregates with explicit invalidation rather than time-based expiry. If the aggregate is cached for sixty seconds, customers can briefly see stale numbers and caps can be briefly bypassed. Cache forever and invalidate on write.

Make the cap enforcement decision reversible. If a cap engages and the customer immediately raises the cap, the next request should succeed. Cap decisions baked into long-lived tokens are harder to reverse than decisions made at request time.
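
The invalidate-on-write idea can be sketched with a small wrapper. The class and method names are hypothetical, and a dictionary stands in for the backing store.

```python
class AggregateCache:
    """Cache with no expiry; entries are dropped when usage is written."""

    def __init__(self, backing_store: dict):
        self._store = backing_store
        self._cache = {}

    def read(self, key) -> int:
        # Populate on first read, then serve from the cache indefinitely.
        if key not in self._cache:
            self._cache[key] = self._store.get(key, 0)
        return self._cache[key]

    def add_usage(self, key, quantity: int) -> None:
        self._store[key] = self._store.get(key, 0) + quantity
        # Explicit invalidation: the next read sees the new total.
        self._cache.pop(key, None)
```

This only holds if every write goes through the same path. With several processes, the invalidation has to be broadcast, which is one reason to keep the aggregate in a shared store instead of a per-process cache.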

Decouple the hot path from the slow path. The application path that responds to customer requests reads from the real-time store and decides quickly. The slow path that persists events to the platform and the analytics layer runs asynchronously. The customer experience is bounded by the hot path latency, not the slow path.

Build the dashboard to read from the real-time store directly. Customers click usage and see the latest numbers. The slow path eventually reflects the same numbers but the customer never has to wait for it.

Plan for partial failures. The streaming layer can lose connectivity to the platform briefly without losing events, because events are durable in the transport. The reconciliation layer catches up when connectivity returns.

## How Dodo Payments fits

Dodo Payments provides the events ingestion API and the meter aggregation that serve as the source of truth in this architecture. The streaming layer in your application sits in front of the platform, providing the low latency view that customers and enforcement logic read from.

The pattern is to emit events to your streaming transport, then have the stream processor update your real-time store and forward the events to the Dodo Payments events ingestion API. The platform aggregates the events for invoicing and slow path analytics. Your real-time store serves the hot path for dashboards and caps.

Reconciliation runs as a periodic job that compares the platform meter totals to your real-time store. In steady state the two match within a small window. Drift indicates a pipeline issue worth investigating.

The full reference for the events ingestion API and meter configuration lives in the [usage-based billing guide](https://docs.dodopayments.com/developer-resources/usage-based-billing-guide). This article gives you the streaming architecture. The platform handles the billing primitives.

## Common pitfalls in streaming metering

A few mistakes show up repeatedly in real-time pipelines.

Treating the streaming aggregate as the source of truth for invoicing. The streaming layer is fast and approximate by design. The platform that issues invoices is the source of truth. Mixing these up leads to invoices that disagree with the platform records.

Underestimating event volume. Streaming systems that work at one thousand events per second can fall over at ten thousand events per second if the partitioning is wrong. Plan for ten times your current peak when sizing the transport.

Skipping the reconciliation layer. The streaming layer will drift from the platform under network partitions, deployment failures, and code changes. Reconciliation is the only way to catch this. Without it, drift compounds invisibly.

Hardcoding the streaming logic in application code. The cap and quota logic should live in a layer that can be updated independently of the application. Hardcoding it ties cap policy changes to application releases.

Letting the streaming pipeline be the bottleneck for the application. The hot path should degrade gracefully if the real-time store is briefly unavailable. Either fail open with a logged exception or fail closed with a clear error, but do not let the streaming layer take down the application.
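
The fail-open versus fail-closed choice can be made explicit in the enforcement wrapper. In this sketch `StoreUnavailable` is a hypothetical exception raised by whatever client reads the real-time store, and `read_usage` is any callable that returns the current usage for a customer.

```python
import logging

logger = logging.getLogger("metering")


class StoreUnavailable(Exception):
    """Hypothetical error raised when the real-time store cannot be read."""


def check_cap(read_usage, customer_id: str, cap: int,
              fail_open: bool = True) -> bool:
    """Return True to allow the request, False to deny it."""
    try:
        used = read_usage(customer_id)
    except StoreUnavailable:
        # Log with traceback, then apply the configured policy instead
        # of letting the outage propagate to the application.
        logger.exception("real-time store unavailable for %s", customer_id)
        return fail_open
    return used < cap
```

Failing open protects availability at the cost of possibly exceeding a cap for the length of the outage; failing closed does the reverse. Which one fits depends on whether an unexpected overage or a rejected request is worse for the product.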

Forgetting time zones. Streaming aggregates are usually scoped to a billing period. The boundary between periods is a real cutover. Get the time zone of the boundary right or the customer sees usage flip over at the wrong moment.
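
The effect is easy to demonstrate. The sketch below assumes calendar-month periods and uses a fixed UTC+05:30 offset purely for illustration; for zones with daylight saving time, the standard library's `zoneinfo` module is the safer choice.

```python
from datetime import datetime, timedelta, timezone


def period_start(now_utc: datetime, billing_tz: timezone) -> datetime:
    """Start of the current calendar-month period, returned in UTC."""
    local = now_utc.astimezone(billing_tz)
    start_local = local.replace(day=1, hour=0, minute=0,
                                second=0, microsecond=0)
    return start_local.astimezone(timezone.utc)


# The same instant belongs to different periods depending on the zone.
now = datetime(2026, 5, 31, 20, 0, tzinfo=timezone.utc)
utc_start = period_start(now, timezone.utc)
ist_start = period_start(now, timezone(timedelta(hours=5, minutes=30)))
```

Here `utc_start` is 2026-05-01 00:00 UTC, while `ist_start` is 2026-05-31 18:30 UTC, because in the UTC+05:30 zone the June period has already begun.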

## Closing thought

Real-time streaming usage metering is not a fit for every product. Batch processing is simpler, cheaper, and good enough for most. When the product genuinely needs real-time, either because hard caps protect costs, because live dashboards are a feature, or because per-event business logic depends on it, the architecture described here scales to large products without becoming unmanageable.

The trick is to keep the streaming layer in its lane. It serves the hot path. It does not own the source of truth for billing. It feeds events to the platform that does, and it stays in alignment through reconciliation. Build it that way and the streaming layer is a clean addition rather than a parallel billing system that competes with the platform.

## FAQ

### How fast does the aggregation actually need to be?

For hard caps and live dashboards, end-to-end latency under five seconds is usually fine. For per-event business logic that runs synchronously with the request, sub-second latency matters. Most applications find the right point by measuring user perception rather than chasing the lowest possible number.

### Should I build streaming metering myself or use a managed service?

Both are viable. Managed services for stream processing and real-time stores remove a lot of operational work. Self-managed infrastructure gives more control. The decision usually comes down to existing infrastructure and team experience. For most teams, managed services are the lower-risk path.

### Can I use the same events for streaming and batch?

Yes. The cleanest pattern is to emit events to a streaming transport, then have one consumer feed the real-time store and another feed the batch platform. Both consumers read the same events. The aggregations are computed differently but use the same source data.
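
The two-consumer pattern relies on the transport keeping an independent read position per consumer. A toy in-memory log makes the idea concrete; the class and consumer names are hypothetical, and a real transport would provide this through consumer groups or subscriptions.

```python
class EventLog:
    """Append-only log with an independent offset per consumer."""

    def __init__(self):
        self._events = []
        self._offsets = {}

    def append(self, event) -> None:
        self._events.append(event)

    def poll(self, consumer: str) -> list:
        """Return events this consumer has not yet seen."""
        start = self._offsets.get(consumer, 0)
        batch = self._events[start:]
        self._offsets[consumer] = len(self._events)
        return batch
```

A `"realtime"` consumer and a `"batch"` consumer polling the same log each receive every event exactly once from their own point of view, regardless of how far ahead or behind the other one is.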

### How do I handle late events?

The stream processor applies them to the relevant historical aggregate. If the event lands during the same period it belongs to, it just updates the period total. If it lands after the period closed, the aggregate for that closed period gets a correction. Most platforms handle this gracefully.

### Does the cap enforcement need to be exact?

Practical caps allow a small grace window. If the cap is one million calls, it is fine for the actual cap to engage at 1,000,001 or at 1,010,000, depending on how strictly you want to enforce. Tighter enforcement means more synchronous coordination and higher latency. Most products pick a tolerance of a few percent and accept the trade-off.
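
The arithmetic behind a grace window is small enough to show directly. This sketch expresses the tolerance in basis points (100 basis points is 1%) so the calculation stays in integers; the function names and the 1% default are assumptions for illustration.

```python
def effective_cap(cap: int, tolerance_bps: int) -> int:
    """Cap plus a grace margin, with tolerance in basis points."""
    return cap + (cap * tolerance_bps) // 10_000


def should_deny(used: int, cap: int, tolerance_bps: int = 100) -> bool:
    # Enforcement engages once usage reaches the cap plus the margin.
    return used >= effective_cap(cap, tolerance_bps)
```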