# Billing API Gateway Calls: Architecture for API as a Service

> Build a metering pipeline that bills API calls accurately at high volume. Architecture, batching, and pitfalls for API as a service products.
- **Author**: Ayush Agarwal
- **Published**: 2026-05-18
- **Category**: Architecture, API, Usage-based Billing
- **URL**: https://dodopayments.com/blogs/billing-api-gateway-calls-architecture

---

API as a service products live or die on the metering pipeline. The product itself is fundamentally a series of API calls. The customer pays for the calls. The cost basis is the calls. The dashboard shows the calls. Everything that matters about the business runs through this one signal, and if the metering is off by a few percent in either direction, the business model breaks.

This article walks through the architecture of an API call metering pipeline that scales, survives failures, and produces billing data you can defend. The framing assumes a SaaS or AI infrastructure product, not ecommerce.

## What API call metering actually has to do

A useful pipeline has five jobs.

It captures every billable call at the gateway boundary, including method, path, status code, and response time. The capture must run inline with the request to avoid losing events, but it must not slow down the request meaningfully.

It attributes each call to a specific customer and optionally to an API key, an environment, or a feature. Attribution is the piece that makes the data billable. Anonymous calls are useful for analytics but cannot drive an invoice.

It survives high volume. API as a service products often see thousands or tens of thousands of calls per second across customers. The pipeline cannot rely on every call writing directly to the billing system. Batching and queueing are not optional.

It survives partial failures. Gateway calls succeed and fail independently. Billing API calls succeed and fail independently. The metering pipeline must reconcile these so customers are charged for what actually happened.

It supports the business model. Some products charge per call. Some charge per successful call. Some charge per call in different tiers based on endpoint cost. The aggregation layer must support whatever model the business has chosen.

A pipeline that does all five well looks like a small piece of infrastructure. A pipeline that misses any of them is an outage waiting to happen.

## The architecture, layer by layer

A clean pipeline has the same four layers as any usage-based billing system (instrumentation, transport, aggregation, and reconciliation), but each layer has gateway-specific considerations.

### Instrumentation at the gateway

The cleanest place to instrument is the gateway itself. Whether your gateway is a Cloudflare Worker, an AWS API Gateway with a Lambda authoriser, an Envoy proxy with a custom filter, or a custom Node service, the same principle applies. Once the request has been authenticated and routed and the origin's response is available, capture the relevant fields and emit a billing event without delaying the response to the customer.

The fields to capture are the customer identifier, the API key identifier if you support multiple keys per customer, the path, the method, the response status code, and the response time. Optionally the request size, the response size, and any custom metadata that drives your pricing.
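As a rough illustration, the fields above could be carried in a small record like the one below. This is a sketch, not the shape any particular platform requires: the class name, the field names, and the `event_id` and `timestamp` fields are assumptions added for the example.

```python
import time
import uuid
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass(frozen=True)
class BillingEvent:
    """One billable gateway call. All field names here are illustrative."""

    customer_id: str
    path: str
    method: str
    status_code: int
    response_time_ms: float
    api_key_id: Optional[str] = None     # only if you issue multiple keys per customer
    request_bytes: Optional[int] = None  # optional, for size-based pricing
    response_bytes: Optional[int] = None
    metadata: dict = field(default_factory=dict)
    # Assigned once at capture time so every resubmission reuses the same ID.
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: float = field(default_factory=time.time)


def to_payload(event: BillingEvent) -> dict:
    """Serialise for transport, dropping optional fields that were not set."""
    return {k: v for k, v in asdict(event).items() if v is not None}
```

Generating the identifier at capture time, rather than at submission time, is what lets later layers retry safely without double counting.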

The capture must be non-blocking. If the billing event emission fails or is slow, the customer should not feel it. The gateway either writes to a local queue and continues, or fires and forgets the event with explicit error handling so that failures are counted rather than silently lost.
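One way to get this behaviour in a single process is a bounded queue that never blocks the request path. The sketch below uses Python's standard `queue` module. The `EventBuffer` class and its `dropped` counter are hypothetical names, and dropping on overflow is one possible policy, not the only one.

```python
import queue
import threading


class EventBuffer:
    """Bounded in-process buffer. emit() never blocks the request path."""

    def __init__(self, max_events: int = 10_000):
        self._queue = queue.Queue(maxsize=max_events)
        self._lock = threading.Lock()
        self.dropped = 0  # export this as a metric and alert on it

    def emit(self, event: dict) -> bool:
        """Enqueue without waiting. Returns False if the buffer is full."""
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            # The request continues regardless; the loss is counted, not hidden.
            with self._lock:
                self.dropped += 1
            return False

    def drain(self, max_items: int) -> list:
        """Called by the background flusher to pull up to max_items events."""
        items = []
        while len(items) < max_items:
            try:
                items.append(self._queue.get_nowait())
            except queue.Empty:
                break
        return items
```

A non-zero `dropped` count is exactly the kind of gap the reconciliation layer exists to catch, so it is worth alerting on directly.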

### Transport at scale

A high volume gateway cannot afford a synchronous billing API call per request. The transport layer needs to batch and buffer.

The cleanest pattern is a local in-memory or in-process queue that holds events for a short window, plus a background flusher that batches events and submits them to the billing API. Batching can cut per-request overhead by an order of magnitude or more, because each network round trip is amortised across every event in the batch.
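A minimal sketch of the flusher logic follows, assuming a flush on whichever comes first: a size limit or an age limit. The `Batcher` class, its parameters, and the injected `submit` callable are hypothetical. The clock is injectable only so the behaviour can be tested without sleeping.

```python
import time
from typing import Callable, List, Optional


class Batcher:
    """Flushes when the batch is full or its oldest event is too old."""

    def __init__(
        self,
        submit: Callable[[List[dict]], None],
        max_events: int = 100,
        max_age_s: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._submit = submit
        self._max_events = max_events
        self._max_age_s = max_age_s
        self._clock = clock
        self._pending: List[dict] = []
        self._oldest: Optional[float] = None

    def add(self, event: dict) -> None:
        if not self._pending:
            self._oldest = self._clock()  # age is measured from the first event
        self._pending.append(event)
        if len(self._pending) >= self._max_events:
            self.flush()

    def tick(self) -> None:
        """Call periodically from a background loop to enforce the age limit."""
        if self._pending and self._clock() - self._oldest >= self._max_age_s:
            self.flush()

    def flush(self) -> None:
        if not self._pending:
            return
        batch, self._pending, self._oldest = self._pending, [], None
        self._submit(batch)
```

In this sketch the batch is handed off before `submit` runs, so `submit` itself has to own retries or hand the batch to a durable queue. Otherwise an exception there loses the batch.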

For very high volume you add a managed queue between the gateway and the flusher. Cloudflare Queues, SQS, Kafka, or Pub/Sub all work. The gateway writes to the queue, and a separate worker fleet drains the queue and submits batches. This separation lets you scale the gateway and the flusher independently and absorb spikes that exceed momentary billing API capacity.

The transport layer also handles retries. Transient billing API failures should be retried with exponential backoff and idempotency keys. Persistent failures should be surfaced to monitoring rather than silently dropped.
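The retry behaviour described here can be sketched as below. The two exception classes are hypothetical stand-ins for however your HTTP client distinguishes retryable failures (timeouts, 5xx) from permanent ones. The `send` callable and the delay values are also assumptions.

```python
import random
import time
from typing import Callable, List


class TransientError(Exception):
    """Hypothetical: a timeout or 5xx from the billing API. Worth retrying."""


class PermanentError(Exception):
    """Hypothetical: a rejection that will not succeed on retry."""


def submit_with_retry(
    send: Callable[[List[dict], str], None],
    batch: List[dict],
    idempotency_key: str,
    max_attempts: int = 5,
    base_delay_s: float = 0.5,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True on success, False if the batch should go to monitoring."""
    for attempt in range(max_attempts):
        try:
            # The same key is sent on every attempt so the receiver can dedupe.
            send(batch, idempotency_key)
            return True
        except TransientError:
            if attempt == max_attempts - 1:
                break
            delay = min(max_delay_s, base_delay_s * 2 ** attempt)
            sleep(random.uniform(0, delay))  # jitter avoids synchronised retries
        except PermanentError:
            break
    return False
```

A `False` return is the signal to raise an alert or move the batch to a dead-letter store, so that it is surfaced rather than dropped.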

### Aggregation at the platform

The billing platform aggregates the events into meters. For API call billing the common patterns are count aggregation, which counts the number of events, and sum aggregation over a numeric property when calls have weights.

Count aggregation is the simplest and most common. Every event counts as one call. The customer pays per call.

Sum aggregation lets you assign different weights to different endpoints. A simple read endpoint counts as one. A heavy compute endpoint counts as five. The pricing stays per unit but the unit definition encodes the relative cost. This is the cleanest way to handle endpoint heterogeneity without exposing it as separate prices.
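To make the weighting concrete, here is a small sketch of what a sum meter computes. The endpoint paths, the weight table, and the function names are invented for illustration. In practice the weight would travel on each event as a numeric property and the platform would do the summing.

```python
from collections import defaultdict
from typing import Dict, Iterable

# Illustrative weights: a cheap read costs 1 unit, a heavy compute call costs 5.
ENDPOINT_WEIGHTS = {
    ("GET", "/v1/items"): 1,
    ("POST", "/v1/render"): 5,
}
DEFAULT_WEIGHT = 1


def weight_for(method: str, path: str) -> int:
    """Look up the billing weight for an endpoint, defaulting to one unit."""
    return ENDPOINT_WEIGHTS.get((method, path), DEFAULT_WEIGHT)


def weighted_usage(events: Iterable[dict]) -> Dict[str, int]:
    """Sum weights per customer, which is what a sum meter would total."""
    totals: Dict[str, int] = defaultdict(int)
    for event in events:
        totals[event["customer_id"]] += weight_for(event["method"], event["path"])
    return dict(totals)
```

Two reads and one render for the same customer come to 1 + 1 + 5 = 7 units, billed at a single per-unit price.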

For more complex pricing some products use multiple meters: one for total calls, one for premium endpoint calls, and one for failed calls if they are charged differently. The events go to the same ingestion API but are tagged with different event names that map to different meters.
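A sketch of the tagging step is shown below. The event names and the `/v1/premium/` prefix are made up for the example. It assumes each meter is keyed on exactly one event name, so a call that should count in two meters emits two names. If your platform supports filtering a single event stream into several meters, one name plus metadata may be enough.

```python
from typing import List


def event_names_for(method: str, path: str, status_code: int) -> List[str]:
    """Return the event names to emit for one call (names are illustrative)."""
    names = ["api.call"]  # feeds the total-calls meter
    if path.startswith("/v1/premium/"):
        names.append("api.call.premium")  # feeds the premium-endpoint meter
    if status_code >= 400:
        names.append("api.call.failed")  # feeds the failed-calls meter
    return names
```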

### Reconciliation across boundaries

The reconciliation layer catches the gap between what your gateway logs say happened and what the billing platform says happened. Both should agree but in practice they sometimes do not.

A daily reconciliation job pulls the gateway logs for the day, aggregates them per customer and per meter, and compares against the platform totals. Discrepancies are logged and investigated.
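The comparison at the heart of that job might look like the following sketch. It assumes gateway log records carry `customer_id` and `meter` fields and that platform totals have already been fetched into a dictionary keyed by `(customer_id, meter)`. Both shapes and the relative tolerance are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, str]  # (customer_id, meter)


def reconcile(
    gateway_events: Iterable[dict],
    platform_totals: Dict[Key, int],
    tolerance: float = 0.001,
) -> List[dict]:
    """Return one record per (customer, meter) whose totals disagree."""
    gateway_totals = Counter((e["customer_id"], e["meter"]) for e in gateway_events)
    discrepancies = []
    # Check keys from both sides so events missing on either side are caught.
    for key in sorted(set(gateway_totals) | set(platform_totals)):
        expected = gateway_totals.get(key, 0)
        actual = platform_totals.get(key, 0)
        delta = actual - expected
        if abs(delta) > tolerance * max(expected, actual):
            discrepancies.append(
                {"customer_id": key[0], "meter": key[1],
                 "gateway": expected, "platform": actual, "delta": delta}
            )
    return discrepancies
```

A negative delta means the platform saw fewer events than the gateway logged, which usually points at lost batches. A positive delta suggests duplicates.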

Common causes of discrepancies include batches that failed to submit and were not retried, idempotency collisions that caused legitimate events to be dropped, gateway crashes that lost in flight events, and clock drift between gateway nodes that pushed events into the wrong day.

The reconciliation job is what makes the pipeline trustworthy over the long run. Without it, drift compounds quietly.

## High volume pitfalls

API gateway billing has some specific traps that LLM metering does not have to deal with.

The first is event volume. Even modest API products can emit billions of events per month. The pipeline has to be designed for that volume from day one, because retrofitting it after the fact is painful. Batching at the gateway, asynchronous transport, and asynchronous aggregation are the standard tools.

The second is request to event ratio. One API call usually corresponds to one billing event, but not always. Some products bill per response chunk for streaming endpoints. Some products bill per egress byte. The relationship between requests and events needs to be defined explicitly, because confusion here leads to undercounting or overcounting.

The third is failed call billing policy. A request that returns a 500 error from your service is your fault, and customers expect not to be charged for it. A request that returns a 401 is the customer's fault, and you might choose to charge or not. A request that returns a 429 from your rate limiter is also your call. The policy needs to be explicit and the gateway needs to know which calls to emit billing events for.
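Encoding the policy as a single function keeps it explicit and testable. The defaults below are one possible policy chosen for illustration, not a recommendation. The function and parameter names are hypothetical.

```python
def is_billable(
    status_code: int,
    bill_client_errors: bool = True,
    bill_rate_limited: bool = False,
) -> bool:
    """Decide whether the gateway should emit a billing event for a response."""
    if status_code >= 500:
        return False  # provider-side failure: never billed in this sketch
    if status_code == 429:
        return bill_rate_limited  # your rate limiter, so an explicit choice
    if 400 <= status_code < 500:
        return bill_client_errors  # caller-side errors such as 401 or 404
    return True  # everything below 400 is treated as a served request
```

Keeping the decision in one place also gives the reconciliation job a single definition of "billable" to apply to gateway logs.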

The fourth is customer attribution at the edge. Some traffic does not carry a clean customer identifier at the edge layer. Anonymous traffic, malformed requests, requests with revoked keys, all of these arrive at the gateway and need to be handled. They are typically not billed but need to be logged for analytics and abuse detection.

The fifth is retries from the customer side. If a customer's client retries failed requests aggressively, your gateway sees more calls than logical operations. The billing event policy needs to handle this. Some products bill every call regardless. Others bill only the successful one of a retry sequence. Both are defensible. The choice needs to be explicit and consistent.
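Both policies can be sketched in a few lines, assuming clients send some identifier that ties retries of the same logical operation together. The `operation_id` field is hypothetical. Without such an identifier the gateway cannot group retries, and billing every call is the only consistent option.

```python
from typing import Iterable, List


def billable_calls(calls: Iterable[dict], bill_every_attempt: bool = False) -> List[dict]:
    """Select which calls in a stream of attempts produce billing events."""
    calls = list(calls)
    if bill_every_attempt:
        return calls  # policy A: every call is billed, retries included

    # Policy B: bill only the first successful call of each retry sequence.
    billed, seen = [], set()
    for call in calls:
        if not 200 <= call["status_code"] < 300:
            continue
        operation_id = call.get("operation_id")
        if operation_id is not None:
            if operation_id in seen:
                continue  # a later duplicate of an already-billed operation
            seen.add(operation_id)
        billed.append(call)
    return billed
```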

## How Dodo Payments fits

Dodo Payments provides the aggregation layer and the events ingestion API for the transport layer. You define meters in the dashboard with count or sum aggregation, you call the events ingestion API from your gateway or worker, and the platform handles aggregation, subscription totals, overage, and invoicing.

For API gateway use cases specifically, the platform offers an API gateway blueprint that wraps the ingestion API in a high volume friendly SDK. The SDK supports batching at the gateway, automatic retry with idempotency, and a structured event shape that includes endpoint, method, status code, and response time as metadata.

The architectural advantage is the same as with LLM metering. The wrapper, transport, and aggregation are tested at scale. You wire the SDK into your gateway, point it at your meter event name, and your usage is automatically tracked. You still need the reconciliation layer and the policy decisions, but the plumbing is solved.

The full reference for the API gateway blueprint and the events ingestion API lives in the [API gateway ingestion blueprint](https://docs.dodopayments.com/developer-resources/ingestion-blueprints/api-gateway) and the [usage based billing guide](https://docs.dodopayments.com/developer-resources/usage-based-billing-guide). This article gives you the architecture. The docs give you the implementation.

## A reference architecture for an API as a service product

Putting it all together, the pipeline for a typical API as a service product looks roughly like this.

Customer traffic arrives at your edge layer, typically a CDN with a worker. The worker authenticates the request using the customer's API key, looks up the customer record, and routes the request to your origin. After the response is sent, the worker emits a billing event with the customer identifier, the endpoint, the method, the status code, and the response time. The event is written to a local queue or a managed queue.

A worker fleet consumes the queue. The workers batch events and submit them to the events ingestion API with idempotency keys. Successful submissions are removed from the queue. Failed submissions are retried with backoff. Persistent failures are surfaced to monitoring.

The billing platform aggregates events into the relevant meter. For a per call pricing model, a single count meter holds the total. For weighted pricing, a sum meter holds the weighted total. Subscriptions and quotas reference the meter. Overage triggers automatically when totals exceed quotas.

Your application reads meter totals through the platform API to show usage in your customer dashboard. The numbers shown to customers are the same numbers that drive invoices.

A reconciliation job runs daily. It pulls gateway logs and platform totals, compares them per customer, and alerts on discrepancies above a threshold. A human investigates anything that exceeds the threshold.

This architecture handles billions of calls per month with predictable cost and predictable accuracy. The components are not exotic. The discipline is what makes it work.

## Operational hygiene

A few habits keep the pipeline healthy.

Monitor batch latency. If batches are taking longer to flush than your design assumes, the queue is filling up and a backlog is forming. Catch this before it becomes a billing delay.

Monitor idempotency key collisions. Collisions usually mean a code change introduced a bug in key generation, which can cause legitimate events to be dropped.
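One way to tell a harmless duplicate from a real collision is to fingerprint the payload stored against each key. The sketch below keeps everything in memory for clarity. A production version would need a store with expiry, and the class and return values are hypothetical.

```python
import hashlib
import json


class CollisionDetector:
    """Flags idempotency keys that are reused with a different payload."""

    def __init__(self):
        self._fingerprints = {}  # key -> payload hash (unbounded in this sketch)
        self.collisions = 0      # export as a metric and alert when non-zero

    def check(self, key: str, payload: dict) -> str:
        encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
        fingerprint = hashlib.sha256(encoded).hexdigest()
        previous = self._fingerprints.get(key)
        if previous is None:
            self._fingerprints[key] = fingerprint
            return "new"
        if previous == fingerprint:
            return "duplicate"  # a retry of the same event: safe to ignore
        self.collisions += 1
        return "collision"      # same key, different event: likely a key bug
```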

Monitor reconciliation deltas. If the daily reconciliation report shows a growing gap between gateway logs and platform totals, something is leaking. Investigate immediately rather than waiting for end of month.

Test with chaos. Periodically inject failure into the billing API or the queue and watch what your pipeline does. Real failures will happen. Practising for them is cheap.

Document the policy on retries, errors, and weighted endpoints. Without documentation, the policy drifts as engineers come and go. With documentation, the policy is enforced and visible.

## Closing thought

API gateway metering is the kind of infrastructure that looks invisible when it works and very visible when it does not. Customers complain about bills that do not match their expectations. Finance complains about revenue that does not match traffic. Engineering complains about a pipeline that is hard to debug.

The architecture described here is not new. It is the same shape that observability and analytics systems have used for over a decade. Applied to billing, with the addition of a reconciliation layer and clear policy decisions, it produces a pipeline that scales and stays accurate. Build it this way the first time and the rest of the business runs more smoothly.

## FAQ

### Should I bill failed calls?

This is a policy decision. Most products do not bill 5xx errors because those are the provider's fault. Many products do bill 4xx errors because those are usually the caller's fault, although 429s from your own rate limiter deserve a separate decision. Some products only bill successful calls to keep the model simple. Decide explicitly and document the policy.

### How do I handle endpoints with very different costs?

The cleanest pattern is a sum meter where each event carries a weight that reflects the cost. Cheap endpoints have weight one. Expensive endpoints have a higher weight. The meter sums the weights. The customer pays per unit, and the unit definition encodes the relative cost without exposing endpoint pricing as separate plans.

### What is the right batch size for the gateway?

Depends on your traffic and your latency tolerance. A common starting point is one hundred events or one second, whichever comes first. Tune from there based on observed batch latency and ingestion API throughput.

### How do I prevent a customer from racking up huge bills?

Hard caps tied to the meter total. When a customer's meter exceeds the configured cap, the gateway starts rejecting their requests with a clear error. This is not optional for API as a service products. Without caps, a single misconfigured customer client can cost both of you a lot of money in a short time.
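The gateway-side check can be very small. In this sketch the current usage is assumed to come from a cached copy of the meter total, and the choice of HTTP 429 and the error body are illustrative. Because the cached total lags real usage, a cap enforced this way can be overshot slightly.

```python
from typing import Optional, Tuple


def check_cap(
    current_usage: int, cap: Optional[int], cost: int = 1
) -> Optional[Tuple[int, dict]]:
    """Return None to allow the request, or (status, body) to reject it."""
    if cap is None or current_usage + cost <= cap:
        return None
    return 429, {
        "error": "usage_cap_exceeded",
        "message": f"Usage cap of {cap} units reached for this billing period.",
        "cap": cap,
        "current_usage": current_usage,
    }
```

Passing the call's weight as `cost` keeps the cap consistent with a weighted sum meter.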

### Can I use the same pipeline for analytics and billing?

The events feed both, but the source of truth for billing is the platform that issues invoices. Your local copy is fine for analytics dashboards. When the two diverge, the platform wins. Build the reconciliation job to alert when the gap is wider than your tolerance.