# Metering Object Storage and S3 Billing for SaaS

> Bill customers for storage uploads, capacity, and bandwidth across S3, GCS, and Azure Blob. Architecture for accurate object storage metering.
- **Author**: Ayush Agarwal
- **Published**: 2026-05-18
- **Category**: Architecture, Storage, Usage-based Billing
- **URL**: https://dodopayments.com/blogs/metering-object-storage-s3-billing

---

Products built on object storage have a harder billing problem than most. The cost basis is bytes uploaded, bytes stored over time, bytes egressed, and operations performed against the storage layer. These dimensions scale independently, and customers care about each one differently. A backup service charges for storage. A file hosting service charges for storage and bandwidth. A media platform charges for storage, transformation, and delivery. Each product picks a different shape, but the underlying metering pipeline shares the same architecture.

This article walks through how to build a metering pipeline for SaaS products backed by object storage. The framing covers AWS S3, Google Cloud Storage, Azure Blob Storage, and any S3-compatible object store, and assumes a SaaS or AI product, not ecommerce.

## What object storage metering actually has to do

Object storage billing has more dimensions than most usage based products, which makes the architecture more interesting.

It captures upload events at the moment files enter the system. Each event carries the size in bytes, the customer identifier, optionally a path prefix or bucket tag, and the timestamp.

It captures storage capacity over time. Stored bytes is not a single number but an integral of size over time. A file uploaded on day one and kept for thirty days contributes to thirty days of stored capacity. The pipeline has to account for how long each byte was held, not just how many bytes arrived.
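To make the time integral concrete, here is a minimal sketch of the gigabyte-day arithmetic for a single object. The function name and the decimal gigabyte constant are assumptions for illustration; a product that bills in GiB would use `2**30` instead.

```python
from datetime import datetime, timezone

GB = 1_000_000_000  # decimal gigabyte; an assumption, some products bill in GiB
SECONDS_PER_DAY = 86_400


def gigabyte_days(size_bytes: int, stored_from: datetime, stored_until: datetime) -> float:
    """Integrate one object's size over the time it was stored."""
    seconds = max((stored_until - stored_from).total_seconds(), 0.0)
    return (size_bytes / GB) * (seconds / SECONDS_PER_DAY)


# A 5 GB file uploaded on day one and kept for thirty days.
start = datetime(2026, 5, 1, tzinfo=timezone.utc)
end = datetime(2026, 5, 31, tzinfo=timezone.utc)
total = gigabyte_days(5 * GB, start, end)
print(total)  # 150.0
```

Summing this value across all of a customer's objects gives the stored capacity for the period.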

It captures download or egress events when files leave the system. Egress is often charged differently from storage and at a different rate per region.

It captures operation counts when applicable. Some pricing models charge per request to the storage layer in addition to bytes. GET, PUT, LIST, and DELETE requests all count separately.

It survives the same retries, partial failures, and high volume conditions as any usage based pipeline. The volume can be very high if the product allows direct uploads from clients.

It supports the business model. Some products charge per gigabyte uploaded with included storage at no extra cost. Some charge per gigabyte stored per month. Some charge a tiered combination. The aggregation has to fit.

These six jobs mirror those of any usage pipeline. The wrinkle is that the cost basis has multiple dimensions and the time integral matters as much as the event count.

## The architecture, layer by layer

The shape mirrors LLM and API gateway metering, with object storage specific considerations at each layer.

### Instrumentation at the application or upload boundary

The cleanest place to instrument depends on whether your product mediates uploads or hands clients direct access to storage.

If your application server handles uploads, you instrument there. After the upload completes, you read the file size and emit an event with the customer identifier, the bytes, and any metadata you care about.

If your application gives clients direct upload access through presigned URLs or temporary credentials, you instrument either at the URL generation step, using the size the client declares for the upload, or asynchronously by listening to storage provider events such as S3 Object Created notifications.

The asynchronous pattern is the more reliable one for direct upload products. The storage provider knows definitively when an object was created and how big it is. Listening to those notifications removes a class of bugs where the application thinks an upload happened but the storage layer disagrees.

For storage capacity over time, the instrumentation is different. Capacity is not an event, it is a state that needs to be sampled or computed. Two patterns work. The first is to maintain a running balance per customer in your application database, incrementing on uploads and decrementing on deletes, and emit a daily snapshot to the meter. The second is to query the storage provider directly for total bucket size per customer, on a daily schedule. The first is more responsive because the balance moves with every upload and delete. The second is more accurate because it reflects what the provider actually holds.
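The running balance pattern can be sketched as a small ledger. This is an in-memory stand-in for what would be a table in your application database; the class and method names are hypothetical.

```python
from collections import defaultdict


class StorageLedger:
    """Running stored-bytes balance per customer (the first pattern)."""

    def __init__(self) -> None:
        self._bytes: dict[str, int] = defaultdict(int)

    def record_upload(self, customer_id: str, size_bytes: int) -> None:
        self._bytes[customer_id] += size_bytes

    def record_delete(self, customer_id: str, size_bytes: int) -> None:
        # Clamp at zero so a duplicated delete cannot produce a negative balance.
        self._bytes[customer_id] = max(self._bytes[customer_id] - size_bytes, 0)

    def snapshot(self) -> dict[str, int]:
        """Copy of the current balances, suitable for the daily snapshot job."""
        return dict(self._bytes)


ledger = StorageLedger()
ledger.record_upload("acme", 700)
ledger.record_upload("acme", 300)
ledger.record_delete("acme", 300)
print(ledger.snapshot())  # {'acme': 700}
```

The clamp hides double-counted deletes rather than fixing them, which is one reason a periodic comparison against the provider's own totals is still worthwhile.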

Egress events are usually captured at the CDN or origin layer that serves the bytes. Most CDN providers expose egress data as logs or metrics that you can aggregate per customer.

### Transport and batching

The transport layer is the same as for any high volume metering. Upload events go to a queue, a worker batches them and submits to the events ingestion API with idempotency keys. The events ingestion API accepts the customer identifier, the event name, and the bytes property.
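A minimal sketch of the two pieces the worker needs: a deterministic idempotency key and a batcher. Deriving the key from the bucket, object key, and ETag is an assumption chosen so that a retried event hashes to the same value; the function names are illustrative and the actual submission call is left out.

```python
import hashlib
import json
from typing import Iterable, Iterator


def idempotency_key(customer_id: str, bucket: str, object_key: str, etag: str) -> str:
    """Derive a stable key so a retried upload event produces the same ID."""
    # JSON encoding keeps the fields unambiguous even if a key contains separators.
    raw = json.dumps([customer_id, bucket, object_key, etag])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def batches(events: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield lists of at most `size` events for submission."""
    batch: list[dict] = []
    for event in events:
        batch.append(event)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


events = [{"n": i} for i in range(5)]
print([len(b) for b in batches(events, 2)])  # [2, 2, 1]
```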

For the daily storage capacity snapshots, the transport is simpler because the volume is one event per customer per day. A scheduled job reads the running balance or queries the storage provider, and submits one event per customer with the gigabyte days for that day.

Egress events are usually batched at the source. CDN logs are aggregated per hour or per day, then fed into the same ingestion pipeline as upload events.
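As a sketch of the hourly aggregation step, assuming each parsed log record already carries a `customer_id`, a Unix timestamp `ts`, and a `bytes_sent` count. These field names are hypothetical; real CDN log schemas differ by provider.

```python
from collections import defaultdict
from datetime import datetime, timezone


def aggregate_egress(records: list[dict]) -> dict[tuple[str, str], int]:
    """Sum bytes_sent per (customer_id, UTC hour) bucket."""
    totals: dict[tuple[str, str], int] = defaultdict(int)
    for rec in records:
        ts = datetime.fromtimestamp(rec["ts"], tz=timezone.utc)
        hour = ts.strftime("%Y-%m-%dT%H:00Z")
        totals[(rec["customer_id"], hour)] += rec["bytes_sent"]
    return dict(totals)


base = datetime(2026, 5, 18, 10, 0, tzinfo=timezone.utc).timestamp()
records = [
    {"customer_id": "acme", "ts": base + 60, "bytes_sent": 1000},
    {"customer_id": "acme", "ts": base + 1800, "bytes_sent": 500},
    {"customer_id": "acme", "ts": base + 3600, "bytes_sent": 200},
    {"customer_id": "globex", "ts": base + 10, "bytes_sent": 50},
]
hourly = aggregate_egress(records)
```

Each entry in `hourly` then becomes one egress event, which keeps the event count proportional to customers and hours instead of requests.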

### Aggregation at the platform

The billing platform aggregates events into meters. For object storage there are typically multiple meters running in parallel.

A sum meter on uploaded bytes for billing per gigabyte uploaded.

A sum meter on stored gigabyte days for billing per gigabyte stored per month. The daily snapshot events feed this.

A sum meter on egressed bytes for billing per gigabyte downloaded.

A count meter on operations for billing per million operations if the model includes that.

Each meter has its own subscription quota and overage rate. Customers see line items per meter on their invoices.
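The per-meter quota and overage arithmetic can be sketched as follows. The meter names, quotas, and rates are illustrative numbers only, not real prices, and the billing platform would normally perform this calculation for you.

```python
from decimal import Decimal


def overage_charge(usage: Decimal, quota: Decimal, rate_per_unit: Decimal) -> Decimal:
    """Charge only for usage above the included quota."""
    billable = max(usage - quota, Decimal("0"))
    return billable * rate_per_unit


# Hypothetical plan: (included quota, overage rate per unit) for each meter.
plan = {
    "upload_gb": (Decimal("100"), Decimal("0.02")),
    "stored_gb_days": (Decimal("3000"), Decimal("0.001")),
    "egress_gb": (Decimal("50"), Decimal("0.08")),
}
usage = {
    "upload_gb": Decimal("120"),
    "stored_gb_days": Decimal("2500"),
    "egress_gb": Decimal("75"),
}

# One line item per meter, computed independently of the others.
line_items = {
    meter: overage_charge(usage[meter], quota, rate)
    for meter, (quota, rate) in plan.items()
}
```

Using `Decimal` instead of floats avoids rounding drift when the same totals are later compared against invoices.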

### Reconciliation across systems

The reconciliation layer for object storage has to bridge three systems: your application logs, the storage provider's own usage and billing reports, and the billing platform's meter totals.

A daily reconciliation job pulls all three and compares. The storage provider report is the source of truth for capacity and operations. Your application is the source of truth for customer attribution. The billing platform is the source of truth for what was charged.

Discrepancies usually fall into a few categories. Events lost in transit between application and billing platform. Storage provider counts that include orphaned objects from incomplete multipart uploads. Customer attribution drift when path prefixes or bucket structure changes. Each category has a different fix, and the reconciliation job needs to be able to identify which is which.
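A first pass at that classification might look like the sketch below. The finding codes and the 1% relative tolerance are assumptions. Telling orphaned multipart parts apart from attribution drift needs per-object data that this sketch does not have, so both surface under `provider_exceeds_app`.

```python
def reconcile(app_bytes: int, provider_bytes: int, platform_bytes: int,
              tolerance: float = 0.01) -> list[str]:
    """Return finding codes for one customer and one day; empty means agreement."""

    def differs(a: int, b: int) -> bool:
        # Relative tolerance so large customers do not alert on rounding noise.
        return abs(a - b) > tolerance * max(a, b, 1)

    findings: list[str] = []
    if differs(app_bytes, platform_bytes):
        findings.append(
            "events_lost_in_transit" if platform_bytes < app_bytes else "platform_overcount"
        )
    if differs(provider_bytes, app_bytes):
        findings.append(
            "provider_exceeds_app" if provider_bytes > app_bytes else "app_exceeds_provider"
        )
    return findings


print(reconcile(app_bytes=1000, provider_bytes=1200, platform_bytes=900))
# ['events_lost_in_transit', 'provider_exceeds_app']
```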

## Object storage specific pitfalls

Several issues come up specifically with object storage that other usage products do not face.

Multipart upload accounting is the first. A client starts a multipart upload, uploads a few parts, and then abandons. The storage provider holds the parts in an incomplete state and charges for them. Your application may or may not count them. If you are charging per uploaded gigabyte, deciding whether to charge for incomplete uploads has to be explicit, and the policy needs to handle the storage provider's lifecycle rules for cleaning them up.
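On S3, the cleanup side of that policy is usually expressed as a lifecycle rule. The sketch below builds such a configuration as a plain dictionary; the seven-day window, the rule ID, and the bucket name in the comment are assumptions, and the boto3 call is shown only as a comment so the snippet runs without AWS credentials.

```python
def abort_incomplete_uploads_rule(days: int) -> dict:
    """Build an S3 lifecycle configuration that aborts stale multipart uploads."""
    return {
        "Rules": [
            {
                "ID": "abort-incomplete-multipart-uploads",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix applies the rule bucket-wide
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": days},
            }
        ]
    }


lifecycle = abort_incomplete_uploads_rule(days=7)
# With boto3 installed and credentials configured, this could be applied with:
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-bucket", LifecycleConfiguration=lifecycle)
```

Whatever window you choose, the billing policy should state whether parts held during that window count toward the customer's usage.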

Replica and version accounting is the second. With versioning or cross region replication enabled, a single logical object can occupy several physical objects' worth of capacity. Charging the customer for each replica is rarely the right call, but doing so by accident is easy: provider-reported sizes typically count every retained version, and each replica adds capacity in the bucket that receives it.
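One way to keep physical and billable bytes apart is to compute both from a version listing. This sketch assumes records shaped like the entries a version listing returns (key, version ID, size, and an is-latest flag); the `ObjectVersion` type and the `bill_noncurrent` switch are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectVersion:
    key: str
    version_id: str
    size_bytes: int
    is_latest: bool


def physical_bytes(versions: list[ObjectVersion]) -> int:
    """Everything the storage layer holds, including noncurrent versions."""
    return sum(v.size_bytes for v in versions)


def billable_bytes(versions: list[ObjectVersion], bill_noncurrent: bool = False) -> int:
    """Bytes charged to the customer under the chosen versioning policy."""
    return sum(v.size_bytes for v in versions if bill_noncurrent or v.is_latest)


listing = [
    ObjectVersion("acme/report.pdf", "v1", 400, False),
    ObjectVersion("acme/report.pdf", "v2", 500, True),
    ObjectVersion("acme/logo.png", "v1", 100, True),
]
print(physical_bytes(listing), billable_bytes(listing))  # 1000 600
```

The gap between the two numbers is capacity you pay for but do not bill, which is worth tracking as its own metric.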

Cold storage and tiering is the third. Some products move older objects to cheaper tiers automatically. The capacity is the same but the cost basis is different. If you charge customers a flat rate per stored gigabyte, you absorb the variance, which can be margin positive or negative depending on access patterns. If you tier the customer's price too, the metering needs to know which tier each object is in.
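To see how a flat customer price interacts with tiering, the sketch below compares revenue against storage cost for two tier mixes. All rates are made-up round numbers, and retrieval fees are deliberately left out even though they are often what turns the variance negative for frequently accessed cold data.

```python
from decimal import Decimal

# Hypothetical monthly rates per gigabyte; not real provider prices.
COST_PER_GB = {"hot": Decimal("0.02"), "cold": Decimal("0.005")}
FLAT_PRICE_PER_GB = Decimal("0.025")


def monthly_margin(gb_by_tier: dict[str, Decimal]) -> Decimal:
    """Flat-rate revenue minus tier-specific storage cost."""
    revenue = sum(gb_by_tier.values(), Decimal("0")) * FLAT_PRICE_PER_GB
    cost = sum(
        (gb * COST_PER_GB[tier] for tier, gb in gb_by_tier.items()),
        Decimal("0"),
    )
    return revenue - cost


all_hot = monthly_margin({"hot": Decimal("1000")})
mostly_cold = monthly_margin({"hot": Decimal("200"), "cold": Decimal("800")})
print(all_hot, mostly_cold)
```

The same 1,000 GB produces very different margins depending on the mix, so the metering has to record the tier even when the customer price ignores it.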

Egress attribution is the fourth. CDN logs sometimes lack a clean customer identifier, especially for public file hosting where the URL is the only signal. The cleanest pattern is to embed the customer identifier in the URL path or in a query parameter, then parse it from the CDN log. Without this, egress billing is unreliable.
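A minimal sketch of that parsing step, assuming a URL layout of `/c/<customer_id>/...` with a `cid` query parameter as fallback. Both conventions are hypothetical; use whatever scheme your URLs already follow.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def customer_from_url(url: str) -> Optional[str]:
    """Return the customer ID from a /c/<id>/... path or a cid query parameter."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) >= 2 and segments[0] == "c":
        return segments[1]
    values = parse_qs(parts.query).get("cid")
    return values[0] if values else None


print(customer_from_url("https://cdn.example.com/c/acme/photos/a.jpg"))  # acme
print(customer_from_url("https://cdn.example.com/files/a.jpg?cid=globex"))  # globex
print(customer_from_url("https://cdn.example.com/files/a.jpg"))  # None
```

Requests that resolve to `None` are worth counting separately, since a growing unattributed total is an early sign that egress billing is drifting.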

Cross region pricing is the fifth. Storage and egress prices vary by region, and transfers between regions are typically priced separately. Customers usually want a uniform price, which means you absorb the regional variance. The metering pipeline still needs to track region per event so you can analyse the underlying cost picture.

## How Dodo Payments fits

Dodo Payments provides the aggregation layer and the events ingestion API. You define meters in the dashboard with sum aggregation over the bytes property for upload volume, storage gigabyte days, and egress bytes. You call the events ingestion API from your application, your scheduled snapshot job, and your CDN log aggregator.

For object storage specifically, the platform offers a storage ingestion blueprint that wraps the ingestion API in patterns common to S3, Google Cloud Storage, and Azure Blob Storage uploads. The SDK simplifies the event emission step and ensures the bytes property is structured correctly for the meter.

The architectural advantage is that the aggregation, subscription handling, overage, and invoicing are all handled by the platform. Your application focuses on capturing the right events and attributing them to the right customer. The reconciliation layer remains your responsibility because only your team knows the relationship between application customers and storage layer objects.

The full reference for the object storage blueprint and the events ingestion API lives in the [object storage ingestion blueprint](https://docs.dodopayments.com/developer-resources/ingestion-blueprints/object-storage) and the [usage based billing guide](https://docs.dodopayments.com/developer-resources/usage-based-billing-guide). This article gives you the architecture and the pitfalls. The docs give you the implementation.

## A reference architecture for a file hosting SaaS

Putting the pieces together, the pipeline for a typical file hosting product looks like this.

Customers upload files through your application. The application generates a presigned URL for direct upload to S3 with a key that includes the customer identifier as a prefix. The client uploads directly to S3.

S3 emits an Object Created event for each successful upload. A Lambda function listens for these events, extracts the customer identifier from the key, reads the size from the event, and emits a billing event to the events ingestion API. The event has the customer identifier, the event name for upload bytes, and the bytes value.
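The core of such a function can be sketched as a pure transformation from the S3 notification to billing events. The notification fields used here follow the S3 event message structure, where object keys arrive URL-encoded. The outgoing payload shape (`event_id`, `event_name`, `metadata`) and the event name `storage.upload_bytes` are assumptions for illustration; check the ingestion API reference for the actual schema.

```python
from urllib.parse import unquote_plus


def billing_events_from_s3(notification: dict) -> list[dict]:
    """Turn S3 Object Created records into billing events (hypothetical schema)."""
    events = []
    for record in notification.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        obj = record["s3"]["object"]
        key = unquote_plus(obj["key"])  # S3 URL-encodes keys in notifications
        customer_id, _, rest = key.partition("/")
        if not rest:
            continue  # no customer prefix; in practice, route to manual review
        bucket = record["s3"]["bucket"]["name"]
        sequencer = obj.get("sequencer", "")
        events.append({
            "event_id": f"{bucket}/{key}/{sequencer}",  # stable across redeliveries
            "customer_id": customer_id,
            "event_name": "storage.upload_bytes",
            "timestamp": record["eventTime"],
            "metadata": {"bytes": obj["size"]},
        })
    return events
```

Keeping the transformation free of network calls makes it easy to unit test; the Lambda handler itself would call this function and pass the result to the ingestion client.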

A scheduled job runs nightly. It iterates over all customers, queries the storage layer for the total bytes stored under each customer's prefix, and emits one event per customer with gigabyte days for the day. Total gigabytes times one day equals gigabyte days for that snapshot. Over a month these accumulate into the customer's stored gigabyte day total.
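The nightly job's arithmetic can be sketched as below, assuming the storage listing has already been reduced to `(key, size_bytes)` pairs and that the first path segment of each key is the customer identifier. The event fields and the event name are hypothetical placeholders for the ingestion schema.

```python
from collections import defaultdict

GB = 1_000_000_000  # decimal gigabyte; an assumption


def snapshot_events(objects: list[tuple[str, int]], day: str) -> list[dict]:
    """Build one gigabyte-day event per customer for a single day's snapshot."""
    totals: dict[str, int] = defaultdict(int)
    for key, size_bytes in objects:
        customer_id, sep, _ = key.partition("/")
        if sep:  # skip keys without a customer prefix
            totals[customer_id] += size_bytes
    return [
        {
            "event_id": f"snapshot:{customer_id}:{day}",  # rerunning the job is idempotent
            "customer_id": customer_id,
            "event_name": "storage.gigabyte_days",
            # Total gigabytes times one day equals gigabyte days for the snapshot.
            "metadata": {"gigabyte_days": total / GB},
        }
        for customer_id, total in sorted(totals.items())
    ]


listing = [
    ("acme/a.bin", 1_500_000_000),
    ("acme/b.bin", 500_000_000),
    ("globex/c.bin", 250_000_000),
]
events = snapshot_events(listing, "2026-05-18")
```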

A separate job aggregates CDN logs hourly. It groups egress bytes by customer, parsing the customer identifier from the request path or query, and emits aggregated events to the events ingestion API.

The billing platform aggregates the three event streams into three meters. The customer's subscription has quotas for each meter. Overage applies independently per meter. Invoices show the three line items separately.

A daily reconciliation job compares the events ingested for the day against S3 metrics, CDN log totals, and the platform meter totals. Deltas above a threshold trigger an alert to the on-call engineer.

The customer sees a usage dashboard in your product showing uploaded gigabytes this cycle, stored gigabytes today, and downloaded gigabytes this cycle. The numbers are read from the platform API and match what they will be invoiced for.

## Operational hygiene

A few habits keep object storage metering honest over the long run.

Audit storage provider bills against your meter totals at least monthly. Object storage cost is non trivial and silent margin compression is common. The audit is the only reliable way to catch it.

Tag every object with the customer identifier in metadata as well as in the key. The redundancy makes attribution more robust to key structure changes.

Run a periodic sweep for orphaned objects. Incomplete multipart uploads, abandoned temporary files, and similar artefacts accumulate over time. A weekly sweep keeps storage costs proportional to billable usage.

Document the policy on versioning, replicas, and tiering. The relationship between physical bytes at the storage layer and billable gigabytes at the customer layer needs to be explicit. Without documentation, the policy drifts.

Build the customer dashboard from the same data that drives invoices. If the customer sees one number in your product and a different number on their bill, you will spend a lot of support time explaining the gap.

## Closing thought

Object storage metering looks like a simple problem until you actually have to ship a billing pipeline for a product that lets customers upload arbitrary files. The number of dimensions, the asynchronous nature of storage events, the cost variance across regions, and the gap between physical bytes and billable bytes all combine into a real engineering exercise.

The architecture described here is not unique to object storage, but each layer has storage specific considerations that catch out teams who built billing pipelines for simpler products. Build the reconciliation layer early. Define the policy on incomplete uploads, versioning, and replicas before customers ever see the billing. Tag every object with customer identifiers redundantly. Audit storage costs against meter totals every month.

Get these right and an object storage backed product can scale to large volumes without billing becoming a chronic source of pain. Get them wrong and you end up with monthly fire drills that pull engineering attention away from the product.

## FAQ

### Should I bill on uploaded bytes or stored bytes?

Depends on the customer relationship. Backup and archival products usually bill on stored bytes because the value is durability over time. File transfer and short term hosting products often bill on uploaded bytes because the value is the upload itself. Many products bill on both, with included quotas in the subscription and overage for each meter independently.

### How do I track stored bytes accurately?

A daily snapshot from the storage provider is the most reliable method. Maintaining a running balance in your application is cheaper, but it drifts over time as application bugs and storage provider lifecycle rules pull the two totals apart. The snapshot pattern is worth the daily query cost for the accuracy.

### What about delete events?

If you bill on stored gigabyte days, deletes naturally reduce the stored total at the next daily snapshot. If you bill on uploaded bytes, deletes do not affect the meter because the upload already happened. Either way, deletes are not usually a separate billable event unless your product specifically charges for delete operations.

### Can I use a single meter for all storage activity?

Possible but rarely the right call. Different activity types have different cost bases and customers usually want to see them separately. Three or four meters running in parallel is the cleaner pattern, even if the per meter setup is slightly more work.

### How do I handle customers in different regions?

The pricing presented to customers can be uniform even when underlying storage cost varies by region. The metering captures region per event so you can analyse cost. The pricing layer translates the metered usage into the customer's price regardless of which region the bytes lived in. Track the underlying margin per region so you can adjust pricing if regional variance becomes problematic.