# Claude Token Cost Explained: How Anthropic Charges for Sonnet, Opus, and Haiku

> Claude token cost explained for AI founders. How Anthropic charges per model, what makes tokens expensive, and how to estimate real per-request cost.
- **Author**: Ayush Agarwal
- **Published**: 2026-06-12
- **Category**: AI, Pricing, Developer Tools
- **URL**: https://dodopayments.com/blogs/claude-token-cost-explained-2026

---

Claude token cost is the per-token price Anthropic charges for sending text into and getting text out of their Claude model family. It is the dominant variable cost for any AI product built on Claude, and understanding how it scales is essential before you ship customer pricing.

This guide explains how Anthropic charges, what makes tokens expensive in real-world workloads, and how to estimate per-request cost so your pricing math holds up at scale. For a deeper look at margin design (how to price customers on top of these costs), see the [Anthropic Claude API pricing margin guide](https://dodopayments.com/blogs/anthropic-claude-api-pricing-margin).

## Token basics

A token is roughly a chunk of text. For English, the rough conversion is:

- 1 token is about 4 characters
- 1 token is about 0.75 words
- 1,000 tokens is about 750 words
- A page of dense English text is about 500 tokens

Different languages tokenize differently. Code, structured data, and non-English languages typically use more tokens per equivalent content. Always estimate token counts on your actual content, not on rough word-count heuristics.

## How Anthropic charges

Anthropic charges separately for **input tokens** (what you send to the model, including system prompt, user messages, and conversation history) and **output tokens** (what the model generates back). Output tokens are typically much more expensive than input tokens (often 3x to 5x the input rate).

Prices are quoted per million tokens (MTok). A model priced at $3 per million input tokens means $3 per 1,000,000 input tokens, or $0.000003 per input token.

For current rates, always check Anthropic's official pricing page directly. Rates change as Anthropic releases new models and as competitive dynamics shift.

## The Claude model tiers

Anthropic offers Claude in multiple performance and price tiers. The general pattern (subject to change as new models release):

| Tier | Use Case | Relative Cost |
|---|---|---|
| Haiku (smallest, fastest) | Lightweight tasks, high volume | Lowest |
| Sonnet (mid-tier) | General-purpose, most production AI products | Mid |
| Opus (largest, most capable) | Complex reasoning, agentic workflows | Highest |

The exact model names and price points shift over time as Anthropic releases new versions. The tiering principle stays consistent: smaller and faster is cheaper, larger and more capable is more expensive.

Sonnet is the most common production choice for AI SaaS because it balances cost and capability. Opus is reserved for tasks where the extra reasoning is genuinely needed (complex agents, long-form analysis). Haiku is for high-volume, low-complexity tasks (classification, simple extraction).

## What makes tokens expensive in practice

The per-token rate is one variable. Three others usually dominate:

### 1. Context length

If your app sends a long system prompt and conversation history with every request, you pay for all of it on every call. A 10,000-token context window used on every request to Sonnet at typical rates costs roughly $0.03 per request just for the input. Aggressively trimming context (only sending relevant history) directly reduces cost.

### 2. Output length

Output tokens are 3x to 5x the input rate. A response that returns 2,000 tokens of generated text costs more than a 10,000-token input. Constraining the model to short outputs (via max_tokens, stop sequences, or prompt engineering) directly lowers cost.

### 3. Retry and re-prompting patterns

Any architecture that retries failed responses, re-prompts with different formulations, or uses multi-pass generation multiplies the per-task token cost. A "retry once if confidence is low" pattern doubles the cost on the retried subset.

### 4. Caching opportunities

Anthropic offers prompt caching that reduces the cost of repeated prefixes. If your system prompt is the same across thousands of requests, prompt caching can cut the input cost on those repeated portions substantially. Caching works best for stable, high-volume prefixes.

> The headline per-token rate is rarely what kills margin. It is the context bloat, the verbose outputs, and the retry loops that compound 5x over the simple-looking unit cost.
>
> \- Ayush Agarwal, Co-founder & CPTO at Dodo Payments

## Estimating per-request cost

A practical framework: for each request your app makes, estimate the tokens in each component and multiply by the rate.

### Example: a customer support assistant

Let's walk through an illustrative per-request cost using Sonnet-tier rates. (Substitute current published rates from anthropic.com for accuracy.)

- System prompt: 500 tokens (cached after first use)
- User message: 200 tokens
- Conversation history: 1,500 tokens
- **Total input: 2,200 tokens**
- Output: 400 tokens

At illustrative Sonnet-tier rates of $3/MTok input and $15/MTok output (these numbers are for arithmetic demonstration only and Anthropic's published rates have changed multiple times since 2024; always pull the current rate from anthropic.com before doing real margin math):

- Input cost: 2,200 tokens x $3 / 1,000,000 = $0.0066
- Output cost: 400 tokens x $15 / 1,000,000 = $0.006
- **Total: $0.0126 per request**

If your product makes 100 requests per active user per day, that is $1.26 per active user per day, or $37.80 per month. If you charge $20/month per user, the math does not work without aggressive optimization.

This is the calculation every AI SaaS founder has to do before setting customer pricing. Token math first, pricing second.

## Cost optimization tactics

### 1. Smaller models for routine tasks

If 80% of your requests are simple (classification, extraction, summarization), route them to Haiku and reserve Sonnet for the 20% that need it. This can cut total cost 3x to 5x with no quality loss on the simple cases.

### 2. Aggressive context trimming

Conversation history grows unbounded by default. Trim to the last N relevant turns, summarize older history, or use retrieval to pull only relevant context. A 50% reduction in average context length directly reduces input cost by 50%.

### 3. Output length constraints

Use max_tokens to cap output length. If your UI displays at most 200 tokens of response, do not let the model generate 1,000. Pair this with prompt-level instruction ("respond in one paragraph") for best results.

### 4. Prompt caching for stable prefixes

If your system prompt is identical across users, cache it. Anthropic's prompt caching can substantially reduce repeated input costs for stable prefixes.

### 5. Streaming for perceived latency, not cost

Streaming responses doesn't reduce cost (you pay for all generated tokens regardless), but it improves perceived latency. Pair it with shorter outputs for actual cost reduction.

### 6. Batching for non-real-time work

If a workload can tolerate batch processing (overnight summarization, bulk classification), Anthropic offers batch processing at a discount. For non-interactive workloads, this is a meaningful saving.

### 7. Model fallback patterns

For agentic workflows, use Sonnet for the first attempt and fall back to Opus only if Sonnet's confidence is low or the task is complex. Most requests get the cheaper model; only the hard ones pay the premium.

## Customer pricing patterns

How do you charge customers on top of Claude token costs?

### Flat subscription with usage caps

The simplest pattern: charge $X per month, limit users to N requests per month. Cap the cost exposure per user. Best for predictable workloads.

### Credits / usage-based

Sell credits that map to requests or tokens. Customer consumes as they use. Best for variable workloads where some users use 10x the average.

### Hybrid

Base subscription with credit bundle included, overage charged per unit. Most flexible. Common pattern in production AI SaaS. Requires solid metering infrastructure.

For SaaS implementing any of these patterns, Dodo Payments supports subscriptions, credits, [usage-based billing](https://docs.dodopayments.com/features/usage-based-billing/introduction), and hybrid models natively. See [Dodo Payments](https://dodopayments.com).

## Monitoring token costs in production

Three metrics every AI SaaS should track:

1. **Tokens per request (input and output, separately)**: rising is bad
2. **Requests per active user per day**: baseline for capacity planning
3. **Token cost per dollar of revenue**: the most important unit economic for AI products

If "token cost per dollar of revenue" creeps above 25% to 30%, your margin is eroding. Investigate why (context bloat, more verbose outputs, retry loops, customer behavior change).

## FAQ

### How much does the Claude API cost?

Anthropic charges per million tokens, separately for input and output. Rates vary by model tier: Haiku is the lowest, Sonnet is mid, Opus is the highest. Output tokens cost 3x to 5x the input rate. Always check current Anthropic pricing for exact rates.

### How many tokens are in a typical sentence?

A sentence of English text is roughly 15 to 30 tokens. A paragraph is 50 to 150 tokens. A page of dense text is around 500 tokens. Non-English text and code typically use more tokens per equivalent content.

### Are input or output tokens more expensive?

Output tokens. Anthropic charges roughly 3x to 5x more per output token than per input token across the model family. This means long generated responses cost much more than long input contexts.

### How can I reduce Claude API costs?

Use smaller models for routine tasks, trim conversation history, cap output length, cache stable system prompts, batch non-real-time work, and use model fallback patterns where only complex requests use the more expensive models.

### How should I price customers on top of Claude API costs?

Three common patterns: flat subscription with usage caps, credit-based usage billing, or hybrid (base subscription with included credits plus overage). For variable workloads, usage-based billing aligns customer cost with delivery cost.

## Conclusion

Claude token cost is the dominant variable cost for AI SaaS. Understanding how it scales (context, output, retries) is essential before setting customer pricing. Smart routing, context management, and prompt caching can cut cost by 3x to 5x with no quality loss.

For SaaS implementing subscription, credit, or usage-based billing on top of AI workloads, [Dodo Payments](https://dodopayments.com) supports all three models natively. See [pricing](https://dodopayments.com/pricing).
---
- [More AI articles](https://dodopayments.com/blogs/category/ai)
- [All articles](https://dodopayments.com/blogs)