# Introducing Dualmark: The AEO Infrastructure Your Marketing Site Is Missing

> Your blog ranks #1 on Google. ChatGPT cites your competitor. We built Dualmark to fix that - an open-source AEO toolkit with a public spec, six packages, a verifier CLI, and adapters for Astro, Next.js, and Cloudflare Workers.
- **Author**: Ayush Agarwal
- **Published**: 2026-05-12
- **Category**: Open Source
- **URL**: https://dodopayments.com/blogs/dualmark-aeo-infrastructure-open-source

---

Last quarter we ran a small experiment. We typed "merchant of record for SaaS" into ChatGPT, Claude, and Perplexity. Then "best Stripe alternative for global subscriptions." Then "how does Dodo Payments handle tax compliance."

In about a third of the answers, our competitors were cited and we were not. In another third, we were cited but with a wrong fact - "Dodo Payments doesn't support subscriptions" was a recurring favorite, even though we'd shipped subscription billing more than a year earlier.

The blog posts that should have answered those questions were already there. They ranked on Google. Our team had spent months writing them. The problem wasn't the content. The problem was that AI search engines couldn't read it.

That's the gap we've been quietly filling on our own marketing site for the past year. Today we're open-sourcing the whole stack as **Dualmark** - a public AEO specification plus a working reference implementation. Apache 2.0. Six packages on npm. A verifier CLI. Adapters for Astro, Next.js, and Cloudflare Workers. The repository lives at [github.com/dodopayments/dualmark](https://github.com/dodopayments/dualmark) and the docs are at [dualmark.dev](https://dualmark.dev).

This post is the story of why we built it, what we got wrong along the way, and how it ended up as a spec instead of just a library.

## The Problem AI Search Engines Have With Marketing Sites

If you've shipped a marketing site in the last five years, it probably looks like this: a static or hybrid framework (Next.js, Astro, Remix), some React islands for interactivity, a CDN in front, an analytics tag or three, and a cookie banner. From a human's perspective, it loads fast and looks polished.

From an AI crawler's perspective, the same page is a mess.

```
What a Marketing Page Looks Like to Different Clients
═══════════════════════════════════════════════════════════════════

Human Browser                          AI Crawler
─────────────                          ──────────
Load HTML                              Load HTML
Execute JavaScript                     No JS execution
Hydrate React components               See empty <div> placeholders
Render CSS                             Ignore CSS
Dismiss cookie banner                  See cookie banner as content
Read the article                       Read nav + footer + JSON-LD scraps
                                       + maybe some article text
                                       + cookie banner copy
                                       + analytics noise

Effective signal-to-noise:
  Human:        ~95% useful content
  AI Crawler:   ~15% useful content (and dropping)
```

We saw this in our own access logs. GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot, and a dozen others were hitting the site daily. They were getting back HTML built for browsers. They were extracting whatever they could and discarding the rest. The "whatever they could" was usually a few headings, fragments of the body text, and a healthy dose of navigation chrome.

This isn't a bug in any single crawler. It's the natural consequence of asking language models to read HTML. HTML is a presentation format. It was designed for browsers, not for LLMs. Every AI vendor has to write their own extraction logic, and they all get it subtly wrong in different ways - which is why the same page produces different facts in ChatGPT versus Claude versus Perplexity.

We tried the obvious fixes first. More server-side rendering. Less JavaScript on content pages. Better JSON-LD. Cleaner HTML structure. Each one helped a little. None of them fixed the underlying problem: **we were optimizing the wrong format.**

## The Mental Shift

The breakthrough was embarrassingly obvious in retrospect.

We stopped asking "how do we make our HTML easier for AI to parse?" and started asking "what do AI systems actually want to read?"

The answer is markdown. Always markdown. Everything that's won adoption with LLM teams - the llms.txt convention, Anthropic's docs, Cloudflare's docs, Mintlify - converged on the same idea. Language models were trained on text. They want text. Clean, structured, well-anchored text with headings and links and no surrounding furniture.

So we built a parallel representation. For every HTML page on our marketing site, we built a markdown twin. Same content. Same URL, with a `.md` suffix. Served at the edge, picked automatically based on who's asking.

```
Old Mental Model: One Page, One Format
═══════════════════════════════════════════════════════════════════

  /pricing
     │
     ▼
  ┌──────────────────────────────────┐
  │ HTML (humans)                    │
  │ HTML (bots, fingers crossed)     │
  │ HTML (everyone, somehow)         │
  └──────────────────────────────────┘

New Mental Model: One URL, Two Representations
═══════════════════════════════════════════════════════════════════

  /pricing                          /pricing.md
     │                                  │
     ▼                                  ▼
  ┌──────────────────┐              ┌──────────────────┐
  │ HTML for humans  │ ◄── Link ──► │ Markdown for AI  │
  │ Full layout      │ rel=alternate│ Clean text       │
  │ Hydrated React   │              │ Structured       │
  │ Cookie banner    │              │ Nav-free         │
  └──────────────────┘              └──────────────────┘
         ▲                                  ▲
         │                                  │
   Accept: text/html               Accept: text/markdown
                              or  User-Agent: GPTBot
```

This isn't a new idea in HTTP. Content negotiation has been in the protocol since the 1990s - the `Accept` header dates back to HTTP/1.0 in 1996, and RFC 7231 formalized the modern rules in 2014. We were just using it for something nobody had bothered to formalize yet: serving the right format to the right client when the client happens to be a language model.
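Here's what that negotiation step looks like in practice - a minimal, dependency-free sketch (not the `@dualmark/core` implementation): parse the `Accept` header's quality values and prefer markdown only when the client actually rates it higher.

```typescript
// Minimal Accept-header negotiation sketch - illustrative, not @dualmark/core.
// Parses media ranges with q-values (RFC 7231 §5.3.2) and picks a representation.
type Representation = "text/html" | "text/markdown";

function preferredRepresentation(accept: string | null): Representation {
  if (!accept) return "text/html"; // no preference: default to HTML
  const ranges = accept.split(",").map((part) => {
    const segments = part.trim().split(";");
    const type = (segments[0] ?? "").trim().toLowerCase();
    const qParam = segments.slice(1).map((p) => p.trim()).find((p) => p.startsWith("q="));
    const q = qParam ? Number.parseFloat(qParam.slice(2)) : 1;
    return { type, q: Number.isNaN(q) ? 0 : q };
  });
  // Effective quality for a concrete type, honoring text/* and */* wildcards.
  const qFor = (target: string): number =>
    ranges
      .filter((r) => r.type === target || r.type === "text/*" || r.type === "*/*")
      .reduce((best, r) => Math.max(best, r.q), 0);
  // Serve markdown only when the client rates it strictly higher than HTML.
  return qFor("text/markdown") > qFor("text/html") ? "text/markdown" : "text/html";
}
```

A browser `Accept` like `text/html,*/*;q=0.8` resolves to HTML; an agent sending `Accept: text/markdown` gets the twin.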

We called this pattern "dual-marking" - every page exists twice, picked once. The naming was deliberate. Internally it had been "dual-format serving" and "the markdown twin pattern" and "that thing the Worker does for bots." None of those were going to fit on a homepage.

## What We Built First (At Dodo Payments)

The first version of this was a one-off implementation in our own website repo. A Cloudflare Worker, some Astro page endpoints that emitted markdown, and a hand-maintained list of bot User-Agents. We wrote about that in detail in [How We Serve Markdown to AI Agents at the Edge](https://dodopayments.com/engineering/serving-markdown-ai-agents-edge).

That post got a lot more attention than we expected. Within a few weeks we had developers from other companies asking us:

- "Can you share the bot detection list?"
- "How do you handle redirects for bots?"
- "What headers should I be setting?"
- "Does this work on Next.js?"
- "Is there a way to verify if my site is doing this right?"

Each question was a different sliver of the same problem. There was no library to point at. There was no spec to link to. There were a few blog posts (including ours), a few internal implementations at other companies, and a lot of "we built our own thing and we think it's right."

That's the moment we decided this needed to be infrastructure, not a one-off.

## What's In The Box

Dualmark is six npm packages, one CLI, one public spec, and a handful of working examples. Here's the layout:

```
dualmark/ (Apache 2.0 - github.com/dodopayments/dualmark)
═══════════════════════════════════════════════════════════════════

  spec/                          <- Public AEO Specification v1.0
    ├── README.md                  RFC 2119-compliant, framework-agnostic
    ├── content-negotiation.md     Accept-header parsing rules
    ├── ai-bot-detection.md        Registry of 19 known bots
    ├── headers.md                 Required + recommended response headers
    ├── discovery.md               Link header, llms.txt, sitemap rules
    ├── conformance.md             Basic / Standard / Advanced levels
    └── llms-txt-extensions.md     Extensions on top of llmstxt.org

  packages/
    ├── @dualmark/core             14 KB, zero runtime deps
    │                              Content negotiation, bot detection,
    │                              markdown response builder, token
    │                              estimation, composition helpers
    ├── @dualmark/converters       16 KB - 12 production page-type
    │                              converters (blog, glossary, compare,
    │                              pricing, changelog, pseo, ...)
    ├── @dualmark/astro            22 KB - Astro 5 integration
    │                              Auto-generates .md endpoints + llms.txt
    ├── @dualmark/nextjs           15 KB - Next.js App Router adapter
    │                              Middleware + route handlers + static gen
    ├── @dualmark/cloudflare       9 KB  - Workers edge adapter
    │                              Wraps any upstream Worker
    └── @dualmark/cli              16 KB - dualmark verify <url>
                                   0-125 conformance score

  examples/
    ├── astro-blog                 80/80 under astro dev
    ├── astro-cloudflare-full      125/125 under wrangler dev
    └── nextjs-app-router          120/125 under next dev

  apps/
    └── docs                       dualmark.dev (Fumadocs)
                                   Includes /play - interactive
                                   Accept-header + UA tester
```

The split between "spec" and "packages" is deliberate. The packages are the reference implementation. The spec is the contract. Anyone can implement Dualmark in any language, any framework, any runtime, and a `dualmark verify` run will tell them whether they got it right.

## The Spec, And Why We Wrote One

The temptation when shipping a library is to just ship the library. We thought about that. We rejected it for one specific reason.

Every team we'd talked to had built a slightly different version of the same thing. Different bot lists. Different headers. Different ways of advertising the markdown twin. Different URL conventions (some used `/page.md`, some used `/md/page`). Different negotiation mechanisms (some varied on `User-Agent`, some on `Accept`). None of them were wrong. None of them were compatible.

If we shipped a library, we'd just be adding a seventh dialect.

So we wrote the spec first. **AEO Specification v1.0**, RFC 2119 keywords, structured as seven markdown documents under `spec/`. It defines:

- How servers MUST parse `Accept` headers (referencing RFC 7231)
- The required response header set (`Content-Type`, `Vary: Accept`, `X-Markdown-Tokens`, `X-Robots-Tag: noindex`)
- The canonical `.md` twin URL convention
- A normative AI bot registry with 19 entries (GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot, Google-Extended, and the rest)
- Discovery via `Link: <url.md>; rel="alternate"; type="text/markdown"`
- Three conformance levels (Basic 60%, Standard 80%, Advanced 95%)
- A weighted check catalogue with exact pass criteria
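To make the required header set concrete, here's a sketch of a conforming response pair. Illustrative only: the chars/4 token estimate below is a placeholder, not the spec's actual counting rule.

```typescript
// Sketch: the required headers on a markdown twin response, plus the Link
// header the HTML side uses to advertise the twin for discovery.
function markdownTwinResponse(markdown: string): Response {
  return new Response(markdown, {
    status: 200,
    headers: {
      "Content-Type": "text/markdown; charset=utf-8",
      "Vary": "Accept",                                  // representation depends on Accept
      "X-Markdown-Tokens": String(Math.ceil(markdown.length / 4)), // placeholder estimate
      "X-Robots-Tag": "noindex",                         // keep the twin out of web indexes
    },
  });
}

// On the HTML response, advertise the twin so crawlers can find it.
function twinLinkHeader(mdUrl: string): string {
  return `<${mdUrl}>; rel="alternate"; type="text/markdown"`;
}
```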

We were deliberately careful about what the spec is and isn't. It's not an IETF Internet Standard. There is no working group. We say this on the first page, in bold, with footnotes. What it is: a coherent set of conventions built on pieces that are already standardized - HTTP content negotiation, the `text/markdown` media type, the `noindex` directive - so that if and when an official standard emerges, the migration path is short.

The conformance levels matter. They give teams a target instead of an absolute. A small site can ship Basic conformance in an afternoon and still be measurably better off. A large site can chase Advanced over a quarter and have a number to show for it.

## The Verifier

The thing that ties the spec and the packages together is the CLI. `bunx @dualmark/cli verify <url>` fetches a page, runs every check in the conformance catalogue, and prints a 0-125 score.

```
$ bunx @dualmark/cli verify https://yourcompany.com/pricing

Dualmark Conformance Report
═══════════════════════════════════════════════════════════════════
URL:         https://yourcompany.com/pricing
Markdown:    https://yourcompany.com/pricing.md
Score:       125/125
Duration:    107ms

Passed:
  [+20] md.fetch          Markdown twin URL is reachable
  [+10] md.contentType    Content-Type is text/markdown; charset=utf-8
  [+10] md.tokensHeader   X-Markdown-Tokens header is present
  [+10] md.noindex        X-Robots-Tag includes noindex
  [+10] md.vary           Vary header includes Accept
  [+10] md.body           Body is non-empty markdown
  [+10] html.linkAlternate HTML response advertises markdown twin
  [+10] negotiation.botUa  GPTBot UA receives text/markdown
  [+10] negotiation.acceptHeader Accept: text/markdown receives markdown
  ...
```

The verifier doesn't care which library you used. It doesn't care which framework you're on. It cares about what your server actually does over HTTP. You can drop it into CI:

```yaml
- run: bunx @dualmark/cli verify https://staging.yourcompany.com/pricing
  # exits non-zero if any required check fails
```

This is the part we wish had existed when we started. We spent weeks debugging "is the bot getting the right thing?" with `curl -H "User-Agent: GPTBot" ...` and eyeballing response headers. Now there's a single command that produces a structured report.

## The Adapters

The reference implementation is split across three frameworks because those are the three we'd shipped this on internally or for friends:

**Astro.** Drop the integration into `astro.config.mjs`, declare which collections should have markdown twins, get auto-generated `.md` endpoints and an `llms.txt` for free.

```ts
// astro.config.mjs
import { defineConfig } from "astro/config";
import dualmark from "@dualmark/astro";

export default defineConfig({
  site: "https://yourcompany.com",
  integrations: [
    dualmark({
      siteUrl: "https://yourcompany.com",
      collections: {
        blog: { converter: "blog" },
        glossary: { converter: "glossary" },
      },
      llmsTxt: { enabled: true },
    }),
  ],
});
```

**Next.js App Router.** A middleware that handles content negotiation plus a catch-all route handler that emits markdown from your data layer. Works with `next dev`, works with `output: "static"`, works behind a CDN.

**Cloudflare Workers.** Wraps an existing Worker. AI bots get markdown from the edge in single-digit milliseconds. Hooks for analytics so you can tell which bot visited which page.

The split between core and adapters is intentional. `@dualmark/core` has zero runtime dependencies. It exposes primitives - `parseAcceptHeader`, `detectBot`, `buildMarkdownResponse`, `renderLlmsTxt` - that any framework can wire up. If you're on SvelteKit or Remix or Hono and you want to ship Dualmark today, you can build a 50-line adapter against the primitives without waiting for us.
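Here's roughly what such an adapter looks like. The primitives below are simplified inline stand-ins - in a real adapter you'd import them from `@dualmark/core`, and the exact signatures there may differ.

```typescript
// Sketch of a thin framework adapter. detectBot and buildMarkdownResponse are
// simplified stand-ins for the @dualmark/core primitives, not the real exports.
const AI_BOT_SUBSTRINGS = ["GPTBot", "ClaudeBot", "PerplexityBot", "OAI-SearchBot"];

function detectBot(userAgent: string): boolean {
  return AI_BOT_SUBSTRINGS.some((s) => userAgent.includes(s));
}

function buildMarkdownResponse(md: string): Response {
  return new Response(md, {
    headers: { "Content-Type": "text/markdown; charset=utf-8", "Vary": "Accept" },
  });
}

// The adapter itself: decide, load, delegate. Roughly this shape on any runtime.
async function handle(
  req: Request,
  loadMarkdown: (path: string) => Promise<string | null>, // your data layer
  renderHtml: (req: Request) => Promise<Response>,        // your framework
): Promise<Response> {
  const ua = req.headers.get("user-agent") ?? "";
  const accept = req.headers.get("accept") ?? "";
  if (detectBot(ua) || accept.includes("text/markdown")) {
    const md = await loadMarkdown(new URL(req.url).pathname);
    if (md !== null) return buildMarkdownResponse(md); // fall through if no twin
  }
  return renderHtml(req);
}
```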

## The Converters

The thing we underestimated when we open-sourced this was the converter problem.

It's one thing to say "serve markdown for every page." It's another to actually emit good markdown for a pricing page versus a glossary entry versus a competitor comparison versus a programmatic SEO landing page. Each of those has a different structure that AI agents need to read differently.

So we extracted the converter factories we'd been using internally and shipped them as `@dualmark/converters`. Twelve of them, covering the page types every marketing site has:

```
@dualmark/converters - 12 production converter factories
═══════════════════════════════════════════════════════════════════

  blog          Long-form posts          Engineering blog, customer stories
  case-study    Customer wins            Logos with stats and pull-quote
  changelog     Release notes            "What's new in v1.4"
  compare       Us vs. competitor        "Stripe alternative" pages
  docs          Documentation            Getting started, API guides
  feature       Product/feature pages    "Webhooks", "SSO"
  glossary      Term definitions         "What is a payment gateway?"
  legal         Policy pages             Terms, Privacy, DPA
  pricing       Pricing tables           Tier comparison with CTAs
  pseo          Programmatic SEO         "SEO services in San Francisco"
  tool          Standalone calculators   "Currency converter"
  video         Video landing pages      Webinar replays
```

Each converter is a factory. Give it a `siteUrl` and a `basePath`, get a function that takes one of your data records and returns clean markdown with the right structure for AI consumption - title, description, breadcrumbs, FAQ extraction, related links at the bottom.

```ts
import { compareConverter } from "@dualmark/converters";

const convert = compareConverter({
  siteUrl: "https://yourcompany.com",
  basePath: "/compare",
});

const md = convert(yourComparePage);
```

If your data shape almost fits one of the twelve, you wrap the bundled converter and override the parts you care about. If it doesn't fit at all, you write your own - the core package gives you the primitives.
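For the "almost fits" case, the wrap is usually a few lines. This sketch uses a hypothetical stand-in for a bundled factory (the real ones come from `@dualmark/converters`, as shown above); the page shape and footer line are invented for illustration.

```typescript
// Stand-in for a bundled converter; in practice you'd wrap the function that
// e.g. compareConverter({ siteUrl, basePath }) returns. All names hypothetical.
type ComparePage = { title: string; summary: string; body: string };

const base = (page: ComparePage): string =>
  `# ${page.title}\n\n${page.summary}\n\n${page.body}\n`;

// The wrap: keep the bundled structure, append the one thing you need.
const convert = (page: ComparePage): string =>
  base(page) + "\n> Comparison last reviewed by our team.\n";
```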

The converters are the part where Dualmark gets opinionated. The spec is neutral about markdown structure. The converters bake in a year of "what do AI systems actually quote well?" - lead with the title and a one-sentence description, breadcrumb context near the top, FAQ as a flat `Q: / A:` block, related-content links as a list at the end. None of that is novel. It's just what worked when we read back the citations.

## The Implementation Journey

We didn't ship Dualmark in one shot. The extraction-to-OSS path had three phases, and each one taught us something we didn't expect.

### Phase 1: Extract The Core (Three Weeks)

The first job was getting `@dualmark/core` out of our website repo without dragging the rest of the website with it. The Worker code had grown organically. It imported from our content collections, our markdown utilities, our analytics helpers, our redirect map.

We did this the boring way: copy the relevant files into a fresh monorepo, delete everything that didn't compile, fix the imports, write tests for every primitive in isolation. The first PR had 167 unit tests (using vitest plus fast-check for property tests) and zero runtime dependencies. That zero-dep constraint was hard to hold, but it's what makes the package safe to drop into anything.

The unexpected discovery: most of the code we'd written was framework-specific glue. The actual reusable primitives - parse Accept header, match bot UA, build markdown response, estimate tokens - were a few hundred lines. Once we untangled them from Astro and Cloudflare specifics, the framework adapters became thin.

### Phase 2: Write The Spec (Two Weeks)

Writing the spec was slower than writing the code, because writing a spec forces you to admit every assumption.

We discovered our internal implementation was slightly non-conformant with the spec we were writing. We weren't setting `Vary: Accept` on HTML responses (we were only setting it on markdown responses). We weren't returning `406 Not Acceptable` when a client explicitly asked for an unsupported format - we were falling back to HTML. We were using a Vary on User-Agent without documenting when.

Each of those was a bug we'd been carrying for months and never noticed because no client was strict about it. Writing the spec made us write a stricter implementation, which made the verifier catch our own regressions.
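Both fixes are small once the spec names them. Here's a sketch of the corrected behavior - presence checks stand in for the real q-value ranking a conforming implementation would do.

```typescript
// Corrected negotiation sketch covering the two bugs described above:
// 1) Vary: Accept belongs on BOTH representations, since both depend on Accept.
// 2) A client that explicitly rules out every format we serve gets 406.
function negotiate(accept: string | null): Response {
  const a = (accept ?? "*/*").toLowerCase();
  const headers = { "Vary": "Accept" }; // both representations vary on Accept
  if (a.includes("text/markdown")) {
    return new Response("# page\n", {
      headers: { ...headers, "Content-Type": "text/markdown; charset=utf-8" },
    });
  }
  if (a.includes("text/html") || a.includes("text/*") || a.includes("*/*")) {
    return new Response("<!doctype html><p>page</p>", {
      headers: { ...headers, "Content-Type": "text/html; charset=utf-8" }, // bug 1 fix
    });
  }
  // Bug 2 fix: don't silently fall back to HTML.
  return new Response("Not Acceptable", { status: 406, headers });
}
```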

### Phase 3: Adapters and Examples (Four Weeks)

The adapters were the longest phase because they have to work in everyone else's project, not ours. Astro was easy because we'd already shipped it for ourselves. Next.js was hard - the middleware-versus-proxy distinction in Next 15 changed mid-development and we had to support both. Cloudflare was medium - wrapping an existing Worker is conceptually simple but the surface area for "what shape can the upstream Worker be?" is large.

The examples were the part that surprised us. We assumed they were a finishing touch. They turned out to be the most valuable artifact in the repo. The full Astro + Cloudflare example is `125/125` under `wrangler dev`, end-to-end, with real content. When someone asks "how do I do X?", we point at the example instead of writing prose.

## The Results, So Far

Dualmark has been running on `dodopayments.com` for the better part of a year - first as an internal implementation, now via `@dualmark/cloudflare` against the same packages we shipped publicly. Some observations:

- **AI citations improved.** ChatGPT, Claude, and Perplexity now correctly describe subscription billing, credit-based billing, and merchant-of-record features. We spot-check this weekly with a list of representative queries. The "Dodo Payments doesn't support subscriptions" hallucination hasn't reappeared in six months.
- **Conformance score: 125/125.** We run `dualmark verify` in CI against staging. Regressions get caught before production.
- **Bot traffic is visible.** Via the `onAIRequest` hook plus Cloudflare Analytics Engine, we can see which bot visits which page, from which country, with what token consumption. This data is what makes it possible to know where coverage is good and where it's missing.
- **Performance impact is essentially zero for humans.** The edge Worker adds under 1ms of latency for non-bot requests. Bot detection is a substring check, not a regex.
- **Across the monorepo: 313 tests, 6 packages, 26 prerendered doc routes.** All green on `bun run build && bun run test && bun run typecheck`.

The number we can't put on this yet is "did it move the needle on AI-driven traffic." AI search engines don't expose impression data the way Google Search Console does. We have indirect signals (citations spot-checks, referral traffic from `chat.openai.com` and similar, qualitative reports from users who say "ChatGPT pointed me here") but no clean metric. We expect that to change as the AI search ecosystem matures.

## Should You Use Dualmark?

**Makes sense if:**

- You have a marketing site or content site with more than a handful of pages
- You can already see AI crawlers in your access logs (`GPTBot`, `ClaudeBot`, `PerplexityBot`, etc.)
- You care about how AI search engines represent your product
- You're on Astro, Next.js, or Cloudflare Workers - or willing to write a thin adapter against `@dualmark/core`

**Stick with what you have if:**

- You have fewer than 20 pages - the operational overhead isn't worth it yet
- You have no AI crawler traffic and no expectation of any (rare in 2026, but possible)
- Your content changes hourly and you can't bear the rebuild cost of static markdown
- You already have a working internal implementation and you'd rather not migrate (in which case: at least run `dualmark verify` against your site and let us know what's missing)

The honest version: this is the kind of infrastructure that's invisible when it works. You won't see your CSAT go up. You'll see fewer Slack messages from your marketing team that say "ChatGPT said something wrong about us again." That's the bar.

## Where It Goes From Here

We're treating v1.0 as the starting point, not the finish line. The roadmap, in roughly priority order:

- **More framework adapters.** SvelteKit, Remix / React Router, Nuxt. The core primitives are framework-free, so these are bounded-effort.
- **More edge adapters.** Vercel, Netlify, Fastly Compute, Deno Deploy. Wrap-the-upstream pattern should generalize.
- **More converters.** The current twelve cover the common marketing page types. We expect to add at least integrations, status pages, and API reference based on what we're seeing in our own pipelines.
- **AEO Analytics.** A hosted dashboard on top of the `onAIRequest` hook so marketing teams can see which bot reads which page, when, without setting up Cloudflare Analytics Engine themselves.
- **Spec evolution toward v1.1.** Structured data hints in markdown, per-section anchors for AI agents that want to quote a specific paragraph, a `sitemap.md` companion to `sitemap.xml`.
- **CMS integrations.** Sanity, Contentful, Builder.io plugins so non-engineers can author dual-marked content.

The spec is the contract. We expect implementations in other languages - Go, Rust, PHP, Ruby - to show up before we get to all of those. When they do, we'll link to them.

## Key Takeaways

1. **AI search engines aren't reading your HTML correctly, and there's no fix at the HTML layer.** Ship a markdown twin for every page. Pick which to serve via content negotiation.

2. **HTTP already has the primitives.** `Accept`, `Vary`, `Link rel="alternate"`, the `text/markdown` media type - none of this is new. Dualmark is a set of conventions on top of RFCs, not a new protocol.

3. **A spec is more useful than a library.** If you're building infrastructure that other teams will reimplement, write the contract first. The library becomes the reference implementation, not the only implementation.

4. **Conformance levels beat all-or-nothing.** Three tiers (Basic / Standard / Advanced) give teams a target they can actually hit in a sprint, plus a path to keep improving.

5. **A verifier is mandatory, not nice-to-have.** Without `dualmark verify`, the spec is theoretical and the packages are trust-me-bro. Build the testing tool early.

6. **The converters are where opinions live.** The spec is neutral about markdown structure. The converter library is where "what do AI systems actually quote well" gets baked in.

7. **Extract one thing at a time.** Core primitives first, then adapters, then converters, then docs, then examples. Trying to extract everything in parallel ends in a stuck branch.

Get started at [dualmark.dev](https://dualmark.dev). The repo is at [github.com/dodopayments/dualmark](https://github.com/dodopayments/dualmark). Issues, PRs, and "I tried it on $framework and it broke" reports are all welcome.

_We're building payment infrastructure at Dodo Payments. If open-source infrastructure, AEO, and the edge sound interesting, we're hiring._

---
- [More Open Source articles](https://dodopayments.com/blogs/category/open-source)
- [All articles](https://dodopayments.com/blogs)