
Dissecting Yomi: Building a 257K-Page Japanese Reference Site Within Cloudflare Workers Free Limits

🗓 2026-04-23 👤 Septian Ganendra S. K. (sepTN) 🏷 Architecture, Cloudflare Workers, SSR, Case Study

Yomi Cloudflare Workers SSR Case Study

Yomi currently serves 257,019 sitemap URLs, but the part I find more interesting is not the page count. It is the fact that the product runs as a serverless, databaseless SSR system shaped around Cloudflare Workers and Pages free-tier limits, with most of the heavy work pushed into the build pipeline instead of the request path.
Numbers in this post were checked on April 23, 2026. Cloudflare platform limits cited here come from the official Workers limits and Pages limits docs.

Why This Problem Was Interesting to Me

A Japanese reference site with hundreds of thousands of URLs usually sounds like a database problem. That is the default expectation. There is search, there are many detail pages, there are canonical routes, and there is enough generated structure that a relational database or a search backend feels inevitable.

Yomi made me question that assumption.

If the goal had only been to publish a large set of static pages, the easiest path would have been much simpler. I could have used a conventional static site generator and hosted the output on GitHub Pages. That part is not especially hard. The hard part is search once the site stops being a small blog and starts behaving like a reference product.

That search problem shows up clearly in the Hugo community too. In this Hugo search discussion, one of the examples is a roughly 60k-page Hugo site whose offline search index had already grown to around 850 MB. The main recommendation there was Pagefind, which makes sense for fully static sites, but the thread still illustrates the real pressure point: generating HTML is easy, while shipping usable search at scale is where the architecture becomes painful.

The simplest search answer would have been to let Google handle it after the site was crawled. That can work for many sites, and it would have been the lowest-effort way to avoid building a search system at all. But that was not the point of Yomi. I wanted to explore a different shape: a product where Cloudflare Workers could sit in front of prebuilt data, handle the route logic directly, and behave more like a reference application than a static archive with outsourced search.

What I actually needed was not user state, transactions, dashboards, or arbitrary query composition. I needed a public, read-heavy reference product that could answer a narrow set of lookup jobs quickly and consistently. Once I framed it that way, the architecture changed. Instead of asking how to power a large reference site with a live database, I started asking how much of the work could be done ahead of time so the runtime could stay thin.

That tension is what made the project interesting to me. Yomi looks heavier than it really is. The page count is large, but the runtime responsibilities are intentionally narrow.

What Yomi Actually Ships

Yomi is a Japanese-language reference site for English expression lookup and kanji, available at yomi.septn.com.

Today it ships with one shared shell and two live lanes:

  • Yomi as the shared shell and routing layer
  • eigo for Japanese-to-English expression lookup
  • kanji for kanji reference

That product shape matters because it keeps the scope honest. Yomi is not trying to be a full learning platform, a generic translator, or a database for every possible Japanese language task. The live experience is narrower than that. It is a focused reference surface with SSR routes, canonical detail pages, browse pages, and a section-aware handoff model.

At the time of writing, its sitemap footprint is 257,019 served URLs. The generated side is split into 54 sitemap shards, and the runtime adds 1 trust sitemap shard on top of that. Those URLs are not there to inflate the number. They exist because the product already has enough real browse and detail surfaces to justify a large route graph.

Why I Kept It Serverless and Databaseless

The decision to keep Yomi serverless and databaseless was less ideological than practical. The site is public, mostly anonymous, and overwhelmingly read-heavy. There is no account layer, no per-user state to persist, no editorial CMS inside the request path, and no need for request-time writes during normal usage.

That made build-time transformation a better fit than runtime assembly.

I could take source dictionaries and supporting datasets, shape them into the exact route and lookup structures the product needed, and then let the deployed app read prebuilt artifacts. That removed an entire class of runtime complexity. There was no database to provision, no migration layer to keep in sync, and no query planner sitting behind each page render.

It also changed the cost model of the application. The expensive work moved into build:data. The runtime became mostly responsible for routing, reading a small set of prepared assets, and rendering HTML. That let Yomi solve lookup inside its own route model instead of punting the hard part to an external search engine and hoping the right page had already been crawled.
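To make that split concrete, the overall shape is just a sequence of builders that run before deploy. The module names and builder functions below are hypothetical placeholders rather than Yomi's actual pipeline code:

// Hypothetical build:data entry point: each builder reads source data and
// writes prepared JSON artifacts under dist/ before the Worker is deployed.
import { buildExpressionLookup } from './builders/expressions'
import { buildKanjiLookup } from './builders/kanji'
import { buildBrowseIndexes } from './builders/browse'
import { buildSitemaps } from './builders/sitemaps'

async function buildData(): Promise<void> {
  // Order only matters where a later builder reads an earlier artifact.
  await buildExpressionLookup()
  await buildKanjiLookup()
  await buildBrowseIndexes()
  await buildSitemaps()
}

buildData().catch((error) => {
  console.error(error)
  process.exit(1)
})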

The Free-Tier Limits That Actually Mattered

Cloudflare's free tier only becomes a useful constraint when you translate it into design pressure. A list of numbers by itself does not explain much. What mattered for Yomi was how those numbers interacted.

On the Workers Free plan, the big constraints were:

  • 100,000 requests per day
  • 10 ms CPU time per request
  • 3 MB compressed Worker size
  • 50 subrequests per request
  • 20,000 static asset files per Worker version
  • 25 MiB maximum size for each static asset file

On the Pages Free plan, the operational limits that mattered were:

  • 500 builds per month
  • 20 minutes maximum build time
  • 20,000 files per site
  • 25 MiB maximum file size

Yomi's current deploy shape sits comfortably inside some of those ceilings and much closer to others. The generated output under dist currently contains 13,167 files, which is well below the 20,000 file cap but still high enough that asset layout cannot be sloppy. The runtime bundle is tiny by comparison: dist/_worker.js is 299,110 bytes raw and 71,851 bytes gzipped, which is nowhere near the 3 MB compressed Worker limit.

The more important constraint was not bundle size. It was keeping data assets deployable and keeping the runtime cheap enough that the Worker was not doing the heavy lifting.

name = "yomi"
compatibility_date = "2026-04-20"
compatibility_flags = ["nodejs_compat"]
pages_build_output_dir = "./dist"

[limits]
cpu_ms = 10

That cpu_ms = 10 line in wrangler.toml is small, but it says a lot about how little room there is for waste in the request path.

The Runtime Shape

Once I accepted that the runtime had to stay thin, the system shape became straightforward.

  1. Source Data (Input): dictionaries, snapshots, and supporting metadata. Yomi starts with source dictionaries and archived data snapshots. The product does not invent content at request time. It prepares reference data ahead of time.
  2. Build Pipeline (Transform): `build:data` does the expensive shaping. The data pipeline resolves aliases, generates lookup artifacts, builds browse structures, prepares site metadata, and writes the JSON layout the runtime expects. That is where the heavy parsing and shaping happens.
  3. Sharded Assets (Output): versioned JSON deploy artifacts. Instead of bundling reference data into the Worker, Yomi writes sharded JSON assets as deploy artifacts. The runtime reads prepared data files rather than shipping bulky reference content inside the Worker bundle itself.
  4. Thin SSR Worker (Runtime): route, fetch, parse, render. At request time, the Worker mostly routes the request, fetches the prepared asset it needs, parses the JSON, and renders the page. The runtime is intentionally closer to a lookup-and-render shell than a traditional application backend.

That separation matters because it lets the runtime keep a stable logical data layer while the deploy layout stays flexible. The Worker can fetch the prepared artifact it needs without dragging the full reference corpus into the runtime bundle.
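A minimal sketch of that request path, assuming a Pages-style ASSETS binding and using hypothetical matchRoute and renderPage stand-ins for the real route matcher and renderer, looks roughly like this:

// Thin SSR shape: route, fetch one prepared asset, parse, render.
// matchRoute and renderPage are hypothetical stand-ins, not Yomi's actual code.
interface Env {
  ASSETS: { fetch: (input: Request | string) => Promise<Response> }
}

type Route = { assetPath: string }

function matchRoute(pathname: string): Route | null {
  const match = pathname.match(/^\/eigo\/([^/]+)$/)
  return match ? { assetPath: `/data/eigo/${match[1]}.json` } : null
}

function renderPage(data: unknown): string {
  return `<!doctype html><html><body><pre>${JSON.stringify(data)}</pre></body></html>`
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const route = matchRoute(new URL(request.url).pathname)
    if (!route) {
      return new Response('Not found', { status: 404 })
    }

    // Read the prebuilt JSON shard for this route instead of querying a database.
    const asset = await env.ASSETS.fetch(new URL(route.assetPath, request.url).toString())
    if (!asset.ok) {
      return new Response('Not found', { status: 404 })
    }

    return new Response(renderPage(await asset.json()), {
      headers: { 'content-type': 'text/html; charset=utf-8' },
    })
  },
}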

How I Kept the Data Deployable

The most important build-time rule in Yomi is simple: if a generated file is too large for Cloudflare, the build should fail before deployment.

That rule became necessary because the reference data is large enough to drift into unsafe territory very easily. Earlier in development, a generated resolver artifact reached 27.7 MiB, which is above Cloudflare's 25 MiB static asset cap. That is the kind of problem that needs to be caught by the pipeline, not by a failed deploy after the fact.

The builders now guard against that directly:

import { mkdir, writeFile } from 'node:fs/promises'
import { dirname } from 'node:path'

// Cloudflare's per-file cap for static assets is 25 MiB.
const CLOUDFLARE_STATIC_ASSET_LIMIT_BYTES = 25 * 1024 * 1024

async function writeJson(pathname: string, value: unknown): Promise<void> {
  const serialized = `${JSON.stringify(value)}\n`
  assertCloudflareStaticAssetSize(pathname, serialized)
  await mkdir(dirname(pathname), { recursive: true })
  await writeFile(pathname, serialized, 'utf8')
}

function assertCloudflareStaticAssetSize(pathname: string, serialized: string): void {
  const size = Buffer.byteLength(serialized, 'utf8')
  if (size <= CLOUDFLARE_STATIC_ASSET_LIMIT_BYTES) {
    return
  }

  throw new Error(`${pathname} exceeds Cloudflare's 25 MiB static-file limit (${size} bytes)`)
}

That changed the system in two ways.

First, it forced me to shard large lookup structures early instead of waiting until the output became unmanageable. Second, it pushed me to think about file count and file size at the same time. Oversharding would create a different problem. With 13,167 files already in dist, there is enough room under Cloudflare's 20,000 file cap, but not enough room to treat every tiny variation as its own asset forever.
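A combined check over the build output keeps both numbers visible at once. The script below is illustrative rather than Yomi's actual tooling; it only assumes the dist layout and the caps already discussed:

// Walk dist/ and fail the build if either the file-count cap or the
// per-file 25 MiB cap would be violated at deploy time.
import { readdir, stat } from 'node:fs/promises'
import { join } from 'node:path'

const MAX_FILES = 20_000
const MAX_FILE_BYTES = 25 * 1024 * 1024

async function checkDist(dir: string): Promise<void> {
  const names = await readdir(dir, { recursive: true })
  let fileCount = 0

  for (const name of names) {
    const pathname = join(dir, name)
    const info = await stat(pathname)
    if (!info.isFile()) continue

    fileCount += 1
    if (info.size > MAX_FILE_BYTES) {
      throw new Error(`${pathname} is ${info.size} bytes, above the 25 MiB per-file cap`)
    }
  }

  if (fileCount > MAX_FILES) {
    throw new Error(`dist has ${fileCount} files, above the ${MAX_FILES} file cap`)
  }
}

checkDist('./dist').catch((error) => {
  console.error(error)
  process.exit(1)
})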

At the moment, the largest checked deploy asset is about 11.2 MB, which is comfortably below the 25 MiB cap. That is a healthy margin, but it only stays healthy if the build layout is disciplined.

Why Routing and Sitemaps Became Part of the Architecture

Once the data is generated and deployed as static assets, route design stops being a thin presentation concern. It becomes part of the system.

Yomi has to preserve canonical detail routes, keep legacy redirects from creating duplicate content, and distinguish between indexable browse pages and utility pages that should stay noindex. Search pages are useful to users, but they are not the pages I want search engines to treat as the canonical representation of the product.
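One way to keep that distinction explicit is a single helper that maps route kinds to robots directives. The route kinds here are hypothetical simplifications, not Yomi's real route model:

// Hypothetical route kinds: canonical detail and browse pages are indexable,
// while search and utility pages stay out of the index but remain crawlable.
type RouteKind = 'detail' | 'browse' | 'search' | 'utility'

function robotsDirective(kind: RouteKind): string {
  switch (kind) {
    case 'detail':
    case 'browse':
      return 'index, follow'
    case 'search':
    case 'utility':
      return 'noindex, follow'
  }
}

// Rendered into each page head as, for example:
// <meta name="robots" content="noindex, follow">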

The sitemap layer reflects that same discipline. Yomi does not emit one giant sitemap. It chunks the URL graph into bounded pieces, which makes the structure easier to validate and keeps the generated metadata aligned with the route model.

export const MAX_URLS_PER_SITEMAP = 5000

export interface SitemapChunk {
  id: string
  paths: string[]
}

export function createSitemapChunks(prefix: string, paths: string[]): SitemapChunk[] {
  // Deduplicate before chunking so repeated routes cannot inflate shard counts.
  const uniquePaths = Array.from(new Set(paths))
  const chunks: SitemapChunk[] = []
  for (let index = 0; index < uniquePaths.length; index += MAX_URLS_PER_SITEMAP) {
    chunks.push({
      id: `${prefix}-${String(index / MAX_URLS_PER_SITEMAP + 1).padStart(3, '0')}`,
      paths: uniquePaths.slice(index, index + MAX_URLS_PER_SITEMAP),
    })
  }
  return chunks
}

That is why the current sitemap output ends up as 54 generated shards before the runtime trust shard is added. Sitemap chunking is not just SEO housekeeping here. It is one of the ways the site stays operationally understandable as the route graph grows.
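The shards still need a single entry point, so a sitemap index can be rendered from the same chunk structure. This sketch assumes the SitemapChunk shape from the snippet above and an illustrative /sitemaps/<id>.xml URL layout rather than Yomi's actual paths:

// Sketch: render a sitemap index that points at every generated shard.
interface SitemapChunk {
  id: string
  paths: string[]
}

function renderSitemapIndex(baseUrl: string, chunks: SitemapChunk[]): string {
  const entries = chunks
    .map((chunk) => `  <sitemap><loc>${baseUrl}/sitemaps/${chunk.id}.xml</loc></sitemap>`)
    .join('\n')

  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    entries,
    '</sitemapindex>',
  ].join('\n')
}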

The Trade-offs I Accepted

This architecture is not free.

The biggest trade-off is runtime flexibility. A database-backed system makes it easier to add new query patterns, ad-hoc filters, or editorial surfaces without revisiting the artifact layout. Yomi gives up some of that flexibility in exchange for a thinner request path and a simpler deployment model.

The second trade-off is that more pressure moves into the build pipeline. If the data shape changes, the pipeline has to adapt. If the asset layout is inefficient, the cost shows up at deploy time or in request-time fan-out. The product avoids one class of operational complexity by accepting another.

The third trade-off is that platform limits become product limits. If a route needs too many asset reads, or a lookup path requires heavy parsing at request time, it is no longer just an implementation detail. It is a threat to the runtime budget. The current readJsonAsset() flow already reflects that pressure by keeping a small in-memory cache for a narrow set of production assets. That is useful, but it is also a reminder that request-time fan-out is still the pressure point to watch.
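In spirit, that flow is a guarded asset fetch with a small module-level cache in front of it. The sketch below reuses the assumed ASSETS binding from the runtime example and uses hypothetical asset paths in place of Yomi's real ones:

// Sketch of a narrow in-memory cache around prepared JSON assets.
// Module-level state survives across requests within a Worker isolate,
// so only a small allowlist of hot assets is kept between requests.
interface Env {
  ASSETS: { fetch: (input: Request | string) => Promise<Response> }
}

const CACHEABLE_PATHS = new Set(['/data/site-meta.json', '/data/browse-index.json'])
const assetCache = new Map<string, unknown>()

async function readJsonAsset(env: Env, origin: string, path: string): Promise<unknown> {
  if (assetCache.has(path)) {
    return assetCache.get(path)
  }

  const response = await env.ASSETS.fetch(new URL(path, origin).toString())
  if (!response.ok) {
    throw new Error(`Missing prepared asset: ${path}`)
  }

  const parsed: unknown = await response.json()
  if (CACHEABLE_PATHS.has(path)) {
    assetCache.set(path, parsed)
  }
  return parsed
}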

What I Would Revisit Next

If I keep pushing Yomi in this direction, the next improvements are not flashy ones.

I would start by slimming the heaviest indexes and being more deliberate about which assets deserve runtime caching. I would also revisit a few of the larger generated structures to see whether they can be split more intelligently without pushing file counts too high.

After that, I would spend more time profiling the request path itself. The Worker bundle is already small, so bundle size is not the interesting question anymore. The interesting question is where request-time asset fetches, JSON parsing, and route-specific fan-out still create unnecessary pressure inside the 10 ms CPU budget.

That is the part of Yomi I find most worth studying now. The project already proved that a 257K-page Japanese reference site can fit inside Cloudflare Workers and Pages free-tier constraints without a live database. The next step is making that shape even more efficient.
