May 11, 2026 · Wine World Map

Why 14,000 winery URLs aren't in the sitemap

The sitemap at wineworldmap.com/sitemap.xml lists every region, every district, every country, every grape variety, and every blog post. It does not list the 14,000-something individual winery pages, even though they exist, are server-rendered, and are fully indexable.

This was a deliberate choice and the comment in app/sitemap.ts explains why:

// Wineries are intentionally NOT listed here. Publishing all 14 k winery
// slugs caused crawlers to fetch every single page on first discovery
// (~150 KB each => >2 GB per pass per bot), spiking Vercel egress. Bots
// still reach individual /wineries/[id] pages by following internal
// links from region detail pages -- they just do it at a saner pace.

It's worth unpacking what actually happened.

The day the bandwidth bill almost happened

We launched winery pages on a Friday. The sitemap was updated to include them. Sunday morning the Vercel dashboard showed roughly 1.8 GB of egress for the previous day, against a normal baseline of 40-80 MB. Nothing else had changed.

The culprit was a perfectly well-behaved Googlebot crawl. When you publish a sitemap, you're telling search engines "here is every URL worth knowing about." Googlebot (and Bingbot, and DuckDuckBot, and the long tail of niche crawlers) reads that as a to-do list and works through it.

Each winery page is about 150 KB of HTML (server-rendered, including a mini-map, ratings, contact info, and the full description). 14k pages × 150 KB ≈ 2.1 GB per full crawl per bot. At least four bots crawl us regularly.
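The back-of-the-envelope arithmetic can be written out as a small sketch. The inputs are the figures quoted above (14,000 pages, about 150 KB each, four bots); the function name and the choice of decimal units (1 GB = 1,000,000 KB) are assumptions made for illustration.

```typescript
// Rough egress estimate for full crawls of the winery pages.
// Assumes decimal units (1 GB = 1,000,000 KB) and that each bot
// fetches every page exactly once per pass.
function crawlEgressGB(pageCount: number, pageSizeKB: number, botCount = 1): number {
  const KB_PER_GB = 1_000_000;
  return (pageCount * pageSizeKB * botCount) / KB_PER_GB;
}

const perBot = crawlEgressGB(14_000, 150);      // one bot, one full pass
const fourBots = crawlEgressGB(14_000, 150, 4); // four bots, one full pass each

console.log(`${perBot.toFixed(1)} GB per bot, ${fourBots.toFixed(1)} GB for four bots`);
```

With these inputs the estimate is about 2.1 GB per bot, and about 8.4 GB if all four regular bots each complete a full pass.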

The fix is one line of restraint

Just don't list wineries in sitemap(). They still exist: every region page links to its wineries, every district page links to its wineries, every country page lists them. Crawlers find them by following links, which is roughly how PageRank was supposed to work in the first place.

The difference between discoverable and announced is the difference between "the bot picks up new winery pages over the course of a week as it re-crawls region detail pages" and "the bot tries to fetch every winery page in the next two hours."

// What we removed
const wineryPages = wineryRows.map(w => ({
  url: `${base}/wineries/${w.id}`,
  // ...
}))
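For contrast, here is a minimal sketch of a sitemap builder that leaves winery entries out. It is not the real app/sitemap.ts: the `SitemapEntry` type, the `buildSitemap` function, the input shape, the example.com base, and the example paths are all hypothetical, chosen only to show the shape of the change.

```typescript
// Hypothetical types and names; a sketch, not the real app/sitemap.ts.
type SitemapEntry = { url: string };

// Sections that are announced to crawlers. "wineries" is deliberately absent.
const ANNOUNCED = ["countries", "regions", "districts", "grapes", "blog"] as const;
type Section = (typeof ANNOUNCED)[number] | "wineries";

function buildSitemap(base: string, pathsBySection: Record<Section, string[]>): SitemapEntry[] {
  const entries: SitemapEntry[] = [];
  for (const section of ANNOUNCED) {
    for (const path of pathsBySection[section]) {
      entries.push({ url: `${base}${path}` });
    }
  }
  return entries;
}

// Example input with made-up paths. The winery paths are supplied but never emitted.
const entries = buildSitemap("https://example.com", {
  countries: ["/countries/a"],
  regions: ["/regions/b"],
  districts: ["/districts/c"],
  grapes: ["/grapes/d"],
  blog: ["/blog/e"],
  wineries: ["/wineries/1", "/wineries/2"],
});

console.log(entries.map((e) => e.url));
```

The winery pages stay reachable because the region, district, and country pages link to them. The only change is that the sitemap no longer announces them.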

What changed in practice

| Metric | Before | After |
|---|---:|---:|
| Daily egress (rolling 7-day avg) | 1.8 GB | 65 MB |
| Winery pages indexed by Google (after 30 days) | 14,200 | 13,800 |
| First-discovery latency for a new winery | hours | days |

Indexed coverage barely moved. The crawler still finds the pages; it just finds them at the speed of internal-link discovery, which is plenty fast for a site where wineries aren't trending news.

When you'd want them in the sitemap

If wineries had time-sensitive content (a tasting event tonight, a limited-release wine), you'd want them in the sitemap so the crawl catches them quickly. Ours don't. The contact info changes once a quarter. The opening hours change twice a year. Crawl latency of "a few days" is fine.

The general rule: the sitemap is a budget, not a manifest. Spend it on pages where you actually need crawl freshness. For the long tail, let the internal link graph do its job.
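One way to make that rule concrete is to tag each group of pages with whether it needs fresh crawling and split on that flag. This is only a sketch: the `PageGroup` shape, the `splitByCrawlNeed` function, and the flag values for blog and regions are assumptions for illustration. Only the winery value reflects the decision described in this post.

```typescript
// Hypothetical description of a group of pages; field names are illustrative.
interface PageGroup {
  name: string;
  needsFreshCrawl: boolean; // would a discovery delay of a few days hurt?
}

// Split page groups into those announced in the sitemap and those left
// for crawlers to discover through internal links.
function splitByCrawlNeed(groups: PageGroup[]): { announced: string[]; linkOnly: string[] } {
  const announced: string[] = [];
  const linkOnly: string[] = [];
  for (const group of groups) {
    (group.needsFreshCrawl ? announced : linkOnly).push(group.name);
  }
  return { announced, linkOnly };
}

const plan = splitByCrawlNeed([
  { name: "blog", needsFreshCrawl: true },      // assumed value
  { name: "regions", needsFreshCrawl: true },   // assumed value
  { name: "wineries", needsFreshCrawl: false }, // matches the decision in this post
]);

console.log(plan);
```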

#seo #sitemap #egress