I was auditing the crawl budget on a content site I built with the Next.js App Router, and Google Search Console was throwing a strange signal: dozens of article, coin, and glossary URLs I had deleted long ago were still sitting in the index, and the crawler kept coming back to them. The pages clearly rendered a not-found UI. My instinct was a stale sitemap or a stray internal link. It was neither. The first thing I actually checked was the status code:
curl -sI https://old-site.com/article/an-article-i-deleted | head -n 1
What came back stopped me for a second:
HTTP/2 200
Two hundred. Not 404. My not-found page, with all its "page not found" copy, was being served to Google with an HTTP 200. That is the textbook definition of a soft-404: a page that visually says "empty" but protocol-wise says "everything is fine, please index me". Google trusts the status code over the text every time, so those dead URLs would never deindex, and my crawl budget was leaking into ghost pages.
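To make that definition concrete, here is a minimal sketch of a soft-404 check. It assumes you already have a status code and the page text in hand (for example from curl), and the phrase list and function name are hypothetical, chosen only for illustration.

```typescript
// Phrases that suggest a "not found" page. Hypothetical list; tune it to your own copy.
const NOT_FOUND_HINTS: readonly string[] = ['page not found', 'not found'];

type PageVerdict = 'ok' | 'soft-404' | 'hard-404-or-gone' | 'other';

// Classify a response from its status code and its visible text.
function classifyPage(statusCode: number, bodyText: string): PageVerdict {
  // A real 404 or 410 is what a removed page should return.
  if (statusCode === 404 || statusCode === 410) return 'hard-404-or-gone';
  // Redirects, server errors, and so on are out of scope for this check.
  if (statusCode < 200 || statusCode >= 300) return 'other';
  // A 2xx status with "not found" copy is the soft-404 case.
  const text = bodyText.toLowerCase();
  const looksMissing = NOT_FOUND_HINTS.some((hint) => text.includes(hint));
  return looksMissing ? 'soft-404' : 'ok';
}
```

Text matching like this is only a heuristic. The status line from curl -sI remains the authoritative signal.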
Why this happens
The offending routes were all force-dynamic — rendered through RSC streaming. That is exactly where the trap lives. The moment Next.js starts streaming a response, it commits an HTTP 200 to the headers, because the first bytes have already gone out to the client. HTTP headers cannot be changed once the body starts flowing. That is a protocol rule, not a Next.js policy.
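The ordering problem can be shown with a toy model. This is not Next.js internals, just a hypothetical class (ToyStreamingResponse is a name made up for this sketch) that mimics the one rule that matters: the status is locked in when the first body chunk is written.

```typescript
// Toy model of an HTTP response: the status is committed on the first body write.
class ToyStreamingResponse {
  private committedStatus: number | null = null;
  private pendingStatus = 200;
  readonly chunks: string[] = [];

  // Request a status; this only works while the headers are still unsent.
  setStatus(code: number): boolean {
    if (this.committedStatus !== null) return false; // too late, headers are on the wire
    this.pendingStatus = code;
    return true;
  }

  // Writing the first chunk flushes the headers and locks in the status.
  write(chunk: string): void {
    if (this.committedStatus === null) this.committedStatus = this.pendingStatus;
    this.chunks.push(chunk);
  }

  // The status the client actually received, or null if nothing was sent yet.
  get sentStatus(): number | null {
    return this.committedStatus;
  }
}

// Early decision: the status is set before any byte is written.
const earlyResponse = new ToyStreamingResponse();
earlyResponse.setStatus(404);
earlyResponse.write('<h1>Not found</h1>');

// Late decision: part of the body streams first, then the lookup comes back empty.
const lateResponse = new ToyStreamingResponse();
lateResponse.write('<html><body>');
const lateChangeAccepted = lateResponse.setStatus(404);
lateResponse.write('<h1>Not found</h1>');
```

In the late case the body still contains the not-found markup, but the status that went out is the default 200, which is the shape of the soft-404 described above.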
The problem was that notFound() in my dynamic route was called after an await for a data fetch:
import { notFound } from 'next/navigation'
export default async function ArticlePage({ params }) {
  // The decision below depends on this awaited data fetch
  const article = await getArticle(params.slug)
  if (!article) {
    notFound() // too late: 200 is already committed
  }
  return <Article data={article} />
}
By the time notFound() finally ran, the stream had already started, the 200 was already sent, and all Next.js could do was render the not-found UI inside a body whose status was already locked to 200. I briefly thought removing loading.tsx would help, since that is usually what kicks off streaming early. It did not. As long as the page itself still awaits data before deciding it is not-found, the stream goes out first.
So this was not a bug in my code, and it was not a stuck cache. It was an architectural consequence of streaming: the status has to ship before you know the answer is not-found.
The fix
The key insight: if I need a correct status code, I have to decide the status before streaming begins. The one place in the app that reliably runs before any rendering is middleware.ts. Middleware runs ahead of every page and route handler (on the Edge runtime by default), so it can commit a status once and be done. So for slugs I know are dead, I return HTTP 410 Gone straight from middleware:
// middleware.ts
import { NextResponse, type NextRequest } from 'next/server'
import { GONE_ARTICLE_SLUGS } from './lib/gone-article-slugs'
export function middleware(request: NextRequest) {
  // The last path segment is treated as the slug
  const slug = request.nextUrl.pathname.split('/').pop() ?? ''
  if (GONE_ARTICLE_SLUGS.has(slug)) {
    // Known-dead slug: answer before any rendering or streaming starts
    return new NextResponse('Gone', {
      status: 410,
      headers: { 'Cache-Control': 's-maxage=3600' },
    })
  }
  return NextResponse.next()
}
I deliberately use 410 Gone rather than 404, because 410 tells Google the page was intentionally and permanently removed. It is the more explicit of the two signals, and 410s tend to drop out of the index somewhat sooner than 404s.
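One caveat about the slug extraction in that middleware: split('/').pop() takes the last segment of any path, so a coin or glossary URL that happens to share a slug with a dead article would also get a 410, and a trailing slash yields an empty string. A stricter extraction might look like the sketch below. The function name is hypothetical, and the /article/ prefix is assumed from the URL in the curl example.

```typescript
// Return the slug only for paths shaped like /article/<slug>, with an optional trailing slash.
// Any other path, including nested ones such as /article/a/b, yields null.
function extractArticleSlug(pathname: string): string | null {
  const match = /^\/article\/([^/]+)\/?$/.exec(pathname);
  const rawSlug = match?.[1];
  if (rawSlug === undefined) return null;
  try {
    // Decode percent-escapes so the slug compares cleanly against the static list.
    return decodeURIComponent(rawSlug);
  } catch {
    // Malformed escape sequence: treat it as "not an article slug".
    return null;
  }
}
```

In the middleware, a null result would simply fall through to NextResponse.next(). Next.js also lets middleware.ts export a config object with a matcher, which can keep the middleware from running on unrelated paths in the first place.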
There is one design decision I hammered into myself here, and it matters: the dead-slug list is static, in lib/gone-article-slugs.ts, not the result of a CMS query. The temptation is huge to write "if sanityFetch returns null, send 410". Don't. Because sanityFetch returns null for two completely different conditions: the article genuinely does not exist, OR there was a transient failure (network, rate limit, a brief CMS outage). If I drove the 410 off null, one network blip could send a 410 for an article that is actually live, and Google would deindex real content. A static list has none of that risk.
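The difference between the two rules can be sketched as follows. The set contents, the LookupResult type, and both function names are hypothetical. The real lib/gone-article-slugs.ts is just assumed to export a hand-maintained set of strings.

```typescript
// Hypothetical contents of lib/gone-article-slugs.ts: a static, hand-maintained set.
const GONE_ARTICLE_SLUGS: ReadonlySet<string> = new Set<string>([
  'an-article-i-deleted', // slug from the curl example
]);

// What a CMS lookup can really mean. A bare null collapses the last two into one value.
type LookupResult = 'found' | 'missing' | 'fetch-failed';

// Risky rule: anything that is not "found" becomes 410, including transient failures.
function riskyStatus(result: LookupResult): 200 | 410 {
  return result === 'found' ? 200 : 410;
}

// Safer rule: only the static list can produce a 410; everything else is passed through.
function staticListStatus(slug: string): 410 | 'pass-through' {
  return GONE_ARTICLE_SLUGS.has(slug) ? 410 : 'pass-through';
}
```

With the risky rule, a single failed fetch for a live article produces a 410. With the static list, the CMS outcome never reaches the status decision at all.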
For the general not-found route that isn't a known dead slug, I leave it as a protective soft-404. Letting a random page return 200 is far safer than risking the deindexing of something valid because of overly aggressive logic.
The takeaway
In the world of RSC streaming, notFound() after an await cannot be counted on to produce a real 404, because the HTTP status is committed the instant the first byte flows. If you need a correct status code for SEO, decide it before streaming: return 410 (or 404) from middleware.ts for a static, known-dead list of slugs. Never drive that status off a CMS lookup, because an ambiguous null can deindex a live page. And do not trust that your not-found page is correct just because its text is correct; run curl -sI and read the first line. A UI that says "not found" on top of an HTTP 200 is not a 404. It is a false promise to the crawler.
