---
title: "Astro SEO Checklist 2026: 20 tactics ranked by impact"
publishDate: 2026-04-25T00:00:00.000Z
excerpt: "Astro SEO checklist for 2026: 20 tactics ranked from biggest to smallest impact, including canonical URLs, title tag rules, JSON-LD structured data, Person and BreadcrumbList schema, llms.txt, Pagefind search, and a Zod schema that caught 10 bugs in my podcast frontmatter."
category: "seo"
tags: ["seo", "astro", "schema-org", "pagefind", "performance"]
canonical: https://neciudan.dev/astro-seo-checklist-2026
---

SEO comes down to about 20 things, ranked roughly by how much each one moves the needle on a small blog. 

I had a couple of audit docs sitting in `docs/` from February 2025. I'd shipped the recommendations, ticked the boxes, and stopped thinking about it. Then someone left a comment on [my last post](/how-i-cut-250gb-of-bandwidth-from-my-website) asking how I handle SEO, and I figured I'd reread my own notes.

They were 14 months stale.

Half the AI crawlers in my `robots.txt` didn't exist when I wrote that file. PerplexityBot wasn't a thing yet. [`llms.txt`](https://llmstxt.org) wasn't a convention yet. Pagefind was at v0-something.

So I did the audit again, shipped 17 commits over a weekend, and ranked the whole list.

## What this article assumes

You know HTML, JavaScript, and a bit of Astro. You haven't done structured SEO before. By the end, you'll have rich results in Google, working social cards on LinkedIn and Slack, site search, and a JSON-LD setup that future-you doesn't have to revisit.

Three terms I'll use a lot, defined upfront so the rest reads cleanly:

- **Rich result**: anything Google shows in the search results that isn't just a blue link with a description. Star ratings, breadcrumbs, FAQ accordions, recipe cards. They get more clicks. You unlock them with structured data.
- **Featured snippet**: the answer box at the top of Google for question queries ("how do I add a sitemap to Astro?"). It quotes a paragraph or list directly from your page.
- **JSON-LD**: a small `<script type="application/ld+json">` block in your HTML head that describes the page to search engines in machine-readable form. Google reads it; humans don't see it. This is how you tell Google "this is an article, by this author, published on this date, about this topic."

Here's the order, by impact:

1. Canonical URLs on every page
2. Title tag rules (50 to 60 chars, keyword first)
3. Article (or BlogPosting) JSON-LD with Person, image, validation
4. Per-post Open Graph and Twitter cards
5. Unique meta description on every key page
6. One `<h1>` per page, logical heading order
7. Sitemap and robots.txt with the 2026 AI crawler list
8. Astro `<Image>` for everything in `src/assets/`, with proper `alt`
9. First paragraph as definition or outcome
10. Internal linking with descriptive anchor text
11. `BreadcrumbList` schema
12. URL structure and 301 redirects when slugs change
13. `llms.txt` plus a build-time `llms-full.txt`
14. Pagefind for site search
15. `HowTo` schema for tutorial-style posts
16. `Speakable` JSON-LD on Article schema
17. Content collection schemas (the one that found 10 bugs)
18. `dateModified`, shown to readers when distinct
19. `rel="prev"` / `rel="next"` and noindex on pagination
20. `noindex` the 404 page

## Is Astro good for SEO?

Yes, and it's one of the better options for content sites in 2026. Astro ships static HTML by default, has first-class image optimization, sitemap and RSS integrations, content collections with schema validation, and lets you inject any structured data you want via `<Fragment slot="head">` (a named slot that pushes markup into a parent layout's `<head>`, as long as the layout renders a matching `<slot name="head" />` there).

That's also the catch. Astro doesn't auto-generate canonical URLs, JSON-LD, or meta descriptions. You wire them up. Once.

The 20-item list below is what "wired up" looks like.

## 1. Canonical URLs on every page

This is the biggest win for the smallest amount of code.

If a page is reachable at more than one URL (with or without trailing slash, with query params, paginated, tagged, categorised), engines treat them as duplicates and split your **ranking signal** (the strength Google assigns your page based on links, content, freshness, and so on) across all of them. A canonical URL collapses them into a single URL. Google's [canonicalization docs](https://developers.google.com/search/docs/crawling-indexing/canonicalization) are worth a 5-minute read if you've never set this up before.

```astro
---
import { getCanonical } from '~/helpers/permalinks';
const canonical = getCanonical(Astro.url.pathname);
---
<link rel="canonical" href={canonical} />
```

The `getCanonical` helper is just a function that joins your site URL (from `astro.config.mjs`) with the current path and normalizes trailing slashes. If you don't have one, write a 5-line version.
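
If you're writing your own, here's a minimal sketch of that helper. It assumes you want no trailing slash (except on the root) and no query string or hash in the canonical; `SITE` is a stand-in for the `site` value in your config, and my real helper may differ in the details.

```typescript
// Stand-in for the `site` value in astro.config.mjs (an assumption for this sketch).
const SITE = 'https://neciudan.dev';

// Joins the site origin with a pathname and strips one trailing slash,
// except for the root path, which stays "/".
function getCanonical(pathname: string, site: string = SITE): string {
  const url = new URL(pathname, site);
  // Drop query string and hash so tracking params never leak into the canonical.
  url.search = '';
  url.hash = '';
  if (url.pathname.length > 1 && url.pathname.endsWith('/')) {
    url.pathname = url.pathname.slice(0, -1);
  }
  return url.toString();
}
```

If your host serves pages with a trailing slash, flip the normalization so it adds the slash instead. The only hard rule is that every variant of a URL maps to the same output.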

Every blog post, takeaway page, podcast page, and category page on my site emits one. Pagination uses the paginated URL as the canonical URL, so page 2 doesn't compete with page 1.

Make sure your `<link rel="canonical">` and your `og:url` (the URL you ship in your Open Graph meta tag, used by social platforms) agree. Mismatched canonicals are a real footgun: Open Graph platforms cache one URL while engines index another, and your share counts split across both.

If you do nothing else from this list, do this.

## 2. Title tag rules

The single biggest CTR (click-through rate) lever you have.

The rules:

- 50 to 60 characters (Google truncates around 580px width in SERPs, not at a fixed character count, but characters are a decent proxy)
- Primary keyword as close to the front as you can stand
- Brand suffix optional and at the end ("Title | Brand")
- Don't repeat your H1 word-for-word: the title tag is for the search results page, the H1 is for the page itself, and you can usually make the title tag punchier than the H1
- Each page's title must be unique across the site

Bad title:

```
"Welcome to my blog."
```

Good title:

```
"Astro SEO Checklist 2026: 20 tactics ranked by impact"
```

The second one tells you what's in it, what year it covers, and how it's organized. The first one tells you nothing.

This is also the title of this article, by the way. I picked it on purpose, after writing this section.
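
The length and uniqueness rules are easy to check mechanically. Here's a rough sketch of a checker you could run over every title at build time or in a test; `checkTitles` and its 50/60 defaults are my own illustration, not something Astro ships.

```typescript
interface TitleIssue {
  title: string;
  problem: string;
}

// Flags titles outside the character range and titles that repeat
// (case-insensitive) earlier in the list.
function checkTitles(titles: string[], min = 50, max = 60): TitleIssue[] {
  const issues: TitleIssue[] = [];
  const seen = new Set<string>();
  for (const title of titles) {
    if (title.length < min) {
      issues.push({ title, problem: `too short (${title.length} chars)` });
    } else if (title.length > max) {
      issues.push({ title, problem: `too long (${title.length} chars)` });
    }
    const key = title.trim().toLowerCase();
    if (seen.has(key)) issues.push({ title, problem: 'duplicate' });
    seen.add(key);
  }
  return issues;
}
```

Character counts are only a proxy for pixel width, so treat a failure as a prompt to look at the title, not as a hard error.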

## 3. Article (or BlogPosting) JSON-LD with Person, image, and validation

[Schema.org](https://schema.org) is a vocabulary maintained by Google, Microsoft, Yahoo, and Yandex. It defines types like `Article`, `Person`, `Organization`, `Course`, and `PodcastSeries`. You write data using these types and embed it in your page as JSON-LD. Search engines read it and decide which rich results you're eligible for.

For a blog, the minimum is `Article` or `BlogPosting`. Google's [Article structured data guide](https://developers.google.com/search/docs/appearance/structured-data/article) lists the exact fields they reward.

Here's the helper I run on every post:

```typescript
export function getArticleSchema(post) {
  return {
    '@context': 'https://schema.org',
    '@type': 'Article',
    headline: post.title,
    description: post.excerpt,
    image: {
      '@type': 'ImageObject',
      url: post.image,
      width: 1200,
      height: 630,
    },
    datePublished: post.publishDate,
    dateModified: post.updateDate ?? post.publishDate,
    author: getPersonSchema(),
    publisher: getPublisherOrganization(),
    mainEntityOfPage: post.canonical,
    wordCount: post.wordCount,
  };
}
```

What that turns into when the page is built:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Astro SEO Checklist 2026: 20 tactics ranked by impact",
  "description": "Astro SEO checklist for 2026...",
  "image": {
    "@type": "ImageObject",
    "url": "https://neciudan.dev/images/articles/astro-seo.png",
    "width": 1200,
    "height": 630
  },
  "datePublished": "2026-04-25",
  "dateModified": "2026-04-25",
  "author": { "@type": "Person", "name": "Neciu Dan", "...": "..." },
  "publisher": { "@type": "Organization", "name": "Neciu Dan", "...": "..." },
  "mainEntityOfPage": "https://neciudan.dev/astro-seo-checklist-2026"
}
</script>
```

That `<script>` block goes inside your `<head>`. In Astro, the cleanest way is via `<Fragment slot="head">` from a page or component:

```astro
---
import Layout from '~/layouts/PageLayout.astro';
import { getArticleSchema } from '~/helpers/schema';

const articleSchema = getArticleSchema(post);
---
<Layout metadata={metadata}>
  <Fragment slot="head">
    <script
      type="application/ld+json"
      set:html={JSON.stringify(articleSchema)}
    />
  </Fragment>
  <!-- page content -->
</Layout>
```
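
One caveat with `set:html={JSON.stringify(articleSchema)}`: `JSON.stringify` doesn't escape `<`, so a headline that happens to contain `</script>` would close the tag early. A small wrapper avoids that. This is a sketch of a common hardening step, and `serializeJsonLd` is a name I made up for it.

```typescript
// Serializes a JSON-LD object for embedding inside a <script> tag.
// "<" is replaced with its \u003c escape, which JSON parsers read back as "<",
// so the data is unchanged but can no longer terminate the script element.
function serializeJsonLd(schema: Record<string, unknown>): string {
  return JSON.stringify(schema).replace(/</g, '\\u003c');
}
```

You'd then pass `serializeJsonLd(articleSchema)` to `set:html` instead of the raw `JSON.stringify` call.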

A few things that bite people:

A plain-string `image` used to be the norm, and Google's Article docs still accept a bare URL. Ship an `ImageObject` with explicit `width` and `height` anyway: it costs nothing, and anything that reads your JSON-LD gets the dimensions without fetching the file.

Same for `og:image` (use `og:image:width` and `og:image:height` siblings, more on those in section #4).

The `Person` schema for the author is where E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness, Google's framework for "is this author credible?") lives:

```typescript
export function getPersonSchema() {
  return {
    '@type': 'Person',
    name: 'Neciu Dan',
    url: 'https://neciudan.dev/about',
    image: 'https://neciudan.dev/images/dan-portrait.jpg',
    jobTitle: 'Staff Engineer',
    worksFor: { '@type': 'Organization', name: 'Rover' },
    description: 'Frontend engineer, podcast host, conference speaker.',
    sameAs: [
      'https://twitter.com/neciudan',
      'https://www.linkedin.com/in/neciudan',
      'https://github.com/neciudan',
      'https://www.youtube.com/@neciudan',
    ],
  };
}
```

The `image` here should be a real photo of the author, not a logo. Google's knowledge panel (the box that pops up on the right of search results when you search for a person or topic) can use this image when stitching together your byline.

The `sameAs` array is what links your authored articles to your real-world identity across platforms. Skip this, and your bylines float free of any actual person. 

If you publish on Mastodon or other IndieWeb-friendly platforms, you can also add a `<link rel="me" href="...">` to your About page for verified authorship.

You'll also want a schema for the `publisher` property, typed as an `Organization`:

```typescript
export function getPublisherOrganization() {
  return {
    '@type': 'Organization',
    name: 'Neciu Dan',
    url: 'https://neciudan.dev',
    logo: {
      '@type': 'ImageObject',
      url: 'https://neciudan.dev/images/logo.png',
      width: 600,
      height: 60,
    },
  };
}
```

For a personal site, the publisher is "you the brand" rather than "you the human."

Validate everything you emit. Two free tools:

- [Google Rich Results Test](https://search.google.com/test/rich-results): paste your live URL and it tells you exactly which rich results you're eligible for, plus any errors
- [Schema.org Validator](https://validator.schema.org/): generic schema.org compliance check, useful when Google's tool doesn't support a type

Run both before declaring victory. I have caught typos and missing required fields with each of them at different times.

I keep all the JSON-LD generators in `src/helpers/schema.ts` and inject them via `<Fragment slot="head">` in the layout.

## 4. Per-post Open Graph and Twitter cards

Open Graph is a meta-tag protocol invented by Facebook in 2010, now used by every major platform (LinkedIn, X, Slack, Discord, iMessage) to render link previews. Twitter cards are X's slightly extended variant.

This isn't strictly SEO. It's how your link looks when someone pastes it. But "looks good when shared" is a click-through multiplier, and click-through is a ranking signal.

Standard fields:

```typescript
const metadata = {
  title,
  description: excerpt,
  canonical: url,
  openGraph: {
    type: 'article',
    url,
    locale: 'en_US',
    images: imageUrl ? [{ url: imageUrl, width: 1200, height: 630 }] : undefined,
    siteName: 'Neciu Dan',
  },
  twitter: {
    cardType: imageUrl ? 'summary_large_image' : 'summary',
    site: '@neciudan',
    creator: '@neciudan',
  },
};
```

What that turns into in the HTML head:

```html
<meta property="og:title" content="..." />
<meta property="og:description" content="..." />
<meta property="og:url" content="..." />
<meta property="og:type" content="article" />
<meta property="og:locale" content="en_US" />
<meta property="og:image" content="..." />
<meta property="og:image:width" content="1200" />
<meta property="og:image:height" content="630" />
<meta property="og:site_name" content="Neciu Dan" />

<meta name="twitter:card" content="summary_large_image" />
<meta name="twitter:site" content="@neciudan" />
<meta name="twitter:creator" content="@neciudan" />
```

Default to the post hero image. If there isn't one, fall back to a site-default OG image (set once in your config). Don't ship a post that previews as a blank rectangle.
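
The fallback is a one-liner worth centralizing so no template forgets it. A sketch, assuming every hero image is exported at 1200×630; the default URL below is a placeholder, not a real file on my site.

```typescript
interface OgImage {
  url: string;
  width: number;
  height: number;
}

// Placeholder site-wide default; point this at your own image.
const DEFAULT_OG_IMAGE: OgImage = {
  url: 'https://example.com/images/default-og.png',
  width: 1200,
  height: 630,
};

// Returns the post's hero as an OG image, or the site default when the post has none.
// Assumes hero images are already sized 1200x630.
function resolveOgImage(heroUrl: string | undefined, fallback: OgImage = DEFAULT_OG_IMAGE): OgImage {
  return heroUrl ? { url: heroUrl, width: 1200, height: 630 } : fallback;
}
```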

While you're in here, emit the Open Graph article extensions on blog posts:

```html
<meta property="article:published_time" content="2026-04-25T00:00:00.000Z" />
<meta property="article:modified_time" content="2026-04-25T00:00:00.000Z" />
<meta property="article:author" content="https://neciudan.dev/about" />
<meta property="article:section" content="SEO" />
<meta property="article:tag" content="Astro" />
<meta property="article:tag" content="SEO" />
```

LinkedIn, Slack, and a few others read these. The cost of adding them once in your layout is zero.

## 5. Unique meta description on every key page

The biggest mistake I'd been making for a while: my homepage `meta description` was just falling through to the long site-wide description from `config.yaml`. Same for `/blog`. Same for `/about`.

A meta description is the snippet under your blue link in Google search results. Each of those pages should have a unique, factual sentence that mentions the topic and the source, usually 150 to 160 characters.

Bad:

```
"Personal website of a software engineer based in Barcelona who works on..."
```

Good:

```
"JavaScript and frontend articles by Neciu Dan: React, testing, security, career.
Practical insights for working developers."
```

One catch: Google rewrites about 70% of meta descriptions on the fly to better match the user's query. You're not writing the final SERP snippet; you're writing the default fallback. 

Still worth doing well, because that's what shows on social sites and in Bing.

## 6. One `<h1>` per page, logical heading order

Modern HTML5 actually allows multiple `<h1>` elements within sectioning elements, and Google has [explicitly said](https://www.youtube.com/watch?v=zyqJJXWk0gk) that either approach is fine. Pick whichever you like.

That said, I still ship one `<h1>` per page on this site. The outline reads cleaner, and it's harder to break by accident.

I had two `<h1>`s on my podcast hub for months without noticing. The second one was for "What is Señors @ Scale?" which used to be a section header. I added a `headingLevel="h1" | "h2"` prop to my `Headline` component, and now it's `h2`.

Sections are `h2`. Subsections are `h3`. 

The outline should read like a table of contents.
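
If you want a guard against the double-`<h1>` mistake, a check like this is enough. It's a sketch that takes heading levels in document order (however you extract them from the built HTML) and reports the two problems this section cares about; `checkHeadingOutline` is a hypothetical helper, not part of my site.

```typescript
// levels: heading levels in document order, e.g. [1, 2, 3, 2] for h1, h2, h3, h2.
function checkHeadingOutline(levels: number[]): string[] {
  const problems: string[] = [];
  const h1Count = levels.filter((level) => level === 1).length;
  if (h1Count !== 1) problems.push(`expected exactly one h1, found ${h1Count}`);

  levels.forEach((level, index) => {
    const previous = levels[index - 1];
    // Going deeper by more than one level (h2 straight to h4) breaks the outline.
    if (previous !== undefined && level > previous + 1) {
      problems.push(`h${previous} is followed by h${level} (skipped a level)`);
    }
  });
  return problems;
}
```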

## 7. Sitemap and robots.txt with the 2026 AI crawler list

A sitemap is an XML file at the root of your domain listing every URL you want indexed. `robots.txt` is a plain-text file that tells crawlers (Googlebot, Bingbot, GPTBot, etc.) which paths they can and can't visit.

The sitemap is a one-line setup in Astro using the [`@astrojs/sitemap`](https://docs.astro.build/en/guides/integrations-guide/sitemap/) integration; `robots.txt` is a static file you keep in `public/`:

```typescript
// astro.config.mjs
import { defineConfig } from 'astro/config';
import sitemap from '@astrojs/sitemap';

export default defineConfig({
  site: 'https://neciudan.dev',
  integrations: [sitemap()],
});
```

Then add `Sitemap: https://neciudan.dev/sitemap-index.xml` to `public/robots.txt`. Done.

While you're in `robots.txt`, allow the AI crawlers. The 2025 list (`GPTBot`, `ChatGPT-User`, `anthropic-ai`, `Claude-Web`, `Google-Extended`, `CCBot`, `Bard`, `AI2Bot`) is incomplete in 2026.

Add these:

```txt
User-agent: PerplexityBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: Applebot-Extended
Allow: /

User-agent: Amazonbot
Allow: /

User-agent: Bytespider
Allow: /

User-agent: cohere-ai
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Diffbot
Allow: /
```

Crawlers default to allowed unless you disallow them. Adding explicit `User-agent: PerplexityBot` plus `Allow: /` doesn't unlock anything that wasn't already allowed by `User-agent: *` plus `Allow: /`.

Some bots check for explicit allowlists before crawling, but most don't. Treat these blocks as documentation of intent rather than a magic switch.
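
If the list keeps growing, you can generate the file from an array instead of hand-editing it. A sketch under that assumption: `buildRobotsTxt` is a hypothetical helper, and you'd return its output from an endpoint or write it to `public/robots.txt` in a build script.

```typescript
// Builds a robots.txt that allows every listed crawler (plus the * wildcard)
// and points at the sitemap.
function buildRobotsTxt(bots: string[], sitemapUrl: string): string {
  const blocks = ['*', ...bots].map((agent) => `User-agent: ${agent}\nAllow: /`);
  return [...blocks, `Sitemap: ${sitemapUrl}`].join('\n\n') + '\n';
}
```

Keeping the bot names in one array also makes the next audit a one-line diff.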

One thing to note: `robots.txt` `Disallow` and `<meta name="robots" content="noindex">` are different mechanisms. `Disallow` blocks crawling entirely (the crawler never visits the URL, so it never sees a `noindex` either). 

If you actually want a page out of the index, use `noindex` and let the crawler in. If you `Disallow` a page that's already indexed, it stays indexed, and you've just blinded yourself.

## 8. Astro `<Image>` for everything in `src/assets/`, with proper `alt`

[Core Web Vitals](https://web.dev/articles/vitals) are Google's three page-experience metrics: **LCP** (Largest Contentful Paint, how long until the biggest visible element renders), **CLS** (Cumulative Layout Shift, how much things jump around as the page loads), and **INP** (Interaction to Next Paint, how quickly the page responds to taps and clicks). INP replaced FID (First Input Delay) as a Core Web Vital in March 2024.

LCP is usually an image. CLS is usually an image without `width` or `height` attributes. Image-heavy pages with layout thrash hurt INP too.

[Astro's `<Image>` component](https://docs.astro.build/en/guides/images/) from `astro:assets` handles the image side: WebP conversion at build time, srcset generation (multiple resolutions for different screen sizes), sizes, lazy loading, async decoding, content-hashed filenames safe for `immutable` caching. I [wrote about this in detail](/how-i-cut-250gb-of-bandwidth-from-my-website) when I was bleeding bandwidth.

The short version: anything you import from `src/assets/` goes through the pipeline. Anything in `public/` ships byte-for-byte. 

Hero images and post images should live in `src/assets/`. Always.

The `alt` attribute matters as much as the optimization:

- Describe what's in the image factually.
- Don't start with "image of" or "picture of." Screen readers announce "image" already.
- Include the relevant keyword if it fits naturally. Don't stuff it.
- Empty `alt=""` is correct for purely decorative images. Missing `alt` is not.
- Keep it under ~125 characters.

```astro
<Image
  src={hero}
  alt="Network tab showing 6.3MB hero video downloaded on every visit."
  width={1200}
  height={630}
/>
```
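
The alt rules above are mechanical enough to lint. Here's a sketch of a checker that covers the missing, decorative, prefix, and length cases; `checkAlt` is my own illustration, and the 125-character limit is the same rule of thumb as in the list.

```typescript
// Returns a list of problems for one image's alt text.
// undefined means the attribute is missing; "" means a decorative image, which is fine.
function checkAlt(alt: string | undefined): string[] {
  if (alt === undefined) return ['missing alt attribute'];
  if (alt === '') return [];

  const problems: string[] = [];
  if (/^(image|picture) of\b/i.test(alt.trim())) {
    problems.push('starts with "image of" or "picture of"');
  }
  if (alt.length > 125) problems.push(`too long (${alt.length} chars)`);
  return problems;
}
```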

## 9. First paragraph as a definition or outcome

For posts that answer a specific question ("what is X", "how to Y"), put a one or two-sentence definition or outcome in the first paragraph. 

Featured snippets and AI Overviews (the ChatGPT-style answer block Google sometimes shows above the regular results) lift the first paragraph as the answer.

Bad first paragraph:

> "Hey everyone, I've been thinking about dynamic programming lately, and I wanted to share some thoughts."

Good first paragraph:

> "Dynamic programming is a method for solving problems by breaking them down into smaller subproblems and storing the solutions so they can be reused instead of recomputed."

The second one is a quotable answer.

Since 2020, Google's [passage indexing](https://blog.google/products/search/search-language-understanding-bert/) (the practice of ranking individual paragraphs of long articles for specific queries, instead of just ranking the page as a whole) means any paragraph in a long article can become a featured snippet, not just the first one. 

So the first-paragraph rule is a strong default, not the only spot where featured-snippet eligibility lives. Section intros, FAQ answers, and bulleted definitions all qualify.

## 10. Internal linking with descriptive anchor text

Anchor text is the visible text inside an `<a>` tag. Engines learn what a page is about partly from the anchor text other pages use to link to it. If every link to my React post says "click here," Google has nothing to work with. 

If half the links say "10 React patterns" and the other half say "common React mistakes," that's a signal.

Two practical things:

- Posts in the same series link to each other inline, not just from a "related posts" footer.
- The link text says what the linked thing is, not "click here" or "in this article."

A few extras for the file:

- Vary your anchor text per target page. Engines distrust exact-match repetition (it looks like manipulation).
- Don't be afraid to link out to authoritative sources (Wikipedia, .edu, .gov, framework docs). Outbound links to authority signal a well-researched piece. Old SEO advice said to hoard them; that's dead.
- The category and tag system is a topical cluster play. Topical clusters are groups of related content under a shared category that engines treat as evidence of topic authority. "SEO," "Astro," and "Performance" are useful categories. "Software Development" is too generic to cluster anything.

## 11. `BreadcrumbList` schema

[`BreadcrumbList`](https://schema.org/BreadcrumbList) is one of the higher-impact rich results. Google replaces the URL in the SERP with the breadcrumb hierarchy ("Home › Blog › SEO › Astro SEO Checklist"), which is way more clickable.

```typescript
{
  '@context': 'https://schema.org',
  '@type': 'BreadcrumbList',
  itemListElement: [
    {
      '@type': 'ListItem',
      position: 1,
      name: 'Home',
      item: 'https://neciudan.dev/',
    },
    {
      '@type': 'ListItem',
      position: 2,
      name: 'Blog',
      item: 'https://neciudan.dev/blog',
    },
    {
      '@type': 'ListItem',
      position: 3,
      name: 'Astro SEO Checklist 2026',
    },
  ],
}
```

The last item omits `item` (it's the current page). Pair the schema with a visible breadcrumb component at the top of each post so users see what the SERP shows.
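
Hand-writing that object per page gets old quickly. A sketch of a generator that produces the same shape from a list of crumbs; `getBreadcrumbSchema` and the `Crumb` type are names I'm introducing here, and a crumb without a `url` is treated as the current page.

```typescript
interface Crumb {
  name: string;
  url?: string; // omit for the current page
}

function getBreadcrumbSchema(crumbs: Crumb[]) {
  return {
    '@context': 'https://schema.org',
    '@type': 'BreadcrumbList',
    itemListElement: crumbs.map((crumb, index) => ({
      '@type': 'ListItem',
      position: index + 1, // positions are 1-based
      name: crumb.name,
      // Only emit `item` when a URL is given, so the last crumb omits it.
      ...(crumb.url ? { item: crumb.url } : {}),
    })),
  };
}
```

Feeding the same `crumbs` array to the visible breadcrumb component keeps the markup and the schema from drifting apart.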

## 12. URL structure and 301 redirects when slugs change

Short URLs are easier to share and remember, and they don't get truncated in SERPs. Kebab-case, lowercase, no underscores, no `.html` extensions, no trailing dates unless they're meaningfully part of the topic.

`/astro-seo-checklist-2026` is good. `/2026/04/25/My_Astro_SEO_Playbook_Ranked.html` is bad.
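
If you derive slugs from titles, the rules above fit in a few lines. A sketch, assuming ASCII-only slugs are what you want; `slugify` here is a hypothetical helper, not the one my site uses.

```typescript
// Turns a title into a lowercase kebab-case slug.
function slugify(title: string): string {
  return title
    .normalize('NFKD') // split accented letters into base letter + combining mark
    .replace(/[\u0300-\u036f]/g, '') // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // any run of other characters becomes one hyphen
    .replace(/^-+|-+$/g, ''); // trim leading and trailing hyphens
}
```

Generate the slug once and store it in frontmatter. If you recompute it from the title on every build, a title edit silently changes the URL.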

When you rename a slug (which you will, eventually), set up a 301 redirect. 

A 301 is the HTTP status code for "permanently moved" and tells engines to update their index to the new URL, transferring the old URL's ranking equity to the new one. A 302 ("temporarily moved") tells engines NOT to update their index and is rarely what you mean.

On Netlify, drop this into `public/_redirects`:

```txt
/my-astro-seo-playbook-ranked /astro-seo-checklist-2026 301
```

Or use Astro's built-in [redirect config](https://docs.astro.build/en/reference/configuration-reference/#redirects):

```typescript
// astro.config.mjs
import { defineConfig } from 'astro/config';

export default defineConfig({
  redirects: {
    '/my-astro-seo-playbook-ranked': '/astro-seo-checklist-2026',
  },
});
```

Without the redirect, old backlinks 404, and you lose the equity.

## 13. `llms.txt` plus a build-time `llms-full.txt`

[`llms.txt`](https://llmstxt.org) is the new robots.txt for AI models. Static markdown at the root of your domain. Looks like this:

```markdown
# Neciu Dan

> Personal site of Neciu Dan, Staff Engineer, host of the Señors @ Scale
> podcast, ReactJS Barcelona organizer, international speaker.

## Blog
- [Blog index](https://neciudan.dev/blog): JavaScript, React, testing, security
- [RSS feed](https://neciudan.dev/rss.xml)

## Podcast
- [Podcast hub](https://neciudan.dev/senors-at-scale): Senior engineers on scale, performance, frontend
- [Episode takeaways](https://neciudan.dev/takeaways)
```

`llms-full.txt` is the same idea, but generated at build time, with all your blog content concatenated. That makes it cheap for an AI doing retrieval to pull everything in a single request.

Astro route at `src/pages/llms-full.txt.ts`:

```typescript
import { getCollection } from 'astro:content';
import { fetchPosts } from '~/helpers/blog';

export const GET = async () => {
  const posts = await fetchPosts();
  const podcasts = await getCollection('podcast');
  // ...build markdown sections from each
  return new Response(text, {
    headers: { 'Content-Type': 'text/plain; charset=utf-8' },
  });
};
```
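
The elided "build markdown sections" step can be a pure function, which keeps it testable outside Astro. This is a sketch, not my exact route: the `LlmsEntry` shape (title, absolute URL, markdown body) and the `---` separator are assumptions, and you'd map your own posts and episodes into it.

```typescript
interface LlmsEntry {
  title: string;
  url: string; // absolute URL of the page
  body: string; // raw markdown content
}

// Concatenates every entry into one markdown document, one section per entry.
function buildLlmsFull(siteName: string, entries: LlmsEntry[]): string {
  const sections = entries.map(
    (entry) => `## ${entry.title}\n\nSource: ${entry.url}\n\n${entry.body.trim()}`
  );
  return [`# ${siteName}`, ...sections].join('\n\n---\n\n') + '\n';
}
```

Inside the route, the returned string is what goes into `new Response(...)`.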

Adoption is still uneven. AI tools such as ChatGPT, Claude, and Perplexity are reported to look for these files, but whether they consistently fetch or honor them is uncertain.

## 14. Pagefind for site search

Not strictly SEO. But "user can find old content" is what makes a site sticky, and stickiness is a ranking signal.

I had no search for years. 30+ blog posts, 30+ podcast takeaways, and the only navigation was the homepage list and an archive page. 

(I built one this weekend, you can [try it now](/search).)

### Quickstart, in order

1. `npm install --save-dev pagefind`
2. Add `"postbuild": "pagefind --site dist"` to `package.json` scripts
3. Mark indexable content with `data-pagefind-body` (see below)
4. Create a `/search` page that loads the Pagefind UI (see below)
5. Run `npm run build`. Pagefind generates `dist/pagefind/` automatically

### Pagefind vs Algolia vs Lunr vs Fuse

The four main options for site search on a static Astro blog:

- **Algolia**: cloud-hosted, fast, costs money once you have real traffic, requires backend sync
- **Lunr.js**: build-time, ships the entire index up front, fine for fewer than ~100 docs
- **Fuse.js**: client-side fuzzy matching, similar tradeoff to Lunr
- **[Pagefind](https://pagefind.app)**: build-time, fragmented index loaded on demand, scales to thousands of pages without changing strategy

Pagefind crawls your build output (`dist/`) after Astro is done. It writes a fragmented index. The browser only loads index chunks for the words you actually search. Initial JS payload is around 40KB.

By default, Pagefind indexes everything within the `<body>` tag. That includes navigation, footer, and related posts.

Mark your actual content with `data-pagefind-body`:

```astro
<article data-pagefind-body>
  <meta data-pagefind-filter="type:blog" />
  {post.category && <meta data-pagefind-filter={`category:${post.category}`} />}
  {post.tags?.map((tag) => <meta data-pagefind-filter={`tag:${tag}`} />)}
  {/* post content */}
</article>
```

The `<meta>` tags are non-rendering. Pagefind reads the attributes during indexing.

The search page (`/search`) is a div and a script:

```astro
<link href="/pagefind/pagefind-ui.css" rel="stylesheet" />
<div id="search"></div>

<script is:inline>
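  import('/pagefind/pagefind-ui.js').then((mod) => {
    // The prebuilt bundle may register PagefindUI on window instead of exporting it
    const PagefindUI = mod.PagefindUI ?? window.PagefindUI;
    new PagefindUI({ element: '#search', showSubResults: true });
  });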
</script>
```

`is:inline` is the part that took me a minute to figure out. Without it, Astro's build (Vite) tries to resolve `/pagefind/pagefind-ui.js` at build time, fails (because Pagefind hasn't run yet), and crashes the build.

`is:inline` tells Astro to skip Vite processing on that script tag and emit it verbatim into the HTML. The dynamic `import()` runs in the browser, where the file does exist.

Pagefind only indexes pages that are compiled to static HTML in `dist/`. SSR-only routes (Astro hybrid mode, no `prerender = true`) get skipped. For a typical content site, this is fine.

First build after wiring it up indexed 56 pages and 7,192 words.

Bonus: now that you have search functionality, add a [`SearchAction`](https://schema.org/SearchAction) to your `WebSite` JSON-LD. Google retired its sitelinks search box (the small search input that used to appear under a site's main result) in late 2024, so this no longer unlocks a SERP feature there. It's still valid schema.org, and it tells any other consumer where your search lives:

```typescript
{
  '@context': 'https://schema.org',
  '@type': 'WebSite',
  url: 'https://neciudan.dev',
  potentialAction: {
    '@type': 'SearchAction',
    target: 'https://neciudan.dev/search?q={search_term_string}',
    'query-input': 'required name=search_term_string',
  },
}
```

You'll need to wire `?q=` query parsing in your search page for this to actually work end-to-end. Pagefind's UI doesn't read URL params by default.
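
The parsing half of that is small. A sketch, assuming the `q` parameter name from the `SearchAction` target above; `getInitialQuery` is a hypothetical helper.

```typescript
// Extracts a non-empty search term from the ?q= parameter of a full URL.
// Returns null when the parameter is missing or blank.
function getInitialQuery(href: string): string | null {
  const term = new URL(href).searchParams.get('q')?.trim();
  return term ? term : null;
}
```

In the browser you'd call it with `window.location.href` after creating the UI, and if it returns a term, hand it to the `triggerSearch` method that Pagefind's UI docs describe for starting a search programmatically. Keep a reference to the `PagefindUI` instance so you have something to call it on.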

## 15. `HowTo` schema for tutorial-style posts

If a post walks the reader through sequential steps ("how to add Pagefind to Astro," "how I cut bandwidth"), [`HowTo`](https://schema.org/HowTo) schema marks the structure for Google. 

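One caveat: Google retired its HowTo rich result in 2023, so this markup no longer earns step indicators or carousel placement in Google's SERPs. It's still valid schema.org, and it gives anything that parses your JSON-LD an explicit step structure to work with.

Keep it for posts with at least 3 real steps, and ideally 4 or more.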

```typescript
{
  '@context': 'https://schema.org',
  '@type': 'HowTo',
  name: 'How to add Pagefind site search to Astro',
  totalTime: 'PT1H',
  step: [
    {
      '@type': 'HowToStep',
      position: 1,
      name: 'Install Pagefind',
      text: 'Run npm install --save-dev pagefind in your Astro project root.',
    },
    {
      '@type': 'HowToStep',
      position: 2,
      name: 'Wire the postbuild script',
      text: 'Add "postbuild": "pagefind --site dist" to package.json scripts.',
    },
    {
      '@type': 'HowToStep',
      position: 3,
      name: 'Mark indexable content',
      text: 'Add data-pagefind-body to the outer article element on each post template.',
    },
    {
      '@type': 'HowToStep',
      position: 4,
      name: 'Build the search page',
      text: 'Create src/pages/search.astro that loads Pagefind UI from /pagefind/pagefind-ui.js.',
    },
  ],
}
```

A post can carry both `Article` and `HowTo` JSON-LD. Google reads both. My posts on bandwidth optimization, building pipelines, and this checklist itself are all candidates.

## 16. `Speakable` JSON-LD on Article schema

For voice answer engines (Google Assistant, Alexa). One property in your existing Article schema, see [`SpeakableSpecification`](https://schema.org/SpeakableSpecification):

```typescript
speakable: {
  '@type': 'SpeakableSpecification',
  cssSelector: ['h1', '[data-speakable]'],
}
```

The `[data-speakable]` selector is a forward-looking hook. If you write a TL;DR in a future post and mark it `data-speakable`, voice engines will read it as the summary. Until then, the `h1` covers the headline.

Engines that don't care about Speakable ignore the property. There's no downside.

## 17. Content collection schemas (the one that found 10 bugs)

This is the one that surprised me.

I have a podcast section on the site ([Señors @ Scale](/senors-at-scale), 30 episodes). Each episode is a markdown file in `src/content/podcast/`. Standard Astro [content collection](https://docs.astro.build/en/guides/content-collections/) (the framework's typed-frontmatter system, where you define a schema once and Astro validates every file against it). Except I'd never declared the collection in `config.ts`.

It worked anyway. `getCollection('podcast')` was permissive. No schema, no validation, just whatever happened to be in the frontmatter. For a year and a half, I'd been adding episodes by copying and pasting an existing one and editing the fields.

You can guess where this is going.

I wrote a Zod schema for the collection. (Zod is a TypeScript-first validation library; you describe the shape of an object, and it tells you whether real data matches.) 

Then, before declaring the collection, I wrote a Vitest test that ran the schema against every episode's frontmatter.

Prerequisites if you want to follow along:

```bash
npm install --save-dev vitest gray-matter zod
```

The test goes at `tests/podcast-collection.test.ts` and runs with `npx vitest run` or `npm test` if you've wired it up.

```typescript
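import { readdirSync, readFileSync } from 'node:fs';
import { join } from 'node:path';
import { describe, it, expect } from 'vitest';
import matter from 'gray-matter';
import { z } from 'zod';

const podcastSchema = z.object({
  title: z.string(),
  episodeNumber: z.number(),
  guest: z.string(),
  description: z.string(),
  spotifyUrl: z.string().url().optional(),
  youtubeUrl: z.string().url().optional(),
  appleUrl: z.string().url().optional(),
  // ...
});

// Every markdown file in the collection folder (adjust the extension if you use .mdx)
const episodeDir = 'src/content/podcast';
const episodeFiles = readdirSync(episodeDir)
  .filter((name) => name.endsWith('.md'))
  .map((name) => join(episodeDir, name));

describe('podcast frontmatter', () => {
  for (const file of episodeFiles) {
    it(`${file} matches the podcast schema`, () => {
      const { data } = matter(readFileSync(file, 'utf8'));
      const result = podcastSchema.safeParse(data);
      expect(result.success).toBe(true);
    });
  }
});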
```

`safeParse` returns `{ success: true, data }` on a match or `{ success: false, error }` on a mismatch. Unlike `parse`, it doesn't throw, which is what you want when iterating over many files.

I ran it. It failed on 10 episodes.

Three had URL fields wrapped in markdown link syntax: `spotifyUrl: "[https://...](https://...)"`. Notion auto-formatting that I'd pasted in months ago and never looked at again.

Seven had `appleUrl: ""`. I had a habit of leaving the field blank when an episode hadn't yet been published on Apple Podcasts, intending to fill it in later. 

I never filled it in. The Zod URL validator caught all seven instantly.

For 18 months, I'd been linking to broken Spotify URLs from my own episode pages.
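Both failure modes are mechanical to clean up. A minimal sketch, assuming the bad values look like the two patterns above; `normalizeUrlField` is a hypothetical helper, not something Zod or Astro ships:

```typescript
// Matches a value that is entirely one markdown link: [text](target)
const MARKDOWN_LINK = /^\[[^\]]*\]\(([^)\s]+)\)$/;

// Hypothetical helper: unwraps pasted markdown links into plain URLs
// and treats blank strings as "field not set".
function normalizeUrlField(value: unknown): unknown {
  if (typeof value !== 'string') return value;
  const trimmed = value.trim();
  if (trimmed === '') return undefined;
  const match = MARKDOWN_LINK.exec(trimmed);
  return match ? match[1] : trimmed;
}
```

Wrapping a field as `z.preprocess(normalizeUrlField, z.string().url().optional())` would apply this before validation. Fixing the source files is the cleaner option, since the frontmatter then says what it means.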

While you're declaring podcast schemas, also emit [`PodcastSeries`](https://schema.org/PodcastSeries) on the podcast hub and [`PodcastEpisode`](https://schema.org/PodcastEpisode) on each episode/takeaway page. Podcast schemas give search engines a machine-readable description of the show and each episode, instead of leaving them to infer it from the page.

Calling all of this an SEO tip is a stretch. But broken outbound links from your domain are something engines notice, and a content collection schema catches the malformed ones at build time. Worth doing.

## 18. `dateModified`, shown to readers when distinct

Freshness is a ranking signal. `dateModified` already lives in your JSON-LD if you wire `updateDate` into `getArticleSchema`.
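A sketch of that wiring, assuming `publishDate` and `updateDate` arrive as `Date` objects. `articleDates` is a hypothetical helper for illustration, not the actual `getArticleSchema`:

```typescript
// Hypothetical helper: builds the date fields for an Article JSON-LD object.
// Falls back to the publish date when the post has never been updated.
function articleDates(publishDate: Date, updateDate?: Date) {
  return {
    datePublished: publishDate.toISOString(),
    dateModified: (updateDate ?? publishDate).toISOString(),
  };
}

const dates = articleDates(
  new Date('2026-04-25T00:00:00.000Z'),
  new Date('2026-05-02T00:00:00.000Z'),
);
```

Spread the result into the Article object next to `headline` and `author`.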

Show it to readers visually too, so they trust the article isn't stale:

```astro
{post.updateDate && post.publishDate &&
  new Date(post.updateDate).toDateString() !== new Date(post.publishDate).toDateString() && (
  <p class="text-sm text-muted mt-1">
    Updated <time datetime={post.updateDate.toISOString()}>
      {post.updateDate.toLocaleDateString('en-US', {
        year: 'numeric', month: 'short', day: 'numeric'
      })}
    </time>
  </p>
)}
```

Only show it when the dates actually differ. Otherwise, it's noise.

## 19. `rel="prev"` / `rel="next"` and noindex on pagination

Astro's pagination provides `Astro.props.page.url.prev` and `page.url.next` for free.

```astro
<Layout metadata={metadata}>
  <Fragment slot="head">
    {prevUrl && <link rel="prev" href={prevUrl} />}
    {nextUrl && <link rel="next" href={nextUrl} />}
  </Fragment>
</Layout>
```

Google [said in 2019](https://developers.google.com/search/blog/2019/03/rel-next-prev-experiment) that it no longer uses `rel=prev/next` as an indexing signal. Other engines, Bing included, may still read it as a hint, and it costs you nothing, so I keep it.

For the noindex part, Google's current guidance is mixed. The old advice was "noindex page 2+ of pagination." The newer advice is "make page 2+ self-canonical and let them rank if they're useful." I noindex page 2+ on my blog because they're not useful as landing pages (readers want the post itself, not a list of headlines). Your call.
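Both decisions fit in one small function. A sketch, assuming the `page` prop that Astro's `paginate()` hands to the route; `paginationHead` and its return shape are hypothetical, and `site` stands in for your configured site URL:

```typescript
// The subset of Astro's paginated `page` prop this sketch relies on.
interface PageLike {
  currentPage: number;
  url: { prev?: string; next?: string };
}

// Hypothetical helper: absolute prev/next URLs plus a robots policy
// that keeps page 1 indexable and noindexes page 2 onward.
function paginationHead(page: PageLike, site: string) {
  const absolute = (path?: string) => (path ? new URL(path, site).href : undefined);
  return {
    prevUrl: absolute(page.url.prev),
    nextUrl: absolute(page.url.next),
    // follow: true is an assumption, so crawlers can still reach the posts.
    robots: { index: page.currentPage === 1, follow: true },
  };
}

const head = paginationHead(
  { currentPage: 2, url: { prev: '/blog', next: '/blog/3' } },
  'https://example.com',
);
```

If you prefer the self-canonical approach, drop the `robots` field and leave the rest as is.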

## 20. `noindex` the 404 page

One-line change. Otherwise, the occasional 404 sneaks into search results, which is a bad experience for everyone:

```astro
<Layout metadata={{
  title: 'Error 404',
  robots: { index: false, follow: false }
}}>
```

Watch for "soft 404s" too. A soft 404 is a page that returns HTTP 200 OK but looks empty or error-shaped to Google ("No results found," "Sorry, this content is unavailable"). Search Console flags these and removes them from the index. If you have empty-state pages, give them real content or return a real 404.
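On a server-rendered route, returning a real 404 for an empty state can be this small. A sketch only: `responseForResults` is a hypothetical helper, and it assumes a runtime with the standard `Response` class (Node 18+ or an SSR adapter). Prerendered pages can't change their status code, so there the fix is real content.

```typescript
// Hypothetical helper: picks the HTTP status from whether there is anything to show.
function responseForResults(results: unknown[], html: string): Response {
  // Empty state: send a real 404 so crawlers don't see an "empty" 200 page.
  const status = results.length === 0 ? 404 : 200;
  return new Response(html, {
    status,
    headers: { 'Content-Type': 'text/html; charset=utf-8' },
  });
}
```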

## What's next on my list

Next quarterly audit will pick up:

- Dynamic per-post OG image generation with `@vercel/og` or `satori`. Useful for posts that don't ship with a hero image. The libraries are already installed.
- Visible breadcrumb UI to match the `BreadcrumbList` schema I shipped.
- `FAQPage` JSON-LD for the FAQ section below. The visible FAQ is here; the schema requires extending the post template to read a `faq:` frontmatter array.
- Site-wide font preconnect in the layout.
- [Google Search Console](https://search.google.com/search-console) and [Bing Webmaster Tools](https://www.bing.com/webmasters) verification + indexing requests for the new pages. Without these, you're blind to crawl errors and Core Web Vitals reports.
- [IndexNow](https://www.indexnow.org/) push-indexing for Bing and Yandex. Cheap to wire from a Netlify build hook.

Worth saying out loud: everything on the main list is **on-page SEO** (things you control inside your own pages). **Off-page SEO** (backlinks, brand mentions, real-world authority) is the other half of the picture and doesn't fit in a code-driven checklist. Backlinks are links from other sites to yours. You earn them by writing things people want to link to, then asking for the link.

## Glossary

A quick reference for the terms used above.

- **Anchor text**: the visible text inside a link.
- **Backlink**: a link from another site to yours. The currency of off-page SEO.
- **Canonical URL**: the "official" URL of a page when multiple URLs serve similar content. Set via `<link rel="canonical">`.
- **CLS** (Cumulative Layout Shift): how much elements jump around as a page loads. Lower is better.
- **Core Web Vitals**: Google's three page-experience metrics. LCP, CLS, INP.
- **Crawler / bot**: software that fetches and indexes web pages (Googlebot, Bingbot, GPTBot, PerplexityBot).
- **E-E-A-T**: Experience, Expertise, Authoritativeness, Trustworthiness. Google's framework for evaluating content credibility.
- **Featured snippet**: the answer box at the top of Google for question queries.
- **INP** (Interaction to Next Paint): how quickly the page responds to taps, clicks, and key presses. Replaced FID in March 2024.
- **JSON-LD**: structured data embedded in HTML as a `<script type="application/ld+json">` block. The standard format for schema.org markup.
- **Knowledge graph**: Google's database of entities (people, places, things) and their relationships. It powers the knowledge panel, the box on the right of the search results.
- **LCP** (Largest Contentful Paint): how long until the biggest visible element renders. Lower is better.
- **Open Graph**: the meta-tag protocol that controls how links preview on social platforms (`og:title`, `og:image`, etc.).
- **Off-page SEO**: signals from outside your site (backlinks, brand mentions, social shares).
- **On-page SEO**: signals you control inside your own pages (content, schema, internal links, performance).
- **Passage indexing**: Google's practice of ranking individual paragraphs of long articles, not just the page as a whole.
- **Ranking signal**: any factor Google uses to decide your page's order in search results. Hundreds exist; freshness, authority, and user behavior are the big ones.
- **Rich result**: any non-blue-link presentation in Google search (breadcrumbs, FAQ accordions, star ratings, recipe cards).
- **Schema.org**: the vocabulary used to describe entities (Article, Person, Course, etc.) in structured data. Maintained by Google, Microsoft, Yahoo, and Yandex.
- **SERP**: Search Engine Results Page. The list of results that Google or another engine shows for a query.
- **Sitemap**: an XML file at the root of your domain listing every URL you want indexed.
- **301 redirect**: HTTP status code for "permanently moved." Tells engines to update their index and transfer ranking equity.
- **Topical clusters**: groups of related content under a shared category that engines treat as evidence of topic authority.

## Frequently asked questions

### Is Astro SEO-friendly?

Yes. Astro renders static HTML by default, ships a sitemap integration, and lets you inject any `<head>` markup or JSON-LD via `<Fragment slot="head">`. There's no SEO-breaking client-side rendering by default, unlike SPAs. The work is in wiring up canonical URLs, schema, and meta tags, which Astro doesn't auto-generate.

### How do I add a sitemap to an Astro site?

Install `@astrojs/sitemap`, add `sitemap()` to `integrations` in `astro.config.mjs`, set the `site` URL in the same config, and reference `sitemap-index.xml` in your `robots.txt`. Build, and Astro automatically emits a sitemap.

### How do I add canonical URLs in Astro?

There's no built-in helper, but it's a single `<link rel="canonical" href={...} />` in your layout's `<head>`. Compute the canonical from `Astro.url.pathname` plus your `site` URL. Use the canonical URL on paginated, tagged, and category pages, too, so duplicates collapse into a single signal.

### Does Astro support JSON-LD structured data?

Yes. Astro doesn't generate it for you, but you can write any schema.org type as a JSON object and inject it via `<script type="application/ld+json">` inside `<Fragment slot="head">`. Put the schema generators in a single helper file (`src/helpers/schema.ts`) and import them from any page or layout.

### Can I use Pagefind with Astro hybrid mode?

Yes, as long as the pages you want indexed are statically rendered. Pagefind reads `dist/` after the build, so any prerendered HTML gets indexed. SSR-only routes (without `export const prerender = true`) don't end up in `dist/` and won't be searchable. On a typical content site, every page is prerendered already, so you don't need to think about it.

### What's the difference between `llms.txt` and `robots.txt`?

`robots.txt` tells crawlers what they can and can't access. `llms.txt` tells AI models what your site is about and where the canonical content lives. They serve different purposes and live alongside each other at the root of your domain.

### What's the difference between `noindex` and `Disallow` in robots.txt?

`Disallow` in `robots.txt` blocks crawling. The crawler never visits the URL, so it never sees the page's `<meta name="robots" content="noindex">` either. If a page is already indexed and you want it out, use `noindex` and let the crawler in. `Disallow` after the fact strands the page in the index.

### How do I 301 redirect old URLs in Astro?

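Two options. On Netlify, add a line to `public/_redirects`: `/old /new 301`. Cross-host, use Astro's built-in `redirects` config in `astro.config.mjs` (`redirects: { '/old': '/new' }`). The Netlify rule ships as a proper 301 response. Astro's config does too on server-rendered routes, or when a host adapter translates it into platform redirects. In a purely static build without an adapter, Astro emits an HTML page with a meta refresh instead.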

## What to actually do today

Two things, then go.

Open your `robots.txt`. If `PerplexityBot` isn't in there, it's already 2026 outside your terminal.

Then paste your homepage URL into the [Google Rich Results Test](https://search.google.com/test/rich-results). If "no items detected" comes back, you're invisible in rich results entirely. Start at the top of this list.

If you want more like this, I write at [the blog](/blog) and host the [Señors @ Scale podcast](/senors-at-scale).

## References

- [Google: Canonicalization](https://developers.google.com/search/docs/crawling-indexing/canonicalization)
- [Google: Article structured data](https://developers.google.com/search/docs/appearance/structured-data/article)
- [Google Rich Results Test](https://search.google.com/test/rich-results)
- [Google Search Console](https://search.google.com/search-console)
- [Google: Passage indexing announcement](https://blog.google/products/search/search-language-understanding-bert/)
- [Google: rel=prev/next deprecation](https://developers.google.com/search/blog/2019/03/rel-next-prev-experiment)
- [Schema.org](https://schema.org)
- [Schema.org Validator](https://validator.schema.org/)
- [Schema.org: Article](https://schema.org/Article)
- [Schema.org: BreadcrumbList](https://schema.org/BreadcrumbList)
- [Schema.org: HowTo](https://schema.org/HowTo)
- [Schema.org: SpeakableSpecification](https://schema.org/SpeakableSpecification)
- [Schema.org: SearchAction](https://schema.org/SearchAction)
- [Schema.org: PodcastSeries](https://schema.org/PodcastSeries)
- [Schema.org: PodcastEpisode](https://schema.org/PodcastEpisode)
- [Astro: Images guide](https://docs.astro.build/en/guides/images/)
- [Astro: Sitemap integration](https://docs.astro.build/en/guides/integrations-guide/sitemap/)
- [Astro: Content collections](https://docs.astro.build/en/guides/content-collections/)
- [Astro: Redirects config](https://docs.astro.build/en/reference/configuration-reference/#redirects)
- [Core Web Vitals](https://web.dev/articles/vitals)
- [Pagefind](https://pagefind.app)
- [llms.txt specification](https://llmstxt.org)
- [Bing Webmaster Tools](https://www.bing.com/webmasters)
- [IndexNow](https://www.indexnow.org/)
