---
title: "Astro SEO Checklist 2026: 20 tactics ranked by impact"
publishDate: 2026-04-25T00:00:00.000Z
excerpt: "Astro SEO checklist for 2026: 20 tactics ranked from biggest to smallest impact, including canonical URLs, title tag rules, JSON-LD structured data, Person and BreadcrumbList schema, llms.txt, Pagefind search, and a Zod schema that caught 10 bugs in my podcast frontmatter."
category: "seo"
tags: ["seo", "astro", "schema-org", "pagefind", "performance"]
canonical: https://neciudan.dev/astro-seo-checklist-2026
---

SEO comes down to about 20 things, ranked roughly by how much each one moves the needle on a small blog.

I had a couple of audit docs sitting in `docs/` from February 2025. I'd shipped the recommendations, ticked the boxes, and stopped thinking about it. Then someone left a comment on [my last post](/how-i-cut-250gb-of-bandwidth-from-my-website) asking how I handle SEO, and I figured I'd reread my own notes.

They were 14 months stale. Half the AI crawlers in my `robots.txt` didn't exist when I wrote that file. PerplexityBot wasn't a thing yet. [`llms.txt`](https://llmstxt.org) wasn't a convention yet. Pagefind was at v0-something.

So I did the audit again, shipped 17 commits over a weekend, and ranked the whole list.

## What this article assumes

You know HTML, JavaScript, and a bit of Astro. You haven't done structured SEO before. By the end, you'll have rich results in Google, working social cards on LinkedIn and Slack, site search, and a JSON-LD setup that future-you doesn't have to revisit.

Three terms I'll use a lot, defined upfront so the rest reads cleanly:

- **Rich result**: anything Google shows in the search results that isn't just a blue link with a description. Star ratings, breadcrumbs, FAQ accordions, recipe cards. They get more clicks. You unlock them with structured data.
- **Featured snippet**: the answer box at the top of Google for question queries ("how do I add a sitemap to Astro?").
It quotes a paragraph or list directly from your page.
- **JSON-LD**: a small `<script type="application/ld+json">` block of structured data embedded in your HTML.

That `is:inline` is the part that took me a minute to figure out. Without it, Astro's build (Vite) tries to resolve `/pagefind/pagefind-ui.js` at build time, fails (because Pagefind hasn't run yet), and crashes the build. `is:inline` tells Astro to skip Vite processing on that script tag and emit it verbatim into the HTML. The dynamic `import()` runs in the browser, where the file does exist.

Pagefind only indexes pages that are compiled to static HTML in `dist/`. Routes rendered on demand get skipped. That covers any page that sets `export const prerender = false`, and any `server`-output page without `prerender = true`. For a typical content site, this is fine. The first build after wiring it up indexed 56 pages and 7,192 words.

Bonus: now that you have search, add a [`SearchAction`](https://schema.org/SearchAction) to your `WebSite` JSON-LD. This is the markup that used to unlock Google's sitelinks search box, the small search input under a site's main result. Google retired that box in November 2024, so don't expect it to appear. The markup is still valid schema.org and harmless to keep:

```typescript
{
  '@context': 'https://schema.org',
  '@type': 'WebSite',
  url: 'https://neciudan.dev',
  potentialAction: {
    '@type': 'SearchAction',
    target: 'https://neciudan.dev/search?q={search_term_string}',
    'query-input': 'required name=search_term_string',
  },
}
```

You'll need to wire `?q=` query parsing in your search page for this to actually work end-to-end. Pagefind's UI doesn't read URL params by default.

## 15. `HowTo` schema for tutorial-style posts

If a post walks the reader through sequential steps ("how to add Pagefind to Astro," "how I cut bandwidth"), [`HowTo`](https://schema.org/HowTo) schema marks up that structure for search engines. Google stopped showing HowTo rich results (step indicators, carousel placement) in September 2023, so don't count on a visible change in its results. Treat the markup as a machine-readable outline of the steps for any engine that still reads it.
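Hand-writing the literal object that follows works for one post. Once several tutorials carry `HowTo` markup, though, hand-numbering `position` becomes error-prone.

Below is a minimal sketch of a helper, assuming you build JSON-LD in TypeScript like the other snippets here. `buildHowToSchema`, `StepInput`, and the signature are hypothetical; they are not part of Astro or any library. Positions come from array order, so reordering steps can't leave stale numbers behind.

```typescript
// Hypothetical helper (not an Astro or schema.org API): builds HowTo JSON-LD
// from an ordered list of steps, numbering `position` from 1 by array order.

interface StepInput {
  name: string;
  text: string;
}

interface HowToStep {
  '@type': 'HowToStep';
  position: number;
  name: string;
  text: string;
}

interface HowToSchema {
  '@context': 'https://schema.org';
  '@type': 'HowTo';
  name: string;
  totalTime?: string; // ISO 8601 duration, e.g. 'PT1H'
  step: HowToStep[];
}

function buildHowToSchema(name: string, steps: StepInput[], totalTime?: string): HowToSchema {
  const schema: HowToSchema = {
    '@context': 'https://schema.org',
    '@type': 'HowTo',
    name,
    step: steps.map(
      (s, i): HowToStep => ({ '@type': 'HowToStep', position: i + 1, name: s.name, text: s.text }),
    ),
  };
  // Only set totalTime when one was provided.
  if (totalTime) schema.totalTime = totalTime;
  return schema;
}

// Usage: serialize the result into a <script type="application/ld+json"> tag in your layout.
const howTo = buildHowToSchema(
  'How to add Pagefind site search to Astro',
  [
    { name: 'Install Pagefind', text: 'Run npm install --save-dev pagefind in your Astro project root.' },
    { name: 'Wire the postbuild script', text: 'Add "postbuild": "pagefind --site dist" to package.json scripts.' },
    { name: 'Mark indexable content', text: 'Add data-pagefind-body to the outer article element on each post template.' },
    { name: 'Build the search page', text: 'Create src/pages/search.astro that loads Pagefind UI from /pagefind/pagefind-ui.js.' },
  ],
  'PT1H',
);
console.log(JSON.stringify(howTo, null, 2));
```

This is a design choice more than a requirement. Deriving `position` from the array keeps the JSON-LD in sync with the order you list steps in, which is the part people tend to get wrong by hand.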
```typescript
{
  '@context': 'https://schema.org',
  '@type': 'HowTo',
  name: 'How to add Pagefind site search to Astro',
  totalTime: 'PT1H',
  step: [
    {
      '@type': 'HowToStep',
      position: 1,
      name: 'Install Pagefind',
      text: 'Run npm install --save-dev pagefind in your Astro project root.',
    },
    {
      '@type': 'HowToStep',
      position: 2,
      name: 'Wire the postbuild script',
      text: 'Add "postbuild": "pagefind --site dist" to package.json scripts.',
    },
    {
      '@type': 'HowToStep',
      position: 3,
      name: 'Mark indexable content',
      text: 'Add data-pagefind-body to the outer article element on each post template.',
    },
    {
      '@type': 'HowToStep',
      position: 4,
      name: 'Build the search page',
      text: 'Create src/pages/search.astro that loads Pagefind UI from /pagefind/pagefind-ui.js.',
    },
  ],
}
```

A post can carry both `Article` and `HowTo` JSON-LD. Google reads both. My posts on bandwidth optimization, building pipelines, and this checklist itself are all candidates.

## 16. `Speakable` JSON-LD on Article schema

This one is for voice answer engines (Google Assistant, Alexa). It's one property on your existing Article schema (see [`SpeakableSpecification`](https://schema.org/SpeakableSpecification)):

```typescript
speakable: {
  '@type': 'SpeakableSpecification',
  cssSelector: ['h1', '[data-speakable]'],
}
```

The `[data-speakable]` selector is a forward-looking hook. If you write a TL;DR in a future post and mark it `data-speakable`, voice engines that support Speakable can read it as the summary. Until then, the `h1` covers the headline. Engines that don't care about Speakable ignore the property. There's no downside.

## 17. Content collection schemas (the one that found 10 bugs)

This is the one that surprised me. I have a podcast section on the site ([Señors @ Scale](/senors-at-scale), 30 episodes). Each episode is a markdown file in `src/content/podcast/`.
Standard Astro [content collection](https://docs.astro.build/en/guides/content-collections/) (the framework's typed-frontmatter system, where you define a schema once and Astro validates every file against it).

Except I'd never declared the collection in `config.ts`. It worked anyway. `getCollection('podcast')` was permissive. No schema, no validation, just whatever happened to be in the frontmatter. For a year and a half, I'd been adding episodes by copying and pasting an existing one and editing the fields.

You can guess where this is going.

I wrote a Zod schema for the collection. (Zod is a TypeScript-first validation library; you describe the shape of an object, and it tells you whether real data matches.) Then, before declaring the collection, I wrote a Vitest test that ran the schema against every episode's frontmatter.

Prerequisites if you want to follow along:

```bash
npm install --save-dev vitest gray-matter zod
```

The test goes at `tests/podcast-collection.test.ts` and runs with `npx vitest run` (or `npm test`, if you've wired it up).

```typescript
import { describe, it, expect } from 'vitest';
import { readdirSync, readFileSync } from 'node:fs';
import { join } from 'node:path';
import matter from 'gray-matter';
import { z } from 'zod';

const podcastSchema = z.object({
  title: z.string(),
  episodeNumber: z.number(),
  guest: z.string(),
  description: z.string(),
  spotifyUrl: z.string().url().optional(),
  youtubeUrl: z.string().url().optional(),
  appleUrl: z.string().url().optional(),
  // ...
});

// Collect every episode file. The relative path assumes Vitest runs from the project root.
const episodeDir = 'src/content/podcast';
const episodeFiles = readdirSync(episodeDir)
  .filter((name) => /\.mdx?$/.test(name))
  .map((name) => join(episodeDir, name));

describe('podcast collection frontmatter', () => {
  // One test per episode, so a failure names the exact file.
  for (const file of episodeFiles) {
    it(`${file} matches the podcast schema`, () => {
      const { data } = matter(readFileSync(file, 'utf8'));
      const result = podcastSchema.safeParse(data);
      expect(result.success).toBe(true);
    });
  }
});
```

`safeParse` returns `{ success: true, data }` on a match or `{ success: false, error }` on a mismatch. Unlike `parse`, it doesn't throw, which is what you want when iterating over many files.

I ran it. It failed on 10 episodes.

Three had URL fields wrapped in markdown link syntax: `spotifyUrl: "[https://...](https://...)"`.
Notion auto-formatting that I'd pasted in months ago and never looked at again. For 18 months, I'd been linking to broken Spotify URLs from my own episode pages.

Seven had `appleUrl: ""`. I had a habit of leaving the field blank when an episode hadn't yet been published on Apple Podcasts, intending to fill it in later. I never filled it in. The Zod URL validator caught all seven instantly.

While you're declaring podcast schemas, also emit [`PodcastSeries`](https://schema.org/PodcastSeries) on the podcast hub and [`PodcastEpisode`](https://schema.org/PodcastEpisode) on each episode/takeaway page. Podcast schemas are how Google's podcast surfaces (Search, Assistant) discover episodes.

Calling all of this an SEO tip is a stretch. But engines notice broken outbound links from your domain, and a content collection schema catches the malformed ones before they ship. Worth doing.

## 18. `dateModified`, shown to readers when distinct

Freshness is a ranking signal. `dateModified` already lives in your JSON-LD if you wire `updateDate` into `getArticleSchema`. Show it to readers visually too, so they trust the article isn't stale:

```astro
{post.updateDate &&
  post.publishDate &&
  new Date(post.updateDate).toDateString() !== new Date(post.publishDate).toDateString() && (
    <p>
      Updated {new Date(post.updateDate).toLocaleDateString('en-US', { year: 'numeric', month: 'short', day: 'numeric' })}
    </p>
  )}
```

Only show it when the dates actually differ. Otherwise, it's noise.

## 19. `rel="prev"` / `rel="next"` and noindex on pagination

Astro's pagination provides `Astro.props.page.url.prev` and `page.url.next` for free. Read them into `prevUrl` and `nextUrl` and render them in `<head>`:

```astro
{prevUrl && <link rel="prev" href={prevUrl} />}
{nextUrl && <link rel="next" href={nextUrl} />}
```

Google [deprecated `rel=prev/next` as a ranking signal](https://developers.google.com/search/blog/2019/03/rel-next-prev-experiment) in 2019. Bing and other engines still use it, and it costs you nothing, so I keep it.

For the noindex part, Google's current guidance is mixed. The old advice was "noindex page 2+ of pagination." The newer advice is "make page 2+ self-canonical and let them rank if they're useful." I noindex page 2+ on my blog because they're not useful as landing pages (readers want the post itself, not a list of headlines). Your call.

## 20. `noindex` the 404 page

One-line change. Otherwise, the occasional 404 sneaks into search results, which is a bad experience for everyone:

```astro
<meta name="robots" content="noindex" />
```

Watch for "soft 404s" too. A soft 404 is a page that returns HTTP 200 OK but looks empty or error-shaped to Google ("No results found," "Sorry, this content is unavailable"). Search Console flags these and removes them from the index. If you have empty-state pages, give them real content or return a real 404.

## What's next on my list

My next quarterly audit will pick up:

- Dynamic per-post OG image generation with `@vercel/og` or `satori`. Useful for posts that don't ship with a hero image. The libraries are already installed.
- Visible breadcrumb UI to match the `BreadcrumbList` schema I shipped.
- `FAQPage` JSON-LD for the FAQ section below. The visible FAQ is here; the schema requires extending the post template to read a `faq:` frontmatter array.
- Site-wide font preconnect in the layout.
- [Google Search Console](https://search.google.com/search-console) and [Bing Webmaster Tools](https://www.bing.com/webmasters) verification + indexing requests for the new pages.
Without these, you're blind to crawl errors and Core Web Vitals reports.
- [IndexNow](https://www.indexnow.org/) push-indexing for Bing and Yandex. Cheap to wire from a Netlify build hook.

Worth saying out loud: everything on the main list is **on-page SEO** (things you control inside your own pages). **Off-page SEO** (backlinks, brand mentions, real-world authority) is the other half of the picture and doesn't fit in a code-driven checklist. Backlinks are links from other sites to yours. You earn them by writing things people want to link to, then asking people to link to them.

## Glossary

A quick reference for the terms used above.

- **Anchor text**: the visible text inside a link.
- **Backlink**: a link from another site to yours. The currency of off-page SEO.
- **Canonical URL**: the "official" URL of a page when multiple URLs serve similar content. Set via `<link rel="canonical">`.
- **CLS** (Cumulative Layout Shift): how much elements jump around as a page loads. Lower is better.
- **Core Web Vitals**: Google's three page-experience metrics: LCP, CLS, and INP.
- **Crawler / bot**: software that fetches and indexes web pages (Googlebot, Bingbot, GPTBot, PerplexityBot).
- **E-E-A-T**: Experience, Expertise, Authoritativeness, Trustworthiness. Google's framework for evaluating content credibility.
- **Featured snippet**: the answer box at the top of Google for question queries.
- **INP** (Interaction to Next Paint): how quickly the page responds to taps, clicks, and key presses. Replaced FID in March 2024.
- **JSON-LD**: structured data embedded in HTML as a `<script type="application/ld+json">` tag.