Astro SEO Checklist 2026: 20 tactics ranked by impact
Astro SEO checklist for 2026: 20 tactics ranked from biggest to smallest impact, including canonical URLs, title tag rules, JSON-LD structured data, Person and BreadcrumbList schema, llms.txt, Pagefind search, and a Zod schema that caught 10 bugs in my podcast frontmatter.
Neciu Dan
Astro SEO comes down to about 20 things, ranked roughly by how much each one moves the needle on a small blog. Most take an afternoon to ship. The list below is exactly what I run on this site as of April 2026, top to bottom, after my own SEO notes went 14 months stale and I had to redo the audit from scratch.
I had a couple of audit docs sitting in docs/ from February 2025. I’d shipped the recommendations, ticked the boxes, and stopped thinking about it. Then someone left a comment on my last post asking how I handle SEO, and I figured I’d reread my own notes.
They were 14 months stale.
Half the AI crawlers in my robots.txt didn’t exist when I wrote that file. PerplexityBot wasn’t a thing yet. llms.txt wasn’t a convention yet. Pagefind was at v0-something.
So I did the audit again, shipped 17 commits over a weekend, and ranked the whole list.
What this article assumes
You know HTML, JavaScript, and a bit of Astro. You haven’t done structured SEO before. By the end you’ll have rich results in Google, working social cards on LinkedIn and Slack, site search, and a JSON-LD setup that future-you doesn’t have to revisit.
Three terms I’ll use a lot, defined upfront so the rest reads cleanly:
- Rich result: anything Google shows in the search results that isn’t just a blue link with a description. Star ratings, breadcrumbs, FAQ accordions, recipe cards. They get more clicks. You unlock them with structured data.
- Featured snippet: the answer box at the top of Google for question queries (“how do I add a sitemap to Astro?”). It quotes a paragraph or list directly off your page. The click-through is real.
- JSON-LD: a small <script type="application/ld+json"> block in your HTML head that describes the page to search engines in machine-readable form. Google reads it; humans don’t see it. This is how you tell Google “this is an article, by this author, published on this date, about this topic.”
Here’s the order, by impact:
- Canonical URLs on every page
- Title tag rules (50 to 60 chars, keyword first)
- Article (or BlogPosting) JSON-LD with Person, image, validation
- Per-post Open Graph and Twitter cards
- Unique meta description on every key page
- One <h1> per page, logical heading order
- Sitemap and robots.txt with the 2026 AI crawler list
- Astro <Image> for everything in src/assets/, with proper alt
- First paragraph as definition or outcome
- Internal linking with descriptive anchor text
- BreadcrumbList schema
- URL structure and 301 redirects when slugs change
- llms.txt plus a build-time llms-full.txt
- Pagefind for site search
- HowTo schema for tutorial-style posts
- Speakable JSON-LD on Article schema
- Content collection schemas (the one that found 10 bugs)
- dateModified, shown to readers when distinct
- rel="prev" / rel="next" and noindex on pagination
- noindex the 404 page
Is Astro good for SEO?
Yes, and it’s one of the better options for content sites in 2026. Astro ships static HTML by default, has first-class image optimisation, sitemap and RSS integrations, content collections with schema validation, and lets you inject any structured data you want via <Fragment slot="head"> (Astro’s syntax for pushing markup into a parent layout’s <head>). The framework gets out of your way; the SEO work is on you.
That’s also the catch. Astro doesn’t auto-generate canonical URLs, JSON-LD, or meta descriptions. You wire them up. Once.
The 20-item list below is what “wired up” looks like.
1. Canonical URLs on every page
This is the biggest win for the smallest amount of code.
If a page is reachable at more than one URL (with or without trailing slash, with query params, paginated, tagged, categorised), engines treat them as duplicates and split your ranking signal (the strength Google assigns your page based on links, content, freshness, and so on) across all of them. A canonical URL collapses them into one. Google’s canonicalization docs are worth a 5-minute read if you’ve never set this up before.
---
import { getCanonical } from '~/helpers/permalinks';
const canonical = getCanonical(Astro.url.pathname);
---
<link rel="canonical" href={canonical} />
The getCanonical helper is just a function that joins your site URL (from astro.config.mjs) with the current path and normalises trailing slashes. If you don’t have one, write a 5-line version. The point is one source of truth.
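For reference, here is a minimal sketch of what such a helper could look like. It is not the site’s actual implementation: SITE_URL is a hypothetical stand-in for the site value in astro.config.mjs (inside Astro you could pass Astro.site instead), and stripping the trailing slash is just one policy you might choose.

```typescript
// Hypothetical stand-in for the `site` value in astro.config.mjs.
const SITE_URL = 'https://neciudan.dev';

/** Join the site origin with a path and normalise it to a single canonical form. */
function getCanonical(pathname: string, site: string = SITE_URL): string {
  // new URL() resolves the path against the origin.
  const url = new URL(pathname, site);
  // Query strings and fragments should not create separate canonicals.
  url.search = '';
  url.hash = '';
  // Assumed policy: no trailing slash anywhere except the root.
  if (url.pathname.length > 1 && url.pathname.endsWith('/')) {
    url.pathname = url.pathname.slice(0, -1);
  }
  return url.toString();
}
```

Whichever trailing-slash policy you pick, keep it in this one function so the canonical tag, og:url, and the sitemap cannot drift apart.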
Every blog post, takeaway page, podcast page, and category page on my site emits one. Pagination uses the paginated URL as canonical so page 2 doesn’t compete with page 1.
Make sure your <link rel="canonical"> and your og:url (the URL you ship in your Open Graph meta tag, used by social platforms) agree. Mismatched canonicals are a real footgun: Open Graph platforms cache one URL while engines index another, and your share counts split across both.
If you do nothing else from this list, do this.
2. Title tag rules
The single biggest CTR (click-through rate) lever you have. Get this wrong and the rest of the list barely matters.
The rules:
- 50 to 60 characters (Google truncates around 580px width in SERPs, not at a fixed character count, but characters are a decent proxy)
- Primary keyword as close to the front as you can stand
- Brand suffix optional and at the end (“Title | Brand”)
- Don’t repeat your H1 word-for-word: the title tag is for the search results page, the H1 is for the page itself, and you can usually make the title tag punchier than the H1
- Each page’s title must be unique across the site
Bad title:
"Welcome to my blog"
Good title:
"Astro SEO Checklist 2026: 20 tactics ranked by impact"
The second one tells you what’s in it, when, and how it’s organised. The first one tells you nothing.
This is also the title of this article, by the way. I picked it on purpose, after writing this section.
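The length and uniqueness rules are easy to check mechanically. The sketch below is one way to do it, assuming you can collect every page title into an array at build time; auditTitles and TitleIssue are hypothetical names, and character count is only a proxy for the pixel width Google actually truncates on.

```typescript
interface TitleIssue {
  title: string;
  problem: string;
}

/** Flag titles that are duplicated or fall outside the min..max character window. */
function auditTitles(titles: string[], min = 50, max = 60): TitleIssue[] {
  const issues: TitleIssue[] = [];
  const seen = new Set<string>();
  for (const title of titles) {
    // Compare case-insensitively so "My Post" and "my post" count as duplicates.
    const key = title.trim().toLowerCase();
    if (seen.has(key)) issues.push({ title, problem: 'duplicate title' });
    seen.add(key);
    if (title.length < min) issues.push({ title, problem: `too short (${title.length} chars)` });
    if (title.length > max) issues.push({ title, problem: `too long (${title.length} chars)` });
  }
  return issues;
}
```

Run against the two examples above, “Welcome to my blog” is flagged as too short and this article’s title passes.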
3. Article (or BlogPosting) JSON-LD with Person, image, and validation
Schema.org is a vocabulary maintained by Google, Microsoft, Yahoo, and Yandex. It defines types like Article, Person, Organization, Course, PodcastSeries. You write data using these types and embed it in your page as JSON-LD. Search engines read it and decide which rich results you’re eligible for.
For a blog, the minimum is Article or BlogPosting. Google’s Article structured data guide lists the exact fields they reward.
Here’s the helper I run on every post:
export function getArticleSchema(post) {
  return {
    '@context': 'https://schema.org',
    '@type': 'Article',
    headline: post.title,
    description: post.excerpt,
    image: {
      '@type': 'ImageObject',
      url: post.image,
      width: 1200,
      height: 630,
    },
    datePublished: post.publishDate,
    dateModified: post.updateDate ?? post.publishDate,
    author: getPersonSchema(),
    publisher: getPublisherOrganization(),
    mainEntityOfPage: post.canonical,
    wordCount: post.wordCount,
  };
}
What that turns into when the page is built:
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "Article",
"headline": "Astro SEO Checklist 2026: 20 tactics ranked by impact",
"description": "Astro SEO checklist for 2026...",
"image": {
"@type": "ImageObject",
"url": "https://neciudan.dev/images/articles/astro-seo.png",
"width": 1200,
"height": 630
},
"datePublished": "2026-04-25",
"dateModified": "2026-04-25",
"author": { "@type": "Person", "name": "Neciu Dan", "...": "..." },
"publisher": { "@type": "Organization", "name": "Neciu Dan", "...": "..." },
"mainEntityOfPage": "https://neciudan.dev/astro-seo-checklist-2026"
}
</script>
That <script> block goes inside your <head>. In Astro, the cleanest way is via <Fragment slot="head"> from a page or component:
---
import Layout from '~/layouts/PageLayout.astro';
import { getArticleSchema } from '~/helpers/schema';
const articleSchema = getArticleSchema(post);
---
<Layout metadata={metadata}>
<Fragment slot="head">
<script
type="application/ld+json"
set:html={JSON.stringify(articleSchema)}
/>
</Fragment>
<!-- page content -->
</Layout>
A few things that bite people:
The image field: Google’s Article documentation accepts either a plain URL string or an ImageObject, so a bare string still validates. I ship ImageObject with explicit width and height anyway, because it states the dimensions outright instead of leaving consumers to fetch the file and work them out. Do the same for og:image (use og:image:width and og:image:height siblings, more on those in section #4).
The Person schema for the author is where E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness, Google’s framework for “is this author credible?”) lives:
export function getPersonSchema() {
  return {
    '@type': 'Person',
    name: 'Neciu Dan',
    url: 'https://neciudan.dev/about',
    image: 'https://neciudan.dev/images/dan-portrait.jpg',
    jobTitle: 'Staff Engineer',
    worksFor: { '@type': 'Organization', name: 'Rover' },
    description: 'Frontend engineer, podcast host, conference speaker.',
    sameAs: [
      'https://twitter.com/neciudan',
      'https://www.linkedin.com/in/neciudan',
      'https://github.com/neciudan',
      'https://www.youtube.com/@neciudan',
    ],
  };
}
The image here should be a real photo of the author, not a logo. Google’s knowledge panel (the box that pops up on the right of search results when you search for a person or topic) can use this image when stitching together your byline.
The sameAs array is what links your authored articles to your real-world identity across platforms. Skip this and your bylines float free of any actual person. If you publish on Mastodon or other IndieWeb-friendly platforms, you can also add a <link rel="me" href="..."> to your About page for verified authorship.
You’ll also want a Publisher (Organization) schema:
export function getPublisherOrganization() {
  return {
    '@type': 'Organization',
    name: 'Neciu Dan',
    url: 'https://neciudan.dev',
    logo: {
      '@type': 'ImageObject',
      url: 'https://neciudan.dev/images/logo.png',
      width: 600,
      height: 60,
    },
  };
}
For a personal site, the publisher is “you the brand” rather than “you the human.”
Validate everything you emit. Two free tools:
- Google Rich Results Test: paste your live URL and it tells you exactly which rich results you’re eligible for, plus any errors
- Schema.org Validator: generic schema.org compliance check, useful when Google’s tool doesn’t support a type
Run both before declaring victory. I have caught typos and missing required fields with each of them at different times.
I keep all the JSON-LD generators in src/helpers/schema.ts and inject them via <Fragment slot="head"> in the layout. New posts get the full set for free.
4. Per-post Open Graph and Twitter cards
Open Graph is a meta-tag protocol invented by Facebook in 2010, now used by every major platform (LinkedIn, X, Slack, Discord, iMessage) to render link previews. Twitter cards are X’s slightly extended variant.
This isn’t strictly SEO. It’s how your link looks when someone pastes it. But “looks good when shared” is a click-through multiplier, and click-through is a ranking signal.
Standard fields:
const metadata = {
  title,
  description: excerpt,
  canonical: url,
  openGraph: {
    type: 'article',
    url,
    locale: 'en_US',
    images: imageUrl ? [{ url: imageUrl, width: 1200, height: 630 }] : undefined,
    siteName: 'Neciu Dan',
  },
  twitter: {
    cardType: imageUrl ? 'summary_large_image' : 'summary',
    site: '@neciudan',
    creator: '@neciudan',
  },
};
What that turns into in the HTML head:
<meta property="og:title" content="..." />
<meta property="og:description" content="..." />
<meta property="og:url" content="..." />
<meta property="og:type" content="article" />
<meta property="og:locale" content="en_US" />
<meta property="og:image" content="..." />
<meta property="og:image:width" content="1200" />
<meta property="og:image:height" content="630" />
<meta property="og:site_name" content="Neciu Dan" />
<meta name="twitter:card" content="summary_large_image" />
<meta name="twitter:site" content="@neciudan" />
<meta name="twitter:creator" content="@neciudan" />
Default to the post hero image. If there isn’t one, fall back to a site-default OG image (set once in your config). Don’t ship a post that previews as a blank rectangle.
While you’re in here, emit the Open Graph article extensions on blog posts:
<meta property="article:published_time" content="2026-04-25T00:00:00.000Z" />
<meta property="article:modified_time" content="2026-04-25T00:00:00.000Z" />
<meta property="article:author" content="https://neciudan.dev/about" />
<meta property="article:section" content="SEO" />
<meta property="article:tag" content="Astro" />
<meta property="article:tag" content="SEO" />
LinkedIn, Slack, and a few others read these. Most don’t. The cost of adding them once in your layout is zero.
5. Unique meta description on every key page
The biggest mistake I’d been making for a while: my homepage meta description was just falling through to the long site-wide description from config.yaml. Same for /blog. Same for /about.
A meta description is the snippet under your blue link in Google. Each of those pages should have a unique, factual sentence that mentions the topic and the source. 150 to 160 characters, no fluff.
Bad:
"Personal website of a software engineer based in Barcelona who works on..."
Good:
"JavaScript and frontend articles by Neciu Dan: React, testing, security, career. Practical insights for working developers."
The second one is something a human would actually click. The first one sounds like a LinkedIn bio.
One catch: Google rewrites a large share of meta descriptions on the fly to better match the user’s query (published studies put it at roughly 60 to 70%). You’re not writing the final SERP snippet; you’re writing the default fallback. Still worth doing well, because that’s what shows on social sites and in Bing.
6. One <h1> per page, logical heading order
Modern HTML5 actually allows multiple <h1> elements within sectioning elements, and Google has explicitly said either approach is fine. Pick whichever you like.
That said, I still ship one <h1> per page on this site. The outline reads cleaner and it’s harder to break by accident.
I had two <h1>s on my podcast hub for months without noticing. The second one was for “What is Señors @ Scale?” which used to be a section header. I added a headingLevel="h1" | "h2" prop to my Headline component and now it’s h2.
Sections are h2. Subsections are h3. The outline should read like a table of contents.
7. Sitemap and robots.txt with the 2026 AI crawler list
A sitemap is an XML file at the root of your domain listing every URL you want indexed. robots.txt is a plain text file telling crawlers (Googlebot, Bingbot, GPTBot, etc.) which paths they can and can’t visit.
The sitemap is a one-line setup in Astro using the @astrojs/sitemap integration; robots.txt is a static file you write by hand:
// astro.config.mjs
import { defineConfig } from 'astro/config';
import sitemap from '@astrojs/sitemap';

export default defineConfig({
  site: 'https://neciudan.dev',
  integrations: [sitemap()],
});
Then add Sitemap: https://neciudan.dev/sitemap-index.xml to public/robots.txt. Done.
While you’re in robots.txt, allow the AI crawlers. The 2025 list (GPTBot, ChatGPT-User, anthropic-ai, Claude-Web, Google-Extended, CCBot, Bard, AI2Bot) is incomplete in 2026.
Add these:
User-agent: PerplexityBot
Allow: /
User-agent: OAI-SearchBot
Allow: /
User-agent: Applebot-Extended
Allow: /
User-agent: Amazonbot
Allow: /
User-agent: Bytespider
Allow: /
User-agent: cohere-ai
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: Diffbot
Allow: /
Honest framing: crawlers default to allowed unless you disallow them. Adding explicit User-agent: PerplexityBot plus Allow: / doesn’t unlock anything that wasn’t already allowed by User-agent: * plus Allow: /.
Some bots check for explicit allowlists before crawling, but most don’t. Treat these blocks as documentation of intent rather than a magic switch.
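If you’d rather not hand-maintain those blocks, one option is to generate the file from a list. This is a sketch under assumptions, not what this site does: buildRobotsTxt is a hypothetical helper, and in Astro its output could be returned from an endpoint such as src/pages/robots.txt.ts in place of the static public/robots.txt.

```typescript
// The crawler names listed above; extend as new bots appear.
const AI_CRAWLERS = [
  'PerplexityBot',
  'OAI-SearchBot',
  'Applebot-Extended',
  'Amazonbot',
  'Bytespider',
  'cohere-ai',
  'ClaudeBot',
  'Diffbot',
];

/** Build a robots.txt body: a catch-all group, one explicit group per bot, then the sitemap. */
function buildRobotsTxt(bots: string[], sitemapUrl: string): string {
  const groups = [
    'User-agent: *\nAllow: /',
    ...bots.map((bot) => `User-agent: ${bot}\nAllow: /`),
  ];
  // Groups are separated by blank lines; the Sitemap line stands on its own at the end.
  return `${groups.join('\n\n')}\n\nSitemap: ${sitemapUrl}\n`;
}

const robotsTxt = buildRobotsTxt(AI_CRAWLERS, 'https://neciudan.dev/sitemap-index.xml');
```

Keeping the list in code makes the next audit a one-line diff instead of a copy-paste job.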
One thing people conflate: robots.txt Disallow and <meta name="robots" content="noindex"> are different mechanisms. Disallow blocks crawling entirely (the crawler never visits the URL, so it never sees a noindex either). If you actually want a page out of the index, use noindex and let the crawler in. If you Disallow a page that’s already indexed, it stays indexed and you’ve just blinded yourself.
While we’re on infrastructure: HTTPS only. Google has used HTTPS as a ranking signal since 2014. Let’s Encrypt makes it free. There’s no excuse for http:// in 2026.
8. Astro <Image> for everything in src/assets/, with proper alt
Core Web Vitals are Google’s three page-experience metrics: LCP (Largest Contentful Paint, how long until the biggest visible element renders), CLS (Cumulative Layout Shift, how much things jump around as the page loads), and INP (Interaction to Next Paint, how quickly the page responds to taps and clicks). INP replaced FID (First Input Delay) as a Core Web Vital in March 2024.
LCP is usually an image. CLS is usually an image without width and height. Image-heavy pages with layout thrash hurt INP too.
Astro’s <Image> component from astro:assets handles the image side: WebP conversion at build time, srcset generation (multiple resolutions for different screen sizes), sizes, lazy loading, async decoding, content-hashed filenames safe for immutable caching. I wrote about this in detail when I was bleeding bandwidth.
The short version: anything you import from src/assets/ goes through the pipeline. Anything in public/ ships byte-for-byte, no resizing, no conversion. Hero images and post images live in src/assets/. Always.
The alt attribute matters as much as the optimisation:
- Describe what’s in the image factually. Don’t editorialise.
- Don’t start with “image of” or “picture of.” Screen readers announce “image” already.
- Include the relevant keyword if it fits naturally. Don’t stuff it.
- Empty alt="" is correct for purely decorative images. Missing alt is not.
- Keep it under ~125 characters.
<Image
src={hero}
alt="Network tab showing 6.3MB hero video downloaded on every visit"
width={1200}
height={630}
/>
Specific beats vague. “Network tab screenshot” is worse than the example above.
9. First paragraph as definition or outcome
For posts that answer a specific question (“what is X”, “how to Y”), put a one or two sentence definition or outcome in the first paragraph. Featured snippets and AI Overviews (the ChatGPT-style answer block Google sometimes shows above the regular results) lift the first paragraph as the answer.
Bad first paragraph:
“Hey everyone, I’ve been thinking about dynamic programming lately, and I wanted to share some thoughts.”
Good first paragraph:
“Dynamic programming is a method for solving problems by breaking them down into smaller subproblems and storing the solutions so they can be reused instead of recomputed.”
The second one is a quotable answer. The first one is throat-clearing.
Since 2020, Google’s passage indexing (the practice of ranking individual paragraphs of long articles for specific queries, instead of just ranking the page as a whole) means any paragraph in a long article can become a featured snippet, not just the first one. So the first-paragraph rule is a strong default, not the only spot where featured-snippet eligibility lives. Section intros, FAQ answers, and bulleted definitions all qualify.
10. Internal linking with descriptive anchor text
Anchor text is the visible text inside an <a> tag. Engines learn what a page is about partly from the anchor text other pages use to link to it. If every link to my React post says “click here,” Google has nothing to work with. If half the links say “10 React patterns” and the other half say “common React mistakes,” that’s signal.
Two practical things:
- Posts in the same series link to each other inline, not just from a “related posts” footer.
- The link text says what the linked thing is, not “click here” or “in this article.”
Both small. Both compound over time.
A few extras for the file:
- Vary your anchor text per target page. Engines distrust exact-match repetition (it looks like manipulation).
- Don’t be afraid to link out to authoritative sources (Wikipedia, .edu, .gov, framework docs). Outbound links to authority signal a well-researched piece. Old SEO advice said to hoard them; that’s dead.
- The category and tag system is a topical clusters play. Topical clusters are groups of related content under a shared category that engines treat as evidence of topic authority. “SEO,” “Astro,” “Performance” are useful categories. “Software Development” is too generic to cluster anything.
11. BreadcrumbList schema
BreadcrumbList is one of the higher-impact rich results. Google replaces the URL in the SERP with the breadcrumb hierarchy (“Home › Blog › SEO › Astro SEO Checklist”), which is way more clickable.
{
  '@context': 'https://schema.org',
  '@type': 'BreadcrumbList',
  itemListElement: [
    {
      '@type': 'ListItem',
      position: 1,
      name: 'Home',
      item: 'https://neciudan.dev/',
    },
    {
      '@type': 'ListItem',
      position: 2,
      name: 'Blog',
      item: 'https://neciudan.dev/blog',
    },
    {
      '@type': 'ListItem',
      position: 3,
      name: 'Astro SEO Checklist 2026',
    },
  ],
}
The last item omits item (it’s the current page). Pair the schema with a visible breadcrumb component at the top of each post so users see what the SERP shows.
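Writing that object by hand for every post gets old quickly. A small builder along these lines could generate it from a trail of crumbs; getBreadcrumbSchema, Crumb, and ListItem are hypothetical names for this sketch, not helpers from the site.

```typescript
interface Crumb {
  name: string;
  url?: string; // leave undefined for the current page
}

interface ListItem {
  '@type': 'ListItem';
  position: number;
  name: string;
  item?: string;
}

/** Turn an ordered trail of crumbs into a BreadcrumbList JSON-LD object. */
function getBreadcrumbSchema(trail: Crumb[]) {
  const itemListElement = trail.map((crumb, index) => {
    // Positions are 1-based and follow the order of the trail.
    const element: ListItem = { '@type': 'ListItem', position: index + 1, name: crumb.name };
    // Only emit `item` when a URL is given, so the last crumb can omit it.
    if (crumb.url) element.item = crumb.url;
    return element;
  });
  return { '@context': 'https://schema.org', '@type': 'BreadcrumbList', itemListElement };
}

const breadcrumbs = getBreadcrumbSchema([
  { name: 'Home', url: 'https://neciudan.dev/' },
  { name: 'Blog', url: 'https://neciudan.dev/blog' },
  { name: 'Astro SEO Checklist 2026' },
]);
```

The same trail array can feed the visible breadcrumb component, which keeps the markup and the schema from disagreeing.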
12. URL structure and 301 redirects when slugs change
Short URLs are easier to share, easier to remember, and don’t get truncated in SERPs. Kebab-case, lowercase, no underscores, no .html extensions, no trailing dates unless they’re meaningfully part of the topic.
/astro-seo-checklist-2026 good. /2026/04/25/My_Astro_SEO_Playbook_Ranked.html bad.
When you rename a slug (which you will, eventually), set up a 301 redirect. A 301 is the HTTP status code for “permanently moved” and tells engines to update their index to the new URL, transferring the old URL’s ranking equity to the new one. A 302 (“temporarily moved”) tells engines NOT to update their index and is rarely what you mean.
On Netlify, drop into public/_redirects:
/my-astro-seo-playbook-ranked /astro-seo-checklist-2026 301
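Or use Astro’s built-in redirect config. One caveat: on a static build with no adapter installed, Astro emits a <meta http-equiv="refresh"> page rather than a real 301, so prefer the host-level rule when you have one:
// astro.config.mjs
import { defineConfig } from 'astro/config';

export default defineConfig({
  redirects: {
    '/my-astro-seo-playbook-ranked': '/astro-seo-checklist-2026',
  },
});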
Without the redirect, old backlinks 404 and you lose the equity. I literally renamed this article’s slug while writing it, and the redirect is the only reason the old URL still works.
13. llms.txt plus a build-time llms-full.txt
llms.txt is the new robots.txt for AI models. Static markdown at the root of your domain. Looks like this:
# Neciu Dan
> Personal site of Neciu Dan, Staff Engineer, host of the Señors @ Scale
> podcast, ReactJS Barcelona organizer, international speaker.
## Blog
- [Blog index](https://neciudan.dev/blog): JavaScript, React, testing, security
- [RSS feed](https://neciudan.dev/rss.xml)
## Podcast
- [Podcast hub](/senors-at-scale): Senior engineers on scale, performance, frontend
- [Episode takeaways](/takeaways)
llms-full.txt is the same idea, but generated at build time, with all your blog content concatenated. Makes it cheap for an AI doing retrieval to grab everything in one fetch.
Astro route at src/pages/llms-full.txt.ts:
import { getCollection } from 'astro:content';
import { fetchPosts } from '~/helpers/blog';
export const GET = async () => {
const posts = await fetchPosts();
const podcasts = await getCollection('podcast');
// ...build markdown sections from each
return new Response(text, {
headers: { 'Content-Type': 'text/plain; charset=utf-8' },
});
};
ChatGPT, Claude, and Perplexity are reported to look for these files, though none of the providers has formally committed to honouring them. The cost of adding them is close to zero.
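The route above elides how the text is assembled. One possible shape for that step is sketched below; LlmsEntry and buildLlmsFullText are hypothetical, and the title, url, and body fields are assumptions about what your own post and podcast loaders return.

```typescript
interface LlmsEntry {
  title: string;
  url: string;
  body: string; // raw markdown of the post or episode notes
}

/** Concatenate every entry into one markdown document, separated by horizontal rules. */
function buildLlmsFullText(siteTitle: string, entries: LlmsEntry[]): string {
  const sections = entries.map(
    (entry) => `## ${entry.title}\n\nSource: ${entry.url}\n\n${entry.body.trim()}`,
  );
  return [`# ${siteTitle}`, ...sections].join('\n\n---\n\n') + '\n';
}
```

Including the source URL with each section gives a retrieval system something to cite back to.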
14. Pagefind for site search
Not strictly SEO. But “user can find old content” is what makes a site sticky, and stickiness is a ranking signal.
I had no search for years. 30+ blog posts, 30+ podcast takeaways, and the only navigation was the homepage list and an archive page. (I built one this weekend, you can try it now.)
Quickstart, in order
- npm install --save-dev pagefind
- Add "postbuild": "pagefind --site dist" to package.json scripts
- Mark indexable content with data-pagefind-body (see below)
- Create a /search page that loads the Pagefind UI (see below)
- Run npm run build. Pagefind generates dist/pagefind/ automatically
Pagefind vs Algolia vs Lunr vs Fuse
The four main options for site search on a static Astro blog:
- Algolia: cloud-hosted, fast, costs money once you have real traffic, requires backend sync
- Lunr.js: build-time, ships the entire index up front, fine for fewer than ~100 docs
- Fuse.js: client-side fuzzy matching, similar tradeoff to Lunr
- Pagefind: build-time, fragmented index loaded on demand, scales to thousands of pages without changing strategy
The right answer in 2026 is Pagefind for almost any static Astro site.
Pagefind crawls your build output (dist/) after Astro is done. It writes a fragmented index. The browser only loads index chunks for the words you actually search. Initial JS payload is around 40KB.
By default, Pagefind indexes everything inside <body>. That includes navigation, footer, related posts. It’s noisy.
Mark your actual content with data-pagefind-body:
<article data-pagefind-body>
<meta data-pagefind-filter="type:blog" />
{post.category && <meta data-pagefind-filter={`category:${post.category}`} />}
{post.tags?.map((tag) => <meta data-pagefind-filter={`tag:${tag}`} />)}
{/* post content */}
</article>
The <meta> tags are non-rendering. Pagefind reads the attributes during indexing.
The search page (/search) is a div and a script:
<link href="/pagefind/pagefind-ui.css" rel="stylesheet" />
<div id="search"></div>
<script is:inline>
import('/pagefind/pagefind-ui.js').then(({ PagefindUI }) => {
new PagefindUI({ element: '#search', showSubResults: true });
});
</script>
is:inline is the part that took me a minute to figure out. Without it, Astro’s build (Vite) tries to resolve /pagefind/pagefind-ui.js at build time, fails (because Pagefind hasn’t run yet), and crashes the build.
is:inline tells Astro to skip Vite processing on that script tag and emit it verbatim into the HTML. The dynamic import() runs in the browser, where the file does exist.
One caveat: Pagefind only indexes pages that end up as static HTML in dist/. SSR-only routes (Astro hybrid mode, no prerender = true) get skipped. For a typical content site this is fine.
First build after wiring it up indexed 56 pages and 7,192 words.
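Bonus: now that you have search, add a SearchAction to your WebSite JSON-LD. This markup used to unlock the Google sitelinks search box on SERPs (the small search input that appeared under a site’s main result). Google retired that feature in late 2024, so treat the markup as a machine-readable description of your search endpoint rather than a route to a rich result: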
{
'@context': 'https://schema.org',
'@type': 'WebSite',
url: 'https://neciudan.dev',
potentialAction: {
'@type': 'SearchAction',
target: 'https://neciudan.dev/search?q={search_term_string}',
'query-input': 'required name=search_term_string',
},
}
You’ll need to wire ?q= query parsing in your search page for this to actually work end to end. Pagefind’s UI doesn’t read URL params by default.
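The query parsing itself is small. A sketch of it, with getInitialQuery as a hypothetical helper name; the commented-out wiring assumes the triggerSearch method described in the Pagefind UI docs, so check it against the version you install.

```typescript
/** Extract a non-empty ?q= term from a location.search string, or null if there is none. */
function getInitialQuery(search: string): string | null {
  const term = new URLSearchParams(search).get('q')?.trim();
  return term ? term : null;
}

// In the browser, after constructing the UI (assumed wiring, not run here):
//   const ui = new PagefindUI({ element: '#search', showSubResults: true });
//   const term = getInitialQuery(window.location.search);
//   if (term) ui.triggerSearch(term);
```

URLSearchParams handles the percent-decoding, so /search?q=astro%20seo yields the term "astro seo".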
15. HowTo schema for tutorial-style posts
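If a post walks the reader through sequential steps (“how to add Pagefind to Astro,” “how I cut bandwidth”), HowTo schema marks the structure in machine-readable form. Set expectations, though: Google stopped showing HowTo rich results in 2023, so this markup no longer earns step indicators or carousel placement in Google’s SERPs. It still describes the steps for any other consumer that parses schema.org data.
Give each HowTo at least 3 steps, ideally 4 or more, so the markup describes a real procedure.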
{
'@context': 'https://schema.org',
'@type': 'HowTo',
name: 'How to add Pagefind site search to Astro',
totalTime: 'PT1H',
step: [
{
'@type': 'HowToStep',
position: 1,
name: 'Install Pagefind',
text: 'Run npm install --save-dev pagefind in your Astro project root.',
},
{
'@type': 'HowToStep',
position: 2,
name: 'Wire the postbuild script',
text: 'Add "postbuild": "pagefind --site dist" to package.json scripts.',
},
{
'@type': 'HowToStep',
position: 3,
name: 'Mark indexable content',
text: 'Add data-pagefind-body to the outer article element on each post template.',
},
{
'@type': 'HowToStep',
position: 4,
name: 'Build the search page',
text: 'Create src/pages/search.astro that loads Pagefind UI from /pagefind/pagefind-ui.js.',
},
],
}
A post can carry both Article and HowTo JSON-LD. Google reads both. My posts on bandwidth optimisation, build pipelines, and this checklist itself are all candidates.
16. Speakable JSON-LD on Article schema
For voice answer engines (Google Assistant, Alexa). One property in your existing Article schema, see SpeakableSpecification:
speakable: {
'@type': 'SpeakableSpecification',
cssSelector: ['h1', '[data-speakable]'],
}
The [data-speakable] selector is a forward-looking hook. If you write a TL;DR in a future post and mark it data-speakable, voice engines will read it as the summary. Until then, the h1 covers the headline.
Engines that don’t care about Speakable ignore the property. There’s no downside.
17. Content collection schemas (the one that found 10 bugs)
This is the one that surprised me.
I have a podcast section on the site (Señors @ Scale, 30 episodes). Each episode is a markdown file in src/content/podcast/. Standard Astro content collection (the framework’s typed-frontmatter system, where you define a schema once and Astro validates every file against it). Except I’d never declared the collection in config.ts.
It worked anyway. getCollection('podcast') was permissive. No schema, no validation, just whatever happened to be in the frontmatter. For a year and a half I’d been adding episodes by copy-pasting an existing one and editing fields.
You can guess where this is going.
I wrote a Zod schema for the collection. (Zod is a TypeScript-first validation library; you describe the shape of an object and it tells you whether real data matches.) Then, before declaring the collection, I wrote a Vitest test that ran the schema against every episode’s frontmatter.
Prerequisites if you want to follow along:
npm install --save-dev vitest gray-matter zod
The test goes at tests/podcast-collection.test.ts and runs with npx vitest run or npm test if you’ve wired it up.
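import { readdirSync, readFileSync } from 'node:fs';
import { join } from 'node:path';
import { describe, it, expect } from 'vitest';
import matter from 'gray-matter';
import { z } from 'zod';

const podcastSchema = z.object({
  title: z.string(),
  episodeNumber: z.number(),
  guest: z.string(),
  description: z.string(),
  spotifyUrl: z.string().url().optional(),
  youtubeUrl: z.string().url().optional(),
  appleUrl: z.string().url().optional(),
  // ...
});

// Collect every markdown episode in the collection folder.
const episodeDir = 'src/content/podcast';
const episodeFiles = readdirSync(episodeDir)
  .filter((name) => name.endsWith('.md'))
  .map((name) => join(episodeDir, name));

describe('podcast frontmatter', () => {
  // One test per file, so a failure names the exact episode.
  for (const file of episodeFiles) {
    it(`${file} matches the podcast schema`, () => {
      const { data } = matter(readFileSync(file, 'utf8'));
      const result = podcastSchema.safeParse(data);
      expect(result.success).toBe(true);
    });
  }
});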
safeParse returns { success: true, data } on a match or { success: false, error } on a mismatch. Unlike parse, it doesn’t throw, which is what you want when iterating over many files.
I ran it. It failed on 10 episodes.
Three had URL fields wrapped in markdown link syntax: spotifyUrl: "[https://...](https://...)". Notion auto-formatting that I’d pasted in months ago and never looked at again.
Seven had appleUrl: "". I had a habit of stubbing the field empty when an episode wasn’t yet published on Apple Podcasts, intending to fill it in later. I never filled it in. The Zod URL validator caught all seven instantly.
For 18 months I’d been linking to broken Spotify URLs from my own episode pages. Real users had clicked them. Nobody told me.
While you’re declaring podcast schemas, also emit PodcastSeries on the podcast hub and PodcastEpisode on each episode/takeaway page. Podcast schemas are how Google’s podcast surfaces (Search, Assistant) discover episodes.
Calling all of this an SEO tip is a stretch. But broken outbound links from your domain are something engines notice, and content collection schemas catch them automatically. Worth doing.
18. dateModified, shown to readers when distinct
Freshness is a ranking signal, especially for queries where recency matters. dateModified already lives in your JSON-LD if you wire updateDate into getArticleSchema.
Show it to readers visually too, so they trust the article isn’t stale:
{post.updateDate && post.publishDate &&
new Date(post.updateDate).toDateString() !== new Date(post.publishDate).toDateString() && (
<p class="text-sm text-muted mt-1">
Updated <time datetime={post.updateDate.toISOString()}>
{post.updateDate.toLocaleDateString('en-US', {
year: 'numeric', month: 'short', day: 'numeric'
})}
</time>
</p>
)}
Only show it when the dates actually differ. Otherwise it’s noise.
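One caveat with the inline check: toDateString() uses the local time zone of whatever machine runs the build, so the same pair of dates can compare differently on your laptop and on CI. A minimal sketch of a helper that compares calendar days in UTC instead. The function name is hypothetical, and it assumes both values are valid Date objects.

```typescript
// Returns true only when both dates exist and fall on different UTC calendar days.
function hasDistinctUpdateDate(publishDate?: Date, updateDate?: Date): boolean {
  if (!publishDate || !updateDate) return false;
  // toISOString() is always UTC, so the first 10 characters are "YYYY-MM-DD"
  const utcDay = (date: Date) => date.toISOString().slice(0, 10);
  return utcDay(updateDate) !== utcDay(publishDate);
}
```

In the template, the condition would then read hasDistinctUpdateDate(post.publishDate, post.updateDate) && (...).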
19. rel="prev" / rel="next" and noindex on pagination
Astro’s pagination provides Astro.props.page.url.prev and page.url.next for free.
<Layout metadata={metadata}>
<Fragment slot="head">
{prevUrl && <link rel="prev" href={prevUrl} />}
{nextUrl && <link rel="next" href={nextUrl} />}
</Fragment>
</Layout>
Google announced in 2019 that it no longer uses rel=prev/next as an indexing signal. Bing and other engines still use it, and it costs you nothing, so I keep it.
For the noindex part, Google’s current guidance is mixed. The old advice was “noindex page 2+ of pagination.” The newer advice is “make page 2+ self-canonical and let them rank if they’re useful.” I noindex page 2+ on my blog because they’re not useful as landing pages (readers want the post itself, not a list of headlines). Your call.
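If you want both decisions in one place, here's a sketch of a helper that derives prevUrl, nextUrl, and the robots flags from Astro's page prop. The PageLike interface only mirrors the fields read here, and the function name is made up. Keeping follow: true on page 2+ is my assumption, so crawlers can still reach the posts linked from those pages.

```typescript
// Minimal shape of the parts of Astro's `page` prop this sketch reads.
interface PageLike {
  currentPage: number;
  url: { current: string; prev?: string; next?: string };
}

interface PaginationHead {
  prevUrl?: string;
  nextUrl?: string;
  robots: { index: boolean; follow: boolean };
}

function getPaginationHead(page: PageLike, site: string): PaginationHead {
  // Resolve relative paths like "/blog/2" against the site origin
  const absolute = (path?: string) => (path ? new URL(path, site).href : undefined);
  return {
    prevUrl: absolute(page.url.prev),
    nextUrl: absolute(page.url.next),
    // Page 1 stays indexable; page 2+ is noindex but still followed
    robots: { index: page.currentPage === 1, follow: true },
  };
}
```

If you go with the self-canonical approach instead, return index: true for every page and skip the currentPage check.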
20. noindex the 404 page
One-line change. Otherwise the occasional 404 sneaks into search results, which is a bad experience for everyone:
<Layout metadata={{
title: 'Error 404',
robots: { index: false, follow: false }
}}>
Watch for “soft 404s” too. A soft 404 is a page that returns HTTP 200 OK but looks empty or error-shaped to Google (“No results found,” “Sorry, this content is unavailable”). Search Console flags these and removes them from the index. If you have empty-state pages, give them real content or return a real 404.
What’s next on my list
Next quarterly audit will pick up:
- Dynamic per-post OG image generation with @vercel/og or satori. Useful for posts that don’t ship with a hero image. The libraries are already installed.
- Visible breadcrumb UI to match the BreadcrumbList schema I shipped.
- FAQPage JSON-LD for the FAQ section below. The visible FAQ is here; the schema requires extending the post template to read a faq: frontmatter array.
- Site-wide font preconnect in the layout.
- Google Search Console and Bing Webmaster Tools verification + indexing requests for the new pages. Without these you’re blind to crawl errors and Core Web Vitals reports.
- IndexNow push-indexing for Bing and Yandex. Cheap to wire from a Netlify build hook.
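For the FAQPage item, the generator itself is small. A minimal sketch, assuming the faq: frontmatter array holds objects with question and answer strings. That shape and the function name are hypothetical, since I haven't shipped this yet.

```typescript
// Hypothetical frontmatter shape: faq: [{ question, answer }, ...]
interface FaqItem {
  question: string;
  answer: string;
}

function getFaqPageSchema(faq: FaqItem[]) {
  return {
    '@context': 'https://schema.org',
    '@type': 'FAQPage',
    // One Question per visible FAQ entry, each with a single accepted Answer
    mainEntity: faq.map((item) => ({
      '@type': 'Question',
      name: item.question,
      acceptedAnswer: { '@type': 'Answer', text: item.answer },
    })),
  };
}
```

The questions and answers in the schema should match the visible FAQ text, which is why reading both from the same frontmatter array is the safer design.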
Worth saying out loud: everything on the main list is on-page SEO (things you control inside your own pages). Off-page SEO (backlinks, brand mentions, real-world authority) is the other half of the picture and doesn’t fit in a code-driven checklist. Backlinks are when other sites link to yours, and they’re acquired by writing things people want to link to, then asking. That’s a different post.
Glossary
A quick reference for the terms used above.
- Anchor text: the visible text inside a link.
- Backlink: a link from another site to yours. The currency of off-page SEO.
- Canonical URL: the “official” URL of a page when multiple URLs serve similar content. Set via <link rel="canonical">.
- CLS (Cumulative Layout Shift): how much elements jump around as a page loads. Lower is better.
- Core Web Vitals: Google’s three page-experience metrics. LCP, CLS, INP.
- Crawler / bot: software that fetches and indexes web pages (Googlebot, Bingbot, GPTBot, PerplexityBot).
- E-E-A-T: Experience, Expertise, Authoritativeness, Trustworthiness. Google’s framework for evaluating content credibility.
- Featured snippet: the answer box at the top of Google for question queries.
- INP (Interaction to Next Paint): how quickly the page responds to taps, clicks, and key presses. Replaced FID in March 2024.
- JSON-LD: structured data embedded in HTML as a <script type="application/ld+json"> block. The standard format for schema.org markup.
- Knowledge graph: Google’s database of entities (people, places, things) and their relationships. It powers the knowledge panel, the box on the right of search results.
- LCP (Largest Contentful Paint): how long until the biggest visible element renders. Lower is better.
- Open Graph: the meta-tag protocol that controls how links preview on social platforms (og:title, og:image, etc.).
- Off-page SEO: signals from outside your site (backlinks, brand mentions, social shares).
- On-page SEO: signals you control inside your own pages (content, schema, internal links, performance).
- Passage indexing: Google’s practice of ranking individual paragraphs of long articles, not just the page as a whole.
- Ranking signal: any factor Google uses to decide your page’s order in search results. Hundreds exist; freshness, authority, and user behaviour are the big ones.
- Rich result: any non-blue-link presentation in Google search (breadcrumbs, FAQ accordions, star ratings, recipe cards).
- Schema.org: the vocabulary used to describe entities (Article, Person, Course, etc.) in structured data. Maintained by Google, Microsoft, Yahoo, and Yandex.
- SERP: Search Engine Results Page. The list of results Google or another engine shows for a query.
- Sitemap: an XML file at the root of your domain listing every URL you want indexed.
- 301 redirect: HTTP status code for “permanently moved.” Tells engines to update their index and transfer ranking equity.
- Topical clusters: groups of related content under a shared category that engines treat as evidence of topic authority.
Frequently asked questions
Is Astro SEO-friendly?
Yes. Astro renders static HTML by default, has an official sitemap integration, and lets you inject any <head> markup or JSON-LD via <Fragment slot="head">. Unlike a typical SPA, content doesn’t depend on client-side rendering by default, so crawlers see it in the initial HTML. The work is in wiring up canonical URLs, schema, and meta tags, which Astro doesn’t auto-generate.
How do I add a sitemap to an Astro site?
Install @astrojs/sitemap, add sitemap() to integrations in astro.config.mjs, set the site URL in the same config, and reference sitemap-index.xml in your robots.txt. Build, and Astro emits a sitemap automatically.
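If you generate robots.txt from code instead of keeping a static file (for example from an endpoint like src/pages/robots.txt.ts, which is an assumption about your setup), the sitemap reference can be derived from the same site URL. A minimal sketch with a hypothetical function name, assuming the site lives at the domain root:

```typescript
// Builds a permissive robots.txt that points crawlers at the sitemap index.
function buildRobotsTxt(site: string): string {
  // @astrojs/sitemap writes sitemap-index.xml at the root of the build output
  const sitemapUrl = new URL('sitemap-index.xml', site).href;
  return ['User-agent: *', 'Allow: /', '', `Sitemap: ${sitemapUrl}`, ''].join('\n');
}
```

Add your own per-crawler rules above the Sitemap line. The sketch only covers the part the sitemap integration depends on.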
How do I add canonical URLs in Astro?
There’s no built-in helper, but it’s a single <link rel="canonical" href={...} /> in your layout’s <head>. Compute the canonical from Astro.url.pathname plus your site URL. Use the canonical URL on paginated, tagged, and category pages too, so duplicates collapse to one signal.
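A minimal sketch of that computation as a standalone function. In a layout you'd pass Astro.url.pathname and your configured site URL. The function name is hypothetical, and the no-trailing-slash convention is an assumption, so flip it if your host serves /path/ instead.

```typescript
function getCanonicalUrl(pathname: string, site: string): string {
  const url = new URL(pathname, site);
  // Drop query strings and fragments so tracking params don't create duplicates
  url.search = '';
  url.hash = '';
  // Assumption: pages are served without a trailing slash (except the root)
  if (url.pathname.length > 1 && url.pathname.endsWith('/')) {
    url.pathname = url.pathname.slice(0, -1);
  }
  return url.href;
}
```

Whichever slash convention you pick, use the same one in the sitemap and internal links, or the canonical ends up pointing at a URL that redirects.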
Does Astro support JSON-LD structured data?
Yes. Astro doesn’t generate it for you, but you can write any schema.org type as a JSON object and inject it via <script type="application/ld+json"> inside <Fragment slot="head">. Put the schema generators in a single helper file (src/helpers/schema.ts) and import them from any page or layout.
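One detail worth handling in that helper file: if any string in your schema contains </script>, it can close the tag early. A small sketch of a serializer that escapes the opening angle bracket. The function name is my own, not an Astro API.

```typescript
function serializeJsonLd(schema: Record<string, unknown>): string {
  // Escape "<" so a value like "</script>" inside the data can't end the tag.
  // JSON parsers read \u003c back as "<", so the data is unchanged.
  return JSON.stringify(schema).replace(/</g, '\\u003c');
}
```

In a layout, the output goes into the tag through Astro's set:html directive: <script type="application/ld+json" set:html={serializeJsonLd(schema)} />.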
Can I use Pagefind with Astro hybrid mode?
Yes, as long as the pages you want indexed are statically rendered. Pagefind reads dist/ after the build, so any prerendered HTML gets indexed. SSR-only routes (without export const prerender = true) don’t end up in dist/ and won’t be searchable. For a typical content site, this is the default and you don’t need to think about it.
What’s the difference between llms.txt and robots.txt?
robots.txt tells crawlers what they can and can’t access. llms.txt tells AI models what your site is about and where the canonical content lives. They serve different purposes and live alongside each other at the root of your domain.
What’s the difference between noindex and Disallow in robots.txt?
Disallow in robots.txt blocks crawling. The crawler never visits the URL, so it never sees the page’s <meta name="robots" content="noindex"> either. If a page is already indexed and you want it out, use noindex and let the crawler in. Disallow after the fact strands the page in the index.
How do I 301 redirect old URLs in Astro?
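Two options. On Netlify, add a line to public/_redirects: /old /new 301. Cross-host, use Astro’s built-in redirects config in astro.config.mjs (redirects: { '/old': '/new' }). The Netlify rule is a real 301. The Astro config produces a real 301 when a host adapter or server rendering handles it; a plain static build with no adapter falls back to an HTML page with a meta refresh.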
What to actually do today
Two things, then go.
Open your robots.txt. If PerplexityBot isn’t in there, it’s already 2026 outside your terminal.
Then paste your homepage URL into the Google Rich Results Test. If “no items detected” comes back, you’re invisible in rich results entirely. Start at the top of this list.
If you want more like this, I write at the blog and host the Señors @ Scale podcast.
References
- Schema.org
- Schema.org Article type
- Schema.org Person type
- Schema.org Organization type
- Schema.org BreadcrumbList
- Schema.org HowTo
- Schema.org SpeakableSpecification
- Schema.org SearchAction
- Schema.org PodcastSeries
- Schema.org PodcastEpisode
- Google Article structured data guide
- Google canonicalization docs
- Google rel=prev/next deprecation announcement
- Google passage indexing (BERT) announcement
- Google Rich Results Test
- Google Search Console
- Schema.org Validator
- Bing Webmaster Tools
- Web.dev Core Web Vitals
- llms.txt specification
- Pagefind
- IndexNow
- Open Graph protocol
- Astro sitemap integration
- Astro images guide
- Astro content collections
- Astro redirects config
- Astro syntax: Fragments and slots
- Netlify _redirects file format
- Zod (TypeScript-first schema validation)
- Vitest