---
title: "Why are we not using Service Workers?"
publishDate: 2026-06-07T00:00:00.000Z
excerpt: "I feel like Service Workers are an underused technology with a lot of benefits, but they are complex to set up and often misunderstood. Here are some case studies from Slack, Mux, and me on where and how to use Service Workers."
category: "javascript"
tags: ["javascript", "performance", "web-development", "network", "offline-support"]
canonical: https://neciudan.dev/why-are-we-not-using-service-workers
---

Over the past few months at conferences and meetups in Barcelona, Paris, Cluj, London, Coimbra, and Zurich, I surveyed developers from big companies: how does your team use service workers?

I spoke with frontend leads, staff engineers, and app developers who work on apps with millions of users.

The overwhelming answer: we don't use them.

This bothered me because the API has been available in every major browser since 2018, and I personally used it and saw the benefits firsthand.

Big corps like Google, Microsoft, or Canva use Service Workers heavily, but that is because of the nature of their products.

So I want to figure out why small and medium-sized companies aren't using Service Workers, especially when their products need them!

My best assumption is that they don't understand the benefits, so let's go through what Service Workers are, how they can benefit your company, and how other big companies are using them in production. 

## What a service worker actually is

A service worker is a JavaScript file that the browser runs on a separate thread, outside your page.

You register it once:

```js
if ('serviceWorker' in navigator) {
  navigator.serviceWorker.register('/sw.js');
}
```

From that point on, it sits between your app and the network. Every request your page makes (scripts, styles, API calls, images) can pass through its `fetch` handler, and the handler decides what to respond with: the real network response, a cached copy, or something it fabricated on the spot.

```js
self.addEventListener('fetch', (event) => {
  event.respondWith(
    caches.match(event.request).then((hit) => hit || fetch(event.request))
  );
});
```

Three properties make it different.

- It sits on the network path. 

Once the worker controls a page, nothing leaves that page without going through it first, which means it can rewrite requests, synthesize responses, add headers, or answer from disk without your application code knowing anything happened.

- It outlives your page. 

The browser can wake it up after the tab is closed, which is why push notifications and background sync are only possible through a service worker. No other browser context gets resurrected after the user is gone.

- The worker has no DOM access.

It talks to your app through `postMessage` and through the responses it serves. It also has its own lifecycle, separate from your page's: it installs, waits, activates, and gets terminated whenever the browser feels like it, keeping only what you explicitly persisted in the Cache API or IndexedDB.

Think of it as a proxy with a lifetime longer than your app.

## Use case one: boot performance and offline support

I am going to start with a story from Slack.

In 2019, their web client booted in ~5 seconds for users with one or two workspaces. 

They profiled it and found that the network was the biggest source of both latency and variability; every boot re-fetched assets, and asset fetch times swung wildly depending on connection quality.

They observed that almost nothing in that asset set changes between boots. 

The user who opens Slack on Tuesday morning downloads the same JavaScript they downloaded Monday morning. So they decided to improve it.

On first boot, the client downloads the full asset set (HTML, JavaScript, CSS, fonts, and sounds) and stores it in the service worker's Cache API. In parallel, a copy of the in-memory Redux store gets persisted to IndexedDB.

On the next boot, the client checks for those caches. 

If they exist, it boots entirely from local data: cached HTML, cached bundles, hydrated Redux store. The UI is displayed on screen before a single network request completes, and fresh data is loaded in the background afterward, replacing the cached snapshot. 

Slack calls this a warm boot. A cold boot is the first-ever visit, with nothing cached.
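
To make the warm/cold distinction concrete, here is a minimal sketch of the boot order described above. It is not Slack's code: `boot`, `readSnapshot`, `fetchFresh`, and `render` are hypothetical names, and the snapshot is assumed to come from something like IndexedDB.

```js
// Hypothetical boot sequence: show the persisted snapshot first (warm boot),
// then replace it with fresh data once the network answers.
async function boot({ readSnapshot, fetchFresh, render }) {
  const snapshot = await readSnapshot(); // assumed to resolve to null when nothing is cached
  const warm = snapshot !== null && snapshot !== undefined;

  if (warm) render(snapshot); // UI is on screen before any request completes

  try {
    render(await fetchFresh()); // fresh data replaces the cached snapshot
  } catch (err) {
    // Offline on a warm boot: keep showing the snapshot.
    // Offline on a cold boot: there is nothing to show, so surface the error.
    if (!warm) throw err;
  }
  return warm ? 'warm' : 'cold';
}
```

The same ordering is what makes offline support fall out later: a warm boot that cannot reach the network simply keeps the snapshot on screen.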

The handler Slack published is almost embarrassingly small (they note the production one carries more app-specific logic, but the shape is the same):

```js
self.addEventListener('fetch', (e) => {
  if (assetManifest.includes(e.request.url)) {
    e.respondWith(
      caches
        .open(cacheKey)
        .then((cache) => cache.match(e.request))
        .then((response) => response || fetch(e.request))
    );
  } else {
    e.respondWith(fetch(e.request));
  }
});
```

But the hard part was the versioning, because Slack deploys multiple times a day, and a worker serving cached assets means users boot on assets from a previous deploy.

Their solution has three layers.

A custom webpack plugin generates a manifest of all asset files, each with a content hash, on every deploy. That manifest is embedded into the service worker file itself, so any change to any asset makes the worker byte-different, and a byte-different worker triggers the browser's update flow automatically. 
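
A rough sketch of that idea, assuming a build step that stamps the hashed asset list into a worker template. The `__ASSET_MANIFEST__` placeholder, `renderWorkerSource`, and the file names are all made up for illustration; this is not Slack's webpack plugin.

```js
// Hypothetical build step: embed the hashed asset list into the worker file.
// Any change to any hashed file name changes the emitted bytes.
const WORKER_TEMPLATE = [
  'const assetManifest = __ASSET_MANIFEST__;',
  '// ...fetch handler that consults assetManifest goes here...',
].join('\n');

function renderWorkerSource(template, hashedAssets) {
  // Sort so the output depends on the set of assets, not on build order
  const manifest = JSON.stringify([...hashedAssets].sort());
  // Function replacement avoids "$" in file names being treated as a pattern
  return template.replace('__ASSET_MANIFEST__', () => manifest);
}

const before = renderWorkerSource(WORKER_TEMPLATE, ['/app.a3f8b2.js', '/app.77c1d0.css']);
const after = renderWorkerSource(WORKER_TEMPLATE, ['/app.c91d44.js', '/app.77c1d0.css']);
console.log(before === after); // false
```

Because one changed hash produces a different worker file, the browser's normal update check is enough to notice the deploy; no separate version endpoint is needed for this part.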

Cache buckets are keyed by deploy timestamp. An HTML file from deploy `X` only ever loads assets from bucket `X`, whether they come from cache or the network. You can never end up with `X` HTML running `Y` JavaScript, even while `Y` is mid-deploy.

Buckets older than 7 days get deleted on the worker's `activate` event.
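
Here is a minimal sketch of that eviction step. The `assets-<timestamp>` naming scheme and the `expiredBuckets` helper are my assumptions; Slack's post describes the behavior, not this code.

```js
const MAX_BUCKET_AGE_MS = 7 * 24 * 60 * 60 * 1000;

// Assumed naming scheme: "assets-<deploy timestamp in ms>"
const BUCKET_NAME = /^assets-(\d+)$/;

function expiredBuckets(cacheNames, now, maxAgeMs = MAX_BUCKET_AGE_MS) {
  return cacheNames.filter((name) => {
    const match = BUCKET_NAME.exec(name);
    if (match === null) return false; // not one of our buckets, leave it alone
    return now - Number(match[1]) > maxAgeMs;
  });
}

// Only wire up the listener when actually running inside a service worker
if (typeof ServiceWorkerGlobalScope !== 'undefined') {
  self.addEventListener('activate', (event) => {
    event.waitUntil(
      caches.keys().then((names) =>
        Promise.all(expiredBuckets(names, Date.now()).map((name) => caches.delete(name)))
      )
    );
  });
}
```

Keeping the age check in a pure function makes it easy to unit test without a browser, which matters for code that deletes user-side caches.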

Then there is staleness. A warm boot, by definition, serves assets fetched at the previous worker registration. Slack deploys many times a day, and a typical user boots once each morning, so clients risked running a full day behind permanently.

Their fix: while the app is open, re-register the service worker on a jittered interval. Re-registration makes the browser check for a byte-different worker, which prefetches fresh assets for the *next* boot. 
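
A sketch of what a jittered check could look like on the page side. The interval, the jitter ratio, and the helper names are assumptions, and I call `registration.update()`, the explicit API for asking the browser to re-check the worker script, rather than literally re-registering.

```js
const BASE_INTERVAL_MS = 60 * 60 * 1000; // assumed: roughly hourly
const JITTER_RATIO = 0.5; // assumed: plus or minus 50%

// Spread clients out so they don't all hit the server at the same moment
function jitteredDelay(baseMs, jitterRatio, random = Math.random) {
  const offset = (random() * 2 - 1) * jitterRatio; // in [-ratio, +ratio)
  return Math.round(baseMs * (1 + offset));
}

function scheduleUpdateChecks(registration) {
  const tick = () => {
    // A failed check (offline, server error) is fine: the next tick retries
    registration.update().catch(() => {});
    setTimeout(tick, jitteredDelay(BASE_INTERVAL_MS, JITTER_RATIO));
  };
  setTimeout(tick, jitteredDelay(BASE_INTERVAL_MS, JITTER_RATIO));
}

if (typeof navigator !== 'undefined' && 'serviceWorker' in navigator) {
  navigator.serviceWorker.register('/sw.js').then(scheduleUpdateChecks);
}
```

The jitter is there for the server's benefit: without it, every open tab would ask for the worker file on the same schedule.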

This halved the average age of assets at boot time, but there's one more trick that I haven't seen anyone else write about. 

Slack ships features together with matching API changes, so a one-version-behind frontend could desync from the backend. To manage this, the worker caches selected API responses (feature flags, experiment assignments) in the same deploy-keyed bucket as the assets. 

A warm boot gets a frontend and a flag configuration that were deployed together. Potentially stale, but always internally consistent, which matters far more.

The results: roughly 50% faster boots than the legacy client, warm boots about 25% faster than cold ones, and tens of millions of requests per day flowing through millions of installed workers within a month of release.

And offline support came as an immediate advantage. Once your app can start without needing the network, it can also function without it. 

Slack users gained offline reading and the ability to mark items as unread, with sync automatically reestablished on reconnect, delivering a highly requested feature as a natural result of service worker use.

## Use case two: the proxy can rewrite anything

Video streaming. If you've ever watched something online (think YouTube), you've streamed videos, and when your Internet is bad, you get a lower-quality video.

To accomplish this, HLS video streams are described in plain-text manifest files. 

The player downloads a multivariant playlist listing every available rendition of the video (resolutions, codecs), picks one based on measured bandwidth, and starts fetching the segments for that rendition.

Mux had a customer streaming screencasts, and the adaptive bitrate kept doing its job too well: on slow connections, it switched viewers down to 240p. This was not ideal: text at 240p was unreadable, and users on slow connections would much rather wait for the video to buffer than watch it at 240p.

Mux's own player has built-in rendition filtering, and none of the other options worked: forking the player, running a server-side proxy that rewrites manifests per customer, or telling the customer no.

Instead, they put a service worker in front of the player with one job: intercept requests for the manifest, edit the text before the player sees it.

```js
const MIN_RESOLUTION = 720;

self.addEventListener('fetch', (event) => {
  const url = new URL(event.request.url);
  if (url.hostname === 'stream.mux.com' && url.pathname.endsWith('.m3u8')) {
    event.respondWith(fetchAndFilterPlaylist(event.request));
  }
});

async function fetchAndFilterPlaylist(request) {
  const response = await fetch(request);
  const text = await response.text();
  return new Response(filterPlaylist(text), { headers: response.headers });
}
```

The `filterPlaylist` function walks the manifest line by line and drops every rendition below 720p, keeping all other HLS tags intact. 

The player receives a playlist where low resolutions simply don't exist, so it cannot pick one.
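
Mux's post has the real implementation; the version below is only my sketch of what such a filter could look like. It assumes each `#EXT-X-STREAM-INF` tag is followed by its URI on the next line (as the HLS spec requires) and keeps variants that declare no `RESOLUTION`, such as audio-only ones.

```js
// Hypothetical filter: drop every variant whose declared height is below minHeight.
function filterPlaylist(text, minHeight = 720) {
  const lines = text.split(/\r?\n/);
  const kept = [];
  let variantsKept = 0;

  for (let i = 0; i < lines.length; i++) {
    const line = lines[i];
    if (line.startsWith('#EXT-X-STREAM-INF:')) {
      const match = /RESOLUTION=(\d+)x(\d+)/.exec(line);
      const tooSmall = match !== null && Number(match[2]) < minHeight;
      if (tooSmall) {
        i += 1; // also skip the URI line that belongs to this variant
        continue;
      }
      variantsKept += 1;
    }
    kept.push(line); // every other tag passes through untouched
  }

  // If nothing survives, hand back the original rather than an unplayable playlist
  return variantsKept > 0 ? kept.join('\n') : text;
}
```

The fallback at the end is a design choice of this sketch: a video that plays at 240p is arguably still better than one that does not play at all.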

One detail from their post stuck with me: because edge runtimes like Cloudflare Workers implement the same fetch event API, they deployed the stitching worker to Cloudflare unchanged and got a working URL.

Anything that travels as text over HTTP can be rewritten in flight, by code you control, running on the user's machine.

## My use case: the deploy that breaks every lazy-loaded route

My own service worker story starts with a Vite production incident.

Vite fingerprints every build output with a content hash. Your lazy-loaded route lives in `Settings-a3f8b2.js`, and after the next deploy, it lives in `Settings-c91d44.js`, while the old file is gone from the CDN.

Now, picture a user who opened the app before the deploy. Their `index.html` and main bundle still reference the old hashes. 

They work through their morning, and at some point, they click on Settings for the first time that session. The browser requests `Settings-a3f8b2.js`, the CDN returns 404, and the dynamic import throws an Error (you might see it in Sentry as `Failed to fetch dynamically imported module`).

Error screen. For a user who did nothing wrong except keep a tab open over lunch.

Our first fix was the obvious one. Vite emits an event when a preload fails, so we caught it and reloaded:

```ts
window.addEventListener('vite:preloadError', () => {
  window.location.reload();
});
```

This worked, but the UX was terrible. Who wants to experience refreshes when navigating to a page?

Plus, if the user's `index.html` was cached anywhere along the way (browser cache, a CDN edge that hadn't purged yet, a misconfigured `Cache-Control` header, take your pick), the reload fetched the same stale HTML, which referenced the same dead chunk, which threw the same error, which triggered the same reload.

This can cause an infinite refresh loop. 

The second fix was a guard, so we'd only force one reload per session:

```ts
window.addEventListener('vite:preloadError', () => {
  if (sessionStorage.getItem('chunk-reloaded')) return;
  sessionStorage.setItem('chunk-reloaded', '1');
  window.location.reload();
});
```

The loop was gone, but everything else about it was still bad. 

We were treating the symptom. The problem underneath: the client had no idea a new version existed until something exploded, and old chunks evaporated the moment we deployed.

It took us an embarrassing amount of time to see that these are two problems, not one.

Problem one: users on the old version need the old chunks to keep existing for the duration of their session.

Problem two: users should migrate to the new version soon after it ships, without a hash failure being the messenger.

A service worker solves both, because it's the only place in the browser that can keep dead files alive and the only background process that can watch for new versions.

Each deploy now writes a `version.json` next to the bundle, generated by a small Vite plugin reading the build manifest:

```json
{
  "version": "2026.06.04-1412",
  "assets": ["/assets/index-c91d44.js", "/assets/Settings-c91d44.js"]
}
```

The version is a build timestamp rather than a hash, mostly for debugging. 

The worker compares versions with a plain inequality check rather than "newer than," so a rollback to an older build triggers an update like any other deploy.
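
For illustration, a plugin along these lines could look like the sketch below. It is not our exact plugin: it reads file names from Rollup's `generateBundle` hook instead of the build manifest, and it assumes the default `assets/` output directory, a base path of `/`, and UTC timestamps. `buildVersionFile` and `versionJsonPlugin` are illustrative names.

```js
function pad(n) {
  return String(n).padStart(2, '0');
}

// Formats a build time as YYYY.MM.DD-HHmm (UTC is an assumption of this sketch)
function formatVersion(date) {
  return (
    `${date.getUTCFullYear()}.${pad(date.getUTCMonth() + 1)}.${pad(date.getUTCDate())}` +
    `-${pad(date.getUTCHours())}${pad(date.getUTCMinutes())}`
  );
}

function buildVersionFile(fileNames, date) {
  return {
    version: formatVersion(date),
    // Only hashed build outputs; index.html is deliberately left out
    assets: fileNames.filter((name) => name.startsWith('assets/')).map((name) => `/${name}`),
  };
}

// Hypothetical Vite plugin: emits version.json next to the bundle at build time
function versionJsonPlugin() {
  return {
    name: 'version-json',
    apply: 'build',
    generateBundle(_options, bundle) {
      const payload = buildVersionFile(Object.keys(bundle), new Date());
      this.emitFile({ type: 'asset', fileName: 'version.json', source: JSON.stringify(payload) });
    },
  };
}
```

In a real project this would be added to the `plugins` array in `vite.config.js`.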

The service worker polls that file. Polling a 200-byte JSON with `cache: 'no-store'` is cheap, and the worker is the right place for it because it's already sitting on the network path:

```js
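// Module-level variables vanish when the browser terminates an idle worker,
// so the last seen version is persisted in the Cache API instead.
const META_CACHE = 'app-meta';
const VERSION_KEY = '/__current-version';

async function checkVersion() {
  const res = await fetch('/version.json', { cache: 'no-store' });
  if (!res.ok) return; // keep the current version if the check itself fails
  const { version, assets } = await res.json();

  const meta = await caches.open(META_CACHE);
  const stored = await meta.match(VERSION_KEY);
  const currentVersion = stored ? await stored.text() : null;
  if (version === currentVersion) return;

  // Precache the entire new build BEFORE telling anyone about it
  const cache = await caches.open(`app-${version}`);
  await cache.addAll(assets);
  await meta.put(VERSION_KEY, new Response(version));

  const clients = await self.clients.matchAll();
  for (const client of clients) {
    client.postMessage({ type: 'NEW_VERSION', version });
  }
}

self.addEventListener('message', (event) => {
  // waitUntil keeps the worker alive until the check has finished
  if (event.data?.type === 'CHECK_VERSION') event.waitUntil(checkVersion());
});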
```

Service workers are terminated by the browser when they're idle, so the worker can't reliably run its own `setInterval`. 

The page drives the polling instead, posting `CHECK_VERSION` on an interval and on `visibilitychange`, so a tab that comes back from a weekend in the background checks immediately.
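
The page side of that can be sketched roughly as follows. The five-minute interval and the `startVersionPolling` helper are assumptions; dependencies are passed in so the logic can be exercised without a browser.

```js
// Hypothetical page-side driver: asks the worker to check for a new version
// on a fixed interval and whenever the tab becomes visible again.
function startVersionPolling({ post, doc, intervalMs = 5 * 60 * 1000, timers = globalThis }) {
  const check = () => post({ type: 'CHECK_VERSION' });
  const onVisibilityChange = () => {
    if (doc.visibilityState === 'visible') check();
  };

  doc.addEventListener('visibilitychange', onVisibilityChange);
  const id = timers.setInterval(check, intervalMs);

  // Returns a cleanup function, e.g. for tests or hot module reloads
  return () => {
    timers.clearInterval(id);
    doc.removeEventListener('visibilitychange', onVisibilityChange);
  };
}

if (typeof document !== 'undefined' && typeof navigator !== 'undefined' && 'serviceWorker' in navigator) {
  // `ready` resolves once the registration has an active worker
  navigator.serviceWorker.ready.then((registration) => {
    startVersionPolling({
      post: (message) => registration.active.postMessage(message),
      doc: document,
    });
  });
}
```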

The fetch handler is the part that fixes problem one. 

Hashed assets are served cache-first, and old cache buckets stay alive until no client needs them:

```js
self.addEventListener('fetch', (event) => {
  const url = new URL(event.request.url);
  if (url.pathname.startsWith('/assets/')) {
    event.respondWith(
      caches.match(event.request).then((hit) => hit || fetch(event.request))
    );
  }
});
```

`caches.match` with no cache name searches every bucket, old and new. So a user mid-session who clicks Settings gets `Settings-a3f8b2.js` from the worker's cache even though the CDN deleted it minutes ago. 

The 404 that started this whole story can no longer happen, because the worker holds the only surviving copy of the file and serves it without asking the network.

Problem two is the app's side. It listens for the message and updates in the background:

```ts
let updateReady = false;

navigator.serviceWorker.addEventListener('message', (event) => {
  if (event.data.type === 'NEW_VERSION') updateReady = true;
});

// called on every route navigation
function onNavigate() {
  if (updateReady) window.location.reload();
}
```

The reload happens on a route change, when the user is already expecting the screen to swap and there's no half-filled form to lose. 

It pulls the new `index.html`, which points at assets the worker has already cached, so the "reload" is served almost entirely from disk.

One requirement for the full effect: `index.html` itself must be served with `Cache-Control: no-cache`. The Vite docs say the same thing about the plain reload approach, and our infinite loop earlier was the price of ignoring it. 

Even when some CDN edge hands out stale HTML, the user lands back on the old version for one more cycle, and nothing breaks because the old chunks are still sitting in the worker's cache. 

The `vite:preloadError` handler is still there as a last resort, and it hasn't fired since.

If this sounds familiar, it should. It's Slack's deploy-keyed cache buckets wearing different clothes (that's where I got the idea from). 

Old assets must keep working for old sessions; new sessions must get new assets; clients should converge on the latest version without anything breaking in between. 

A service worker is the only place in a browser where you can enforce all three rules, because nothing else sits between your app and the network while also persisting across deploys.

## So why is nobody using them?

After the survey responses, I started asking a follow-up question: why not? 

The first answer was that *they don't need them* (which might be true).

The second answer, from most senior engineers, was that they had trouble with them in the past, and it's not worth the effort. 

**Specifically, the lifecycle of a service worker.**

By default, a page that registers a service worker isn't controlled by it until the next navigation, and even `clients.claim()` can't intercept the requests the page fired before the worker activated. 

Everyone who has touched a service worker has a story about a worker stuck in `waiting` while they refreshed the page twelve times, or about `skipWaiting` activating a new worker under a page built for the old one.
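
If you have never seen the `waiting` state up close, here is a small sketch of how a page can detect it. `onWorkerWaiting` is a hypothetical helper; the events and properties it uses (`updatefound`, `statechange`, `registration.waiting`, `registration.installing`) are the standard ones.

```js
// Hypothetical helper: calls notify(worker) whenever a new worker is parked in `waiting`.
function onWorkerWaiting(registration, hasController, notify) {
  // A worker may already be waiting from a previous visit
  if (registration.waiting) notify(registration.waiting);

  registration.addEventListener('updatefound', () => {
    const installing = registration.installing;
    if (!installing) return;
    installing.addEventListener('statechange', () => {
      // 'installed' while another worker still controls the page means:
      // the new worker is waiting for the old one to let go
      if (installing.state === 'installed' && hasController()) notify(installing);
    });
  });
}

if (typeof navigator !== 'undefined' && 'serviceWorker' in navigator) {
  navigator.serviceWorker.register('/sw.js').then((registration) => {
    onWorkerWaiting(
      registration,
      () => Boolean(navigator.serviceWorker.controller),
      () => console.log('A new version is waiting; it activates once the old worker has no open tabs.')
    );
  });
}
```

Refreshing a single tab twelve times doesn't help because the old worker keeps controlling that tab across reloads; the waiting worker only takes over when no page is using the old one, or when it calls `skipWaiting`.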

Even Mux ran into this in their own demo. A video player starts fetching the moment it mounts, before a same-page worker can take control, so they had to register the worker on an index page and link onward to the player page. 

The post mentions a second issue: a registration scope of `/resolution-filtering/` works, while `/resolution-filtering` doesn't, and nothing explains why.

**Everyone knows a cache horror story.**

The two people in my survey who "tried one in 2019 and removed it" both told the same story with different details: a service worker with a bad cache strategy served a stale app to users, and the fix required shipping a killswitch worker and waiting days for clients to pick it up, because the broken worker controlled when updates were checked.
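
For reference, a killswitch worker is usually tiny. The sketch below follows the common self-destroying pattern and is not taken from either of those teams; `killSwitch` is a hypothetical name. It has to be deployed at the same URL as the broken worker, and it only runs once the browser actually fetches the updated script.

```js
// Hypothetical kill switch: wipe every cache, unregister, and reload open tabs.
async function killSwitch({ caches, registration, clients }) {
  const names = await caches.keys();
  await Promise.all(names.map((name) => caches.delete(name)));

  await registration.unregister();

  // Reload controlled tabs so they load straight from the network again
  const windows = await clients.matchAll({ type: 'window' });
  await Promise.all(
    windows.map((client) => client.navigate(client.url).catch(() => {}))
  );
}

// Only wire up the listeners when actually running inside a service worker
if (typeof ServiceWorkerGlobalScope !== 'undefined') {
  // Don't sit in `waiting` behind the broken worker
  self.addEventListener('install', () => self.skipWaiting());
  self.addEventListener('activate', (event) => {
    event.waitUntil(
      killSwitch({ caches, registration: self.registration, clients: self.clients })
    );
  });
}
```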

The fear is justified. But you need to invest more into versioning.

Look at the Slack example and see where they spent their effort: the fetch handler is a dozen lines, and the versioning machinery (manifest hashing, deploy-keyed buckets, jittered re-registration, 7-day eviction) is everything else. 

Cache invalidation across deploys is *the* problem, and the teams that got burned are the teams that shipped the dozen lines without the rest.

As the saying goes, there are only two hard things in programming: naming things and cache invalidation.

**The product never asked for offline.**

Most of the people I surveyed build dashboards, internal tools, and B2B apps. 

Nobody writes "works on the metro" into those requirements. Fair enough.

But offline was never the only use case, and most of this article is evidence. My deployment problem had nothing to do with being offline. Mux's manifest rewriting has nothing to do with offline. Slack's 50% boot improvement helps users on gigabit fiber. 

Dismissing service workers because you don't need offline is dismissing a proxy because you don't need one of the things a proxy can do. 

And I think **you should have offline support**. Think of your users first.

**And you probably ARE using them.**

My favorite counterexample is Partytown, the Builder.io library that moves Google Analytics, Tag Manager, and Facebook Pixel off the main thread.

Third-party scripts wreck your Core Web Vitals by competing with your app for the main thread. 

The obvious fix, running them in a web worker, fails for one specific reason. Those scripts constantly read the DOM synchronously (`document.title`, `document.cookie`, `window.location.href`) and expect an immediate return value, while worker-to-page communication is asynchronous.

Partytown's trick starts with a fact about workers: a web worker has exactly two legal ways to block. `Atomics.wait()` on a SharedArrayBuffer, and a *synchronous* XMLHttpRequest, the API we all spent a decade learning to avoid. 

When the analytics script (running in the worker, against a proxied DOM) needs a real value, Partytown serializes the request and fires a sync XHR at a fake URL ending in `proxytown`. The worker thread blocks, waiting for the response.

The production source includes my favorite comment in any open-source codebase:

```ts
const xhr = new XMLHttpRequest();
xhr.open('POST', partytownLibUrl('proxytown'), false);
xhr.send(JSON.stringify(accessReq));
// look ma, I'm synchronous (•‿•)
return JSON.parse(xhr.responseText);
```

That request never reaches any network. 

A service worker intercepts it and messages the correct tab's main thread (since the worker is shared across all tabs of the origin, requests carry a tab ID, pending requests live in a correlation map, and a timeout protects against a dead tab hanging everything). 

If your site runs Partytown, you have a service worker in production bridging threads through fake HTTP, and you probably never thought about it.

Another library that relies heavily on Service Workers is Mock Service Worker (MSW), which is widely used in the JS ecosystem for testing: instead of making a real request, you use MSW to intercept it and send back a mocked JSON response. (I think in 2020, everybody was building e2e tests this way. I know we did this at Glovo.)

## It's hard to write Service Workers

Hopefully, by now, you understand the benefits of Service Workers and where to use them: 
- offline support
- deployment asset strategy 
- performance improvements

But you might still be overwhelmed by the documentation and boilerplate surrounding Service Workers (which is why this is not a how-to guide). 

There are lots of complex interactions that are hard to get right when building Service Workers. 
- Network requests 
- Caching strategies
- Cache management
- Precaching

That's where [Workbox](https://web.dev/learn/pwa/workbox) from Google comes in. 

Workbox is a set of modules that simplify common service worker routing and caching. Each module addresses a specific aspect of Service Worker development, making workers easier to create, manage, and work with.

The important thing to understand is what they do and where they can help you now or in the future. 

Good luck. 

## References

- [Service Workers at Slack: Our Quest for Faster Boot Times and Offline Support](https://slack.engineering/service-workers-at-slack-our-quest-for-faster-boot-times-and-offline-support/) - Slack Engineering
- [Service workers are underrated, and building media proxies proves it](https://www.mux.com/blog/service-workers-are-underrated) - Mux
- [Partytown](https://github.com/QwikDev/partytown) - Builder.io's library for running third-party scripts off the main thread
- [Dexie Cloud: usingServiceWorker](https://dexie.org/docs/cloud/db.cloud.usingServiceWorker) - background sync for offline-first databases
- [Mock Service Worker](https://mswjs.io/docs/) - API mocking at the network boundary
- [Introducing WebContainers](https://blog.stackblitz.com/posts/introducing-webcontainers/) - StackBlitz
- [Workbox](https://web.dev/learn/pwa/workbox) - web.dev
- [HTTP Archive Web Almanac, PWA chapter](https://almanac.httparchive.org/en/2021/pwa) - service worker adoption data
- [Service Worker API](https://developer.mozilla.org/en-US/docs/Web/API/Service_Worker_API) - MDN
- [Vite: Handling preload errors](https://vite.dev/guide/build.html#load-error-handling) - Vite docs
