Bringing Sharp to WebAssembly and WebContainers

TL;DR: You can now use Sharp, a high-performance image processing library for Node.js, in WebContainers!

Try it live on StackBlitz!


WebContainers is an environment that allows you to run Node.js directly in your browser. It can easily handle any JavaScript, including npm modules. However, when it comes to image processing and optimisation, users of toolchains such as Gatsby, Astro, Next.js and others have been facing difficulties. The most popular libraries for this task are @squoosh/lib, derived from Squoosh.app (sadly, no longer maintained as a library), and Sharp, which uses a native Node.js addon under the hood for the expensive processing. That native addon is built for Windows, Linux, and macOS, but none of those builds can run in the browser, so Sharp couldn’t run in StackBlitz either.

That has changed somewhat in the last few years, because browsers can now run native code compiled to WebAssembly. When StackBlitz asked me to work on porting Sharp to WebAssembly, it was a perfect match. Not only have I been working with WebAssembly for the past couple of years, but I had also worked on a similar project (Squoosh) before, and had independently tried porting N-API (now called Node-API) to Emscripten in a project dubbed emnapi, which I always wanted to finish.

Let me walk you through the process and some of the issues I encountered along the way when porting Sharp to WebAssembly.

Porting Node-API to WebAssembly

When I started, rather than looking through my GitHub, I searched for “emnapi” on Google. Imagine my surprise when I found a project with the same name and goals published by someone else. My first concern was that someone might have copied the project, as sometimes happens in OSS, but upon closer inspection, it became clear that it’s just one of those ideas that are always in the air and, let’s face it, the name wasn’t that unique.

In fact, Toyo Li had made much more progress and covered many more Node.js APIs in their port, going beyond JavaScript value manipulation and implementing even complex integrations like async tasks and memory synchronisation, not to mention extensive documentation. I decided to go with this newer emnapi and contribute to it where necessary.

libvips

For image processing, Sharp uses a library called libvips under the hood. Essentially, Sharp is a high-level wrapper for libvips with a Node.js-friendly API.

In turn, libvips uses GLib, libjpeg, cgif, libimagequant, and many other libraries to support different formats and processing operations. Making sure that all these dependencies compile to WebAssembly, choosing compatible flags, and patching source code where necessary is a lot of work and introduces the most complexity when porting Sharp / libvips to Wasm.

We were fortunate that someone had done most of that work already. Kleis Auke Wolthuizen created wasm-vips, a JavaScript / WebAssembly wrapper for libvips capable of running in a browser. In the process, he patched those dependencies and wrote a build script that downloads and applies patches and builds libvips with correct flags before building wasm-vips itself.

I piggy-backed on that script, added the ability to build just libvips itself, and included the C++ bindings required by Sharp. Then, I successfully compiled those bindings together with Sharp’s own C++ code into a single WebAssembly module. Throughout the work, I also added support for previously missing formats like AVIF and SVG, as well as some build optimisations.

SVG and text support

For SVG, I replaced librsvg, usually used by libvips on other platforms, with resvg. The main reason was that librsvg has many dependencies and has yet to be ported to WebAssembly. In contrast, resvg is a Rust library, and Rust has a much better cross-compilation story, including for compilation to WebAssembly. Even aside from easier WebAssembly support, resvg is worth checking out for its better SVG compatibility and speed.

For now, we decided to disable SVG support in the WebAssembly port by default. The reason? Text is hard.

In native land, resvg reads all the fonts from the system fonts directory, collects parsed metadata, and can then use it to find fonts by the requested name, weight, and other parameters. In WebAssembly, it’s not that easy.

In Node.js or WASI, we could expose the system fonts directory to the module, but what would we do in the browser?

You can render text via the DOM or canvas, but that doesn’t give access to the raw font files the library wants. There are CDNs like Google Fonts, but downloading font files while rendering SVG is pretty expensive, especially when you want to read lots of them in advance. The WICG Local Font Access API is probably the most promising solution in this area, as it provides access to raw system font files, but it’s currently Chrome-only.

In response to my issue, the resvg maintainer has kindly added support for enumerating fonts required by the given SVG file prior to rendering. This should solve the issue of having to download all the fonts in existence in advance just to read their metadata, which is not an option when using a CDN due to the sheer volume of data to download.

We will look closer at supporting text and SVG in the future, but for now, there are too many unanswered questions, and disabling those features altogether seems better than rendering potentially broken content with elements like text quietly missing in the resulting image.

Synchronous startup

One interesting constraint of this project was compatibility: StackBlitz users shouldn’t have to change existing Node.js code that uses Sharp to make it work in WebContainers. Since Sharp loads and instantiates its native module synchronously with a simple require, the WebAssembly module had to initialise synchronously too.

This is generally discouraged; in fact, Chrome outright refuses to synchronously compile modules larger than 4 KB on the main thread, although that limit is being raised at the time of writing. Luckily, WebContainers already run user code in Workers, where such long blocking operations don’t freeze the UI. So all we needed was to override Emscripten’s default asynchronous compilation with a synchronous one via the -s WASM_ASYNC_COMPILATION=0 flag.
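To make this concrete, here is roughly what synchronous startup boils down to in JavaScript: compiling with the synchronous WebAssembly.Module constructor and instantiating with WebAssembly.Instance, rather than awaiting WebAssembly.instantiate. This is a minimal sketch, not the glue code Emscripten actually generates; the hand-assembled module exporting a single add function and the loadWasmSync helper are hypothetical stand-ins.

```javascript
// Hand-assembled WebAssembly binary exporting add(a: i32, b: i32) -> i32.
// Hypothetical stand-in for a real (much larger) module such as Sharp's.
const addModuleBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // "\0asm" magic + version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type section: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // function section: one function of type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export section: "add" = func 0
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // code: local.get 0, local.get 1, i32.add
]);

// Hypothetical helper: compile and instantiate without yielding to the event
// loop, so the caller gets a ready-to-use instance immediately.
function loadWasmSync(bytes, imports = {}) {
  const module = new WebAssembly.Module(bytes); // synchronous compilation
  return new WebAssembly.Instance(module, imports); // synchronous instantiation
}

const instance = loadWasmSync(addModuleBytes);
console.log(instance.exports.add(2, 3)); // 5
```

Because neither call returns a promise, a module loaded this way can sit behind a plain require. In a browser, the same code is fine inside a Worker, while on Chrome’s main thread the WebAssembly.Module constructor throws for modules over the size limit mentioned above.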

Next, Sharp, or rather libvips, uses a GLib thread pool to split up and manage image processing tasks. WebAssembly supports threads, which use Web Workers, shared memory, and atomic operations under the hood. One obscure quirk of Web Workers is that they don’t spawn synchronously; instead, the browser schedules a task to spawn the new Worker on a later event loop tick. This behaviour is invisible to most JavaScript users but makes Workers hard to use from WebAssembly. I’ve written about this in more detail in a separate article, but let me briefly go through the relevant part here as well:

pthread_create(&thread_id, NULL, thread_callback, &arg);
pthread_join(thread_id, NULL);

If you were to translate this C code to JS pseudo-code, it would look roughly like this:

let isReady = false;
let worker = new Worker(...);

// worker sends a message once it’s initialised
worker.onmessage = event => {
  if (event.data.type === 'ready') {
    isReady = true;
  }
};
};

while (!isReady) {}

new Worker(...) only creates a handle for the Worker; the browser waits until the end of the current event loop task to actually spawn it, and only then can the worker post a “ready” message. However, our own while (!isReady) {} loop blocks the event loop while waiting for that response, so the current task never ends and the message handler never runs. This is a classic example of a deadlock.

To work around this limitation, Emscripten has a setting to pre-initialise its own thread pool (-s PTHREAD_POOL_SIZE=...). When used, Emscripten will create and asynchronously wait for all Workers at startup, and all the subsequent pthread_create operations won’t have to wait on the event loop. Instead, they can share data via the WebAssembly shared memory.

In our case, the startup must be fully synchronous, so we can’t use this option either. We had to find a way to avoid using the pool altogether.
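For illustration, the pre-initialised pool idea can be sketched in plain Node.js with worker_threads. This is not Emscripten’s implementation; the worker script, createPool and takeWorker are hypothetical. The key point is the await in the setup phase: once the pool exists, handing out a worker is synchronous, but building it is not, which is exactly what a fully synchronous startup cannot afford.

```javascript
const { Worker } = require("node:worker_threads");

// Hypothetical worker script: announces readiness, then answers each job.
const WORKER_SOURCE = `
  const { parentPort } = require("node:worker_threads");
  parentPort.on("message", (job) => {
    parentPort.postMessage({ type: "done", result: job.a + job.b });
  });
  parentPort.postMessage({ type: "ready" });
`;

// Spawn `size` workers up front and wait (asynchronously!) until each one
// reports that it is ready. This is the step a synchronous startup can't do.
async function createPool(size) {
  const pending = [];
  for (let i = 0; i < size; i++) {
    const worker = new Worker(WORKER_SOURCE, { eval: true });
    pending.push(
      new Promise((resolve, reject) => {
        worker.once("message", (msg) => {
          if (msg.type === "ready") resolve(worker);
        });
        worker.once("error", reject);
      })
    );
  }
  return Promise.all(pending);
}

// Once the pool exists, taking a worker needs no event loop round trip.
function takeWorker(pool) {
  const worker = pool.pop();
  if (!worker) throw new Error("pool exhausted");
  return worker;
}

createPool(2).then((pool) => {
  const worker = takeWorker(pool);
  worker.once("message", (msg) => {
    console.log(msg.result); // 5
    worker.terminate();
    pool.forEach((w) => w.terminate());
  });
  worker.postMessage({ a: 2, b: 3 });
});
```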

It turns out that one of the lesser-known but notable differences between the Web Worker API in the browser and the worker_threads Worker API in Node.js is that the latter does exactly what we want: new worker_threads.Worker(...) spawns a worker immediately, so the worker can start up even while the current thread blocks its own event loop. And WebContainers implement even such obscure differences in a Node.js-compliant way, too!
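A quick way to see this behaviour for yourself is to spawn a worker and immediately block the main thread with Atomics.wait on a SharedArrayBuffer, which stands in for WebAssembly shared memory here. A minimal sketch, assuming Node.js with worker_threads; the one-slot flag layout is made up for the demo. A browser Worker, as described above, would only be spawned after the blocking task finished.

```javascript
const { Worker } = require("node:worker_threads");

// Shared flag standing in for WebAssembly shared memory: 0 = not started, 1 = started.
const flag = new Int32Array(new SharedArrayBuffer(4));

// In Node.js, the constructor itself starts the worker thread.
const worker = new Worker(
  `
  const { workerData } = require("node:worker_threads");
  const started = new Int32Array(workerData);
  Atomics.store(started, 0, 1); // signal that the worker is running
  Atomics.notify(started, 0);   // wake the blocked main thread
  `,
  { eval: true, workerData: flag.buffer } // SharedArrayBuffer is shared, not copied
);

// Block the main thread without ever returning to its event loop. This doesn't
// deadlock because the worker is already starting up on its own thread.
// The 5-second timeout is only a safety net for the demo.
const outcome = Atomics.wait(flag, 0, 0, 5000); // "ok" if woken, "not-equal" if already set
console.log(Atomics.load(flag, 0)); // 1
```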

Emscripten couldn’t leverage this because its flow went like this:

  1. The main thread creates a Worker via new Worker and subscribes to its messages.
  2. The main thread sends a message “load” with the Wasm module and some other bits to the Worker.
  3. The Worker receives the “load” message with the Wasm module, loads relevant JS files, and asynchronously initialises the runtime.
  4. The Worker initialization is done. It then sends a message “loaded” to the main thread.
  5. The main thread receives a message “loaded”.
  6. The main thread sends a message “run” to the Worker with a pointer to pthread callback and other relevant info.
  7. The worker receives the message “run” and executes the pthread.

Node.js can do steps 1-4 synchronously, but receiving a message on step 5 requires asynchronous waiting for the event loop since messages are received as regular events. And, as we mentioned earlier, we can’t afford any asynchronous action since the startup must be fully synchronous.

But what if… we didn’t wait for the worker initialisation at all? worker.postMessage doesn’t deliver messages immediately but instead adds them to an internal queue. It’s designed that way to ensure that no messages get lost and that users don’t get an error if they send a message before the Worker is ready to accept it.

In Node.js, this means that we can spawn a new Worker, send “load” and “run” commands, and block (for instance via pthread_join) waiting on a condition in the WebAssembly shared memory, all in the same event loop tick, without deadlocking or waiting on any asynchronous events.

The new flow looks like this:

  1. The main thread creates a Worker via new Worker and subscribes to its messages.
  2. The main thread sends a message “load” with the Wasm module and some other bits to the Worker.
  3. The main thread sends a message “run” to the Worker with a pointer to pthread callback and other relevant info.
  4. The Worker receives the “load” message with the Wasm module, loads relevant JS files, and asynchronously initialises the runtime.
  5. The Worker stores all the other incoming messages into a queue (in this example, it’s just a message “run”).
  6. Worker initialization is done. It sends a message “loaded” to the main thread.
  7. The Worker executes all the queued-up messages (in this example, a message “run” so it executes the pthread).
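The reordered flow can be simulated in plain Node.js. This is a sketch under stated assumptions, not the code from the Emscripten PR: the load/run message shapes, the 50 ms setTimeout standing in for asynchronous runtime initialisation, and the addition standing in for the pthread callback are all hypothetical. The main thread queues both messages in the same tick and then blocks on shared memory, as pthread_join would.

```javascript
const { Worker } = require("node:worker_threads");

// Hypothetical worker: "load" starts async init; "run" messages arriving
// before init finishes are queued and replayed afterwards.
const WORKER_SOURCE = `
  const { parentPort } = require("node:worker_threads");
  let memory = null; // Int32Array over shared memory: [done flag, result]
  let initialised = false;
  const queue = [];

  // Stand-in for executing the pthread callback.
  function run(msg) {
    Atomics.store(memory, 1, msg.a + msg.b); // write the result
    Atomics.store(memory, 0, 1);             // mark as done
    Atomics.notify(memory, 0);               // wake the blocked main thread
  }

  parentPort.on("message", (msg) => {
    if (msg.type === "load") {
      memory = new Int32Array(msg.buffer);
      // Simulated asynchronous runtime initialisation.
      setTimeout(() => {
        initialised = true;
        parentPort.postMessage({ type: "loaded" });
        for (const queued of queue.splice(0)) run(queued); // replay the queue
      }, 50);
    } else if (msg.type === "run") {
      if (initialised) run(msg);
      else queue.push(msg);
    }
  });
`;

const shared = new Int32Array(new SharedArrayBuffer(8)); // [done flag, result]
const worker = new Worker(WORKER_SOURCE, { eval: true });

// Steps 1-3: spawn, then queue "load" and "run" in the same tick.
worker.postMessage({ type: "load", buffer: shared.buffer });
worker.postMessage({ type: "run", a: 2, b: 3 });

// Block until the worker flags completion; no event loop turn is needed.
const outcome = Atomics.wait(shared, 0, 0, 5000);
console.log(Atomics.load(shared, 1)); // 5
worker.terminate();
```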

I implemented this in an upstream Emscripten PR, so starting from version 3.1.29, you can use PThreads in Node.js without a Worker pool altogether, or spawn more threads than the pool holds, without deadlocking. Combined with -s WASM_ASYNC_COMPILATION=0, the startup is now fully synchronous.

I/O references

Another interesting problem I encountered while running tests was that the Node.js process just… hung around after the tests finished. Sure, pressing Ctrl+C would kill it, but that’s a poor experience, and we couldn’t ask all consumers of the sharp library to do the same in whatever executables it ends up integrated into.

Ironically, I had run into and raised this very issue a couple of years earlier and never found a good solution for it.

Node.js has a variety of I/O handle objects, including Workers. All such handles have methods for explicit reference control: .ref() marks a handle as strongly referenced, and .unref() marks it as weakly referenced. Node.js only exits once every strongly referenced handle has been either unreferenced or closed. This is how Node.js servers stay alive indefinitely, and how CLIs don’t accidentally exit while waiting for user input or for a response from a fetch call.
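These reference semantics are easy to observe with a timer, which is also a handle and exposes hasRef() to inspect its state. A minimal sketch; Worker objects have the same ref()/unref() pair.

```javascript
// A repeating timer is a strongly referenced handle: on its own, it would
// keep the process alive forever.
const timer = setInterval(() => console.log("tick"), 1000);
console.log(timer.hasRef()); // true

// Weakly reference it: the timer still fires while something else keeps the
// process alive, but it no longer prevents Node.js from exiting.
timer.unref();
console.log(timer.hasRef()); // false
```

When the script reaches its end, the only remaining handle is weakly referenced, so Node.js exits without ever printing “tick”.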

Since a Worker is just another strongly referenced handle, Node.js errs on the side of caution and keeps the main process alive while the Worker is still executing. For example, creating a worker with an infinite while (true); loop will keep the main process alive forever, even though the blocking code runs in a background thread. The only ways to stop it are to either forcefully .terminate() the Worker or at least .unref() it to mark it as weakly referenced.

Between those two, .unref() is the more graceful solution. However, you need to know when to invoke it: if you unref the Worker too late, the application appears stuck and doesn’t exit; if you unref it too early, the application can exit before important message events from the Worker arrive, breaking the async flow:

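const { Worker } = require("worker_threads");

// The worker posts a message to the main thread as soon as it starts.
const worker = new Worker(
  'require("worker_threads").parentPort.postMessage("ready");',
  { eval: true }
);

worker.on("message", () => {
  // Never reached: the worker is unref'd below, so Node.js exits first.
  console.log("Worker initialised, now let's do some actual work");
});

// Unreferencing too early: nothing else keeps the process alive.
worker.unref();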

Multithreaded Emscripten applications commonly worked around this problem with the -s EXIT_RUNTIME setting, which forcefully exits the application when the main C function finishes executing. That is, it calls process.exit(0), which kills the Node.js application together with any spawned workers. This works for executables but isn’t an option for libraries: they don’t have a main entry point, only a list of individual exports, and even if they did, we wouldn’t want to kill the whole application after an arbitrary library call.

For a while, I thought this was impossible to solve automatically, on either the Node.js side or the Emscripten side. But in a chat about this issue, Dominic Elm came up with what is, in retrospect, an obvious solution: why not do a .ref / .unref “dance”? Each time we send some actual work (a PThread function) to a Worker, it gets strongly referenced, and once we know it has finished execution and is sitting idle in Emscripten’s pool, we mark it as weakly referenced again. The code ended up being a lot simpler than finding the relevant tests and writing the accompanying PR explanation, and it worked perfectly for common scenarios!
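Here is a minimal sketch of that dance for a single worker, assuming a hypothetical runJob helper and a made-up { id, a, b } job format; the actual Emscripten change applies the same idea to the Workers in its own pthread pool. The worker starts out unreferenced, gets referenced while work is in flight, and is unreferenced again as soon as it goes idle.

```javascript
const { Worker } = require("node:worker_threads");

// Hypothetical worker: replies to every job with { id, result }.
const worker = new Worker(
  `
  const { parentPort } = require("node:worker_threads");
  parentPort.on("message", (job) => {
    parentPort.postMessage({ id: job.id, result: job.a * job.b });
  });
  `,
  { eval: true }
);

// An idle worker must not keep the process alive.
worker.unref();

let inFlight = 0;

function runJob(job) {
  return new Promise((resolve) => {
    // Work was handed to the worker: reference it so Node.js waits for it.
    if (inFlight++ === 0) worker.ref();
    const onMessage = (msg) => {
      if (msg.id !== job.id) return;
      worker.off("message", onMessage);
      // The worker is idle again: unreference it so the process can exit.
      if (--inFlight === 0) worker.unref();
      resolve(msg.result);
    };
    worker.on("message", onMessage);
    worker.postMessage(job);
  });
}

// Prints 42, then the process exits on its own: no terminate(), no process.exit().
runJob({ id: 1, a: 6, b: 7 }).then((result) => console.log(result));
```

Without the ref() call, the process could exit before the result arrives (the unref-too-early problem); without the final unref(), it would hang after printing.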

There were a few more custom tweaks necessary for libvips and emnapi’s usage of PThreads. However, those are out of the scope of this article, especially because some of them are no longer relevant thanks to upstream changes in both Emscripten and emnapi.

Together with those tweaks, the startup was now fully synchronous, and tests exited as soon as image processing was done, and not sooner, which made the module fully compatible with the native addon API-wise.

Results so far

The benchmark results for the WebAssembly version look very promising. All benchmarks were executed with concurrency set to 2, since that’s what we have set in the WebContainers environment, and with TurboFan only, to reduce startup overhead.

The most significant difference is for codecs and operations that rely on SIMD. While WebAssembly has SIMD support, library authors must either use SIMD intrinsics, which Emscripten’s portability layer can map to WebAssembly SIMD, or write WebAssembly instructions by hand in a separate assembly file, just as they would for other architectures. We are cross-compiling SIMD support for libraries that use SIMD intrinsics, but unfortunately, some others rely on raw assembly and have to be compiled with the slower non-SIMD implementation for now.

All in all, this was a pretty exciting project to work on. While some features are still missing, it will unlock new use cases and unblock many users, on StackBlitz.com and beyond, who depend on image processing or optimisation. Feel free to try it out in WebContainers and share feedback with us, or on the PR if you run into any issues!


Ingvar Stepanyan Guest Author
WebAssembly contractor, obsessed D2D engineer (parsers, compilers, tools & specs).