Two days ago we announced that WebContainers are now compatible with Firefox. In fact, we soft-launched the alpha version a few weeks ago! We’ve been working hard for the past few months to make this a reality, and we can honestly say that it has been both trying and rewarding. Seeing a full Node runtime run on a new browser was well worth the effort, but it’s had us toiling down some pretty deep rabbit holes.
You see, WebContainers are not your run-of-the-mill web application. We don’t render any UI, nor do we have any CSS code. We don’t run into the “usual” cross-browser problems with rendering differences. Instead, we make heavy use of message passing, WebAssembly and atomics. We need to pay attention to the precise wording of Web specs and Node’s documentation, so we get the behavior right.
One big roadblock for Firefox support was cross-origin isolation, and we’ve already talked at length about the challenges that entails. Here we’re going to discuss everything else. Everything you need to know if you ever want to port a JS runtime to run on top of a different one.
The elephant in the room
We’ve had an open ticket in our internal tracker for a long time with a pretty dry description:
Error.prepareStackTrace
To provide more context, the V8 JavaScript runtime has a custom stack traces API that gives you some pretty powerful introspection capabilities. For instance:
- It allows you to generate a stack trace omitting specific callers.
- It allows you to inspect the stack as structured data.
- It allows you to use said structured data to format the stack trace in any fashion.
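To make those three capabilities concrete, here is a minimal sketch that runs on V8 only (Node or Chrome). Error.captureStackTrace and Error.prepareStackTrace are the real V8 hooks; the helper names structuredStack, formatFrames, whoCalledMe and outer are made up for this illustration.

```javascript
// V8-only sketch: neither hook exists in SpiderMonkey.

// Returns the current stack as an array of CallSite objects, omitting
// `omitAbove` and every frame above it.
function structuredStack(omitAbove) {
  const previous = Error.prepareStackTrace;
  try {
    // Hand back the structured call sites instead of a formatted string.
    Error.prepareStackTrace = (_error, callSites) => callSites;
    const holder = {};
    Error.captureStackTrace(holder, omitAbove);
    // Read `stack` while our hook is still installed.
    return holder.stack;
  } finally {
    Error.prepareStackTrace = previous;
  }
}

// Formats the structured data in a custom fashion.
function formatFrames(callSites) {
  return callSites
    .map((site) => {
      const name = site.getFunctionName() ?? '<anonymous>';
      return `${name} (${site.getFileName()}:${site.getLineNumber()}:${site.getColumnNumber()})`;
    })
    .join('\n');
}

function whoCalledMe() {
  // Omit whoCalledMe itself, so the first frame is its caller.
  return structuredStack(whoCalledMe);
}

function outer() {
  return whoCalledMe();
}

const frames = outer();
console.log(formatFrames(frames));
```

The first frame reported is outer, because everything from whoCalledMe upwards was left out of the capture.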
The stack traces API is naturally part of Node’s API, at least implicitly, and it’s used widely both in the Node ecosystem and by Node itself in its internal code. When WebContainers were initially developed, we were fully aware that we were acquiring some “technical debt” of sorts: by focusing first on Chrome, also based on V8, we got support for this API for free. On the other hand, the aforementioned ticket 🙈
The solution for this is neither easy nor complete for a variety of reasons. For starters, the stack field in Error instances is highly vendor-dependent (it’s not even part of the JS spec!). We’ve developed a series of polyfills for these APIs that partially fill the gap, making use of whatever information we can extract from Firefox’s stack traces.
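As a rough idea of what "extracting information from Firefox’s stack traces" can look like, here is a small sketch that turns a SpiderMonkey-style stack string (one functionName@fileName:line:column entry per line) into structured frames. This is an illustration only, not the actual WebContainers polyfill: parseFirefoxStack and the shape of the returned objects are assumptions, and real stacks have edge cases (eval frames, async markers) that this ignores.

```javascript
// Hypothetical helper: parses "name@file:line:column" lines into objects.
function parseFirefoxStack(stack) {
  const frames = [];
  for (const line of stack.split('\n')) {
    // Lazy name match so the first "@" separates the name from the location.
    const match = /^(.*?)@(.*):(\d+):(\d+)$/.exec(line);
    if (!match) continue; // Skip empty or unrecognized lines.
    frames.push({
      functionName: match[1] || null, // Empty for anonymous/top-level code.
      fileName: match[2],
      lineNumber: Number(match[3]),
      columnNumber: Number(match[4]),
    });
  }
  return frames;
}

// Sample input written by hand in Firefox's format.
const sample = [
  'outer@https://example.com/app.js:10:5',
  '@https://example.com/app.js:20:1',
  '',
].join('\n');

console.log(parseFirefoxStack(sample));
```

Objects like these could then back CallSite-style getters, although anything V8 exposes that is not present in the string cannot be recovered this way.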
On this note: we understand that there might be good reasons why Error introspection is not part of the standard. It exposes lots of details about the engine, including possible optimizations. However, any progress in this direction would be very beneficial for advanced use cases like ours, so we’ll pay close attention to whatever comes up in TC39.
Sometimes it’s just a bug
One good thing about the complexity of WebContainers is that they end up being quite the stress test for browser engines. In the course of porting them to Firefox, we’ve uncovered a variety of bugs that, apparently, nobody had noticed before![1]
All of them are related to MessageChannel and/or Atomics. Luckily for us, they were easy to work around, so their impact is actually quite minimal. However, they make for a few fun hours of debugging weird race conditions, all to find out that it was not “our fault”.
In any case, we dutifully reported them and got great responses from the Firefox team. The details of each bug are sometimes very involved (as our setting is!) but here they are, for your own amusement:
- Sending a message through a port and blocking the thread might never deliver the message (#1752287).
- postMessage might crash if the payload is too large (#1754400).
- A port might receive a message in a thread even when it’s already been transferred to another one (#1756975).
Don’t blame the player!
To be fair, we uncovered a few mistakes of ours when porting to Firefox as well! At the end of those debugging sessions, sometimes there was a particular race condition that, for whatever reason, never manifested on Chrome but was there anyway. So porting to Firefox helped us fix a few bugs. That’s what multiple independent implementations are for! 🎉
In addition to our own mistakes, we sometimes encountered problems with other implementations when compared to Firefox. For instance, we found a bug in Node itself, where a MessageChannel does send a message through a closed port (#42296). This is a tricky one because the semantics of MessageChannel and worker_threads in Node don’t necessarily align with Web Workers,[2] but in this case it does seem it was an unintentional deviation. Kudos for the quick fix!
Another example is the User-Agent header: Firefox does allow you to set this header in fetch() calls, an addition to the spec that is 7 years old (!), while Chrome does not yet allow for that. This matters to us because custom user-agents are more likely to trigger CORS errors.
Sometimes it’s nobody’s fault
There’s been a handful of (sad) cases where we couldn’t fix the underlying issue or we just wouldn’t. For instance, we’ve encountered (at least twice!) code in the wild that would sort an array doing something akin to this:
things.sort((aThing, bThing) => {
  return someCustomLogic(aThing, bThing) ? -1 : 0;
});
This is wrong! The comparison function we pass to sort has to be consistent, meaning that if we call it with (a, b) and get -1, calling it with (b, a) should return +1, which is not what will happen here. This code works in the wild because V8 happens to compare elements in a certain order that makes it work, but this is highly dependent on its internal implementation. Even if you don’t care about cross-engine compatibility, a future optimization might break your code![3]
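For comparison, here is one way to turn a boolean "comes before" predicate into a consistent comparator. It is a sketch under the assumption that the predicate is a strict ordering (it never says both that a comes before b and that b comes before a); toComparator and the sample predicate are hypothetical names for this example.

```javascript
// Wraps a boolean predicate into a comparator that answers consistently
// no matter which order the engine passes the two elements in.
function toComparator(comesBefore) {
  return (a, b) => {
    if (comesBefore(a, b)) return -1; // a sorts first
    if (comesBefore(b, a)) return 1; // b sorts first
    return 0; // neither precedes the other
  };
}

const things = [3, 1, 2];
things.sort(toComparator((a, b) => a < b));
console.log(things);
```

The extra call to the predicate with the arguments swapped is what makes the result independent of the engine’s internal comparison order.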
A similar example is the use in the wild of non-standard date-time strings:
new Date('2022-04-13Z')
The Date constructor must correctly recognize (simplified) ISO 8601 date strings (YYYY-MM-DDTHH:mm:ss.sssZ). But each implementation might choose to parse additional date-time string formats. V8 correctly parses the above example, while SpiderMonkey does not. In this case we did choose to polyfill this behavior.
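A workaround for this one specific format could be sketched as follows. This is not the actual WebContainers polyfill: parseLenientDate is a hypothetical helper, and it assumes that a date-only string with a trailing Z is meant as midnight UTC. Anything else is passed to the Date constructor untouched.

```javascript
// Hypothetical helper: rewrites "YYYY-MM-DDZ" into a standard ISO string
// before handing it to Date; all other inputs go through unchanged.
function parseLenientDate(input) {
  const match = /^(\d{4}-\d{2}-\d{2})Z$/.exec(input);
  // Assumption: date-only plus "Z" means midnight UTC on that day.
  return new Date(match ? `${match[1]}T00:00:00.000Z` : input);
}

console.log(parseLenientDate('2022-04-13Z').toISOString());
// prints "2022-04-13T00:00:00.000Z"
```

Because the rewritten string is in the standard format, the result no longer depends on which extra formats an engine happens to accept.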
Another thorny issue we are still dealing with is task scheduling. When a browser API defers something to be called “later” (say, messages, promises, timers, etc.) it has some freedom on how to schedule things. We do know that microtasks (e.g. promise callbacks) will get called before tasks (e.g. most of the other stuff that we mentioned before). But when a task is queued, the browser is free to schedule it according to its nature in any way it wants. Quoting from the spec:
Let taskQueue be one of the event loop’s task queues, chosen in an implementation-defined manner […]
This means, for instance, that this code works differently between browsers (the order of the messages is different):
const channel = new MessageChannel();
channel.port1.onmessage = () => console.log('message');
setTimeout(() => console.log('timeout'));
channel.port2.postMessage(null);
which is totally fine, since the “ports event queue” and the timers queue are different. Our event loop implementation is not yet able to work around this issue and some observable consequences are expected. It is in alpha, after all!
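The part that is guaranteed can be checked with a small variation of the snippet above. In this sketch (observeOrder is a hypothetical helper), synchronous code runs first and the promise callback runs before either task, while the relative order of the message and the timeout is left to the engine.

```javascript
// Records the order in which differently scheduled callbacks run.
function observeOrder() {
  return new Promise((resolve) => {
    const order = [];
    const channel = new MessageChannel();
    let pendingTasks = 2;

    const taskDone = (label) => {
      order.push(label);
      if (--pendingTasks === 0) {
        channel.port1.close(); // Lets Node exit once both tasks have run.
        resolve(order);
      }
    };

    channel.port1.onmessage = () => taskDone('message');
    setTimeout(() => taskDone('timeout'));
    channel.port2.postMessage(null);
    Promise.resolve().then(() => order.push('microtask'));
    order.push('sync');
  });
}

observeOrder().then((order) => console.log(order.join(' -> ')));
```

Only the first two entries (sync, then microtask) should be relied upon; the last two may swap between engines.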
Testing, blood, and tears
There might be a certain irony in the fact that the most painful part of this project was not untangling these compatibility issues, but writing tests for them. Because of their delicate nature, WebContainers are mostly tested in an end-to-end fashion, by spawning a real browser and observing its output. We cannot afford mocking the underlying platform: it’s a full-fledged browser!
Anybody that deals with browser end-to-end testing knows what we’re about to say next: it’s 2022 and there’s no standardized and dev-friendly way to write cross-browser tests. WebDriver is kind of outdated/old-school. The Chrome DevTools Protocol (the technology that supports Puppeteer) allows for more flexibility but, as its name gives away, it is Chrome-only. Actually, we currently use Puppeteer, which works on Firefox thanks to the herculean efforts of the Mozilla team. But their support for this protocol is incomplete and its interaction with cross-origin isolation makes it a tad fragile.
We’ve heard the chatter about WebDriver BiDi, the next-generation standard that is supposed to supersede the current status quo. We eagerly look forward to any progress in this area 🤞🤞
We need your feedback
We’re very excited to finally unveil this effort, and we can’t wait for you all to try it out. As we’ve detailed at length, we expect it to have a few rough edges. You can check our documentation about browser support, and we’ll appreciate any bug reports sent our way.
This work has significantly improved our understanding of what we can achieve and what we need for better cross-browser compatibility. We are now in a better position to start subsequent efforts, such as supporting mobile browsers.[4] We also have a better grasp on some of the fundamental pieces that WebContainers need to function, so we’ll keep working to bring this technology everywhere we can.
- [1] To be fair, some of those bugs were already reported. But Gecko is a pretty large piece of software, and we weren’t able to relate those existing reports with the symptoms we were observing.
- [2] Note that the worker_threads implementation of Node does not share any code whatsoever with Chrome’s Worker implementation, as is the case with any other browser for that matter.
- [3] As far as we know, V8 does not provide any extra guarantees about its sorting implementation beyond what is in the ECMAScript spec.
- [4] Though, one final surprise: Firefox mobile works out of the box!