Chasing Memory Bugs through V8 and WebAssembly

Since the introduction of WebContainer a few months ago, we have watched with great delight how the excitement around this technology has grown, and how countless projects have pushed its boundaries. We are really happy with the results, and we will be sharing a few details of how it works in the coming months. We can say one thing right off the bat: WebAssembly and Rust are a crucial part of it.

WebAssembly is key to enabling performance-critical, CPU-bound tasks on the web, which abound in our OS-like layers. There is simply no way we could aspire to rival the performance of native toolchains if it did not exist. Concerning Rust, much has already been said about how it is the best fit for targeting WebAssembly, and we wholeheartedly agree with this assessment. The tooling and documentation are a bit rough for the uninitiated, but exposing your Rust code to the web through wasm-bindgen does feel magical.

However, we are starting to run into the limitations of the current state of WebAssembly, if only because there is so much we would like to use it for. We are bullish about performance, so we are avidly waiting for a solution to dynamic memories or “mmap for WebAssembly”. We rely heavily on WebAssembly threads, which still need some polishing (for instance, around TLS and destructors). And being able to seamlessly combine WebAssembly modules coming from different sources to work together is still very much a work in progress.

This lack of seamless multi-module linking had some interesting consequences for WebContainer. At some point, we started shipping multiple WebAssembly payloads, which uncovered a very interesting bug in V8.

Too Much Memory for my Taste

Here is another key detail of how WebContainer works: it spawns quite a few Web Workers. In fact, as many as you want, since they roughly map to OS processes. Each one of these workers gets its own instantiation of our WebAssembly modules, which, in turn, triggered the following, quite cryptic error message in our CI:

[Error RangeError: WebAssembly.instantiate(): Out of memory: wasm memory]
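For context, the pattern each worker ends up running is roughly the following (a simplified sketch with hypothetical file and module names, not our actual code):

// main thread: spawn one Web Worker per "process"
for (let i = 0; i < 20; i++) {
  new Worker('process.js', { type: 'module' });
}

// process.js: every worker instantiates its own copy of the same WebAssembly module
const { instance } = await WebAssembly.instantiateStreaming(fetch('example.wasm'));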

The error was a bit surprising, since we have relatively beefy machines in CI. And at the end of the day, we are not doing anything that should consume tons of memory. In fact, we went ahead and checked! Using Chrome’s memory profiling tools, you can take a snapshot of a worker’s heap and inspect it. We got something similar to this:

|   Memory ×2           |   Retained Size (bytes)   |
|-----------------------|---------------------------|
|     Memory @83891     |   1114244                 |
|     Memory @83597     |   65668                   |

This is telling us that each worker consumed around 1 MiB of memory for all its WebAssembly Memory instances. If we are profiling the memory usage of WebAssembly modules, Memory instances should be our first suspect.[1] They are basically large, ArrayBuffer-like blobs that WebAssembly modules use in their computations. So, if our failing CI tests are spawning, say, 10-20 workers, how come we get “out of memory” with a meager total of 10-20 MiB of consumed memory? Something is afoot!
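If you have never poked at one directly, here is roughly what a Memory instance looks like from JavaScript (a quick illustrative snippet, not part of our investigation):

const memory = new WebAssembly.Memory({ initial: 1 }); // 1 WebAssembly page
console.log(memory.buffer instanceof ArrayBuffer); // true
console.log(memory.buffer.byteLength);             // 65536, i.e. 64 KiB per page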

At this point, we went pretty deep down the rabbit hole. What about Node.js? It turns out you can trigger this error quite easily in, say, Node 14. This time we wrote the simplest possible piece of WebAssembly code we could come up with:

(module
  (func $add (param $lhs i32) (param $rhs i32) (result i32)
    local.get $lhs
    local.get $rhs
    i32.add)
  (export "add" (func $add))
  (memory 1)
  (export "memory" (memory 0))
)

You can compile this in WebAssembly Studio or using wat2wasm, which outputs a binary blob that, once plugged into some Node code, reproduces the bug perfectly:

const code = Buffer.from([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, 0x01, 0x07,
  0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, 0x03, 0x02, 0x01,
  0x00, 0x05, 0x03, 0x01, 0x00, 0x01, 0x07, 0x10, 0x02, 0x03,
  0x61, 0x64, 0x64, 0x00, 0x00, 0x06, 0x6d, 0x65, 0x6d, 0x6f,
  0x72, 0x79, 0x02, 0x00, 0x0a, 0x09, 0x01, 0x07, 0x00, 0x20,
  0x00, 0x20, 0x01, 0x6a, 0x0b
]);

const wasmModule = new WebAssembly.Module(code);

[...Array(110)].map(() => new WebAssembly.Instance(wasmModule));

// Uncaught RangeError: WebAssembly.Instance(): Out of memory: wasm memory

In fact, there are two interesting details here. First, the bug seems to be about Memory instances, not Modules or Instances, as we could reproduce it in a much more stripped-down example:[2]

[...Array(110)].map(() => new WebAssembly.Memory({ initial: 1 }));

// Uncaught RangeError: WebAssembly.Memory(): could not allocate memory

This is clearly not about memory usage per se! A Memory({ initial: 1 }) object is basically a 64 KiB ArrayBuffer; we cannot be OOM-ing because of a measly 6 MiB of memory.

The second detail is that there seemed to be a very precise limit:

[...Array(100)].map(() => new WebAssembly.Memory({ initial: 100 }));

// No error is thrown, but the one below would throw:

// [...Array(101)].map(() => new WebAssembly.Memory({ initial: 1 }));

Here we bumped the Memory size a hundredfold, and still no error was raised. But as soon as we tried to create one more Memory, regardless of its size, the exception came back. So this seems to be about the number of Memory instances created, not about actual memory usage.
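If you want to find that limit on your own machine, a throwaway probe like this one does the trick (a sketch we are adding here for illustration, not code from our tests):

function countMemories() {
  const memories = [];
  try {
    // Keep a reference to every Memory so none of them can be garbage collected.
    while (true) memories.push(new WebAssembly.Memory({ initial: 1 }));
  } catch (error) {
    return { count: memories.length, message: error.message };
  }
}

console.log(countMemories());
// e.g. { count: 100, message: '...could not allocate memory' } on the affected V8 versions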

What else could we do? Well, we had “mess around with V8 source code” on our bucket list (some of us, at least), so this was the perfect chance. Once you go through the slightly convoluted process of building the V8 JS engine, you can run something called d8, a JS REPL akin to Node’s but devoid of the Node APIs: just pure JavaScript. Suffice it to say that yes, we did observe the error again with a very similar reproduction case:

const code = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, /* same as above */, 0x01, 0x6a, 0x0b
]);

const wasmModule = new WebAssembly.Module(code);

[...Array(101)].map(() => new WebAssembly.Instance(wasmModule));

// (d8):1: RangeError: WebAssembly.Instance(): Out of memory: wasm memory

Whodunnit Time

OK, so what is going on here? Our only hope was to dive into V8 source code and try to figure things out. We are no C++ experts, but we are quite motivated, right? 🙈 A little bit of digging took us to backing-store.cc, where we found this (with some editing):

#if V8_TARGET_ARCH_64_BIT
constexpr bool kUseGuardRegions = true;
#else
// ...
#endif

#if V8_TARGET_ARCH_64_BIT
constexpr uint64_t kFullGuardSize = 10 * kOneGiB;
#endif

#if V8_TARGET_ARCH_64_BIT
constexpr size_t kAddressSpaceLimit = 0x10100000000L;  // 1 TiB + 4 GiB
#else
// ...
#endif

// `has_guard_regions` is true, see `kUseGuardRegions` above
size_t GetReservationSize(bool has_guard_regions, size_t byte_capacity) {
#if V8_TARGET_ARCH_64_BIT
  // (3)
  if (has_guard_regions) return kFullGuardSize;
#else
  // ...
#endif
  // ...
}

// (1) this is called with the result of `GetReservationSize`
bool BackingStore::ReserveAddressSpace(uint64_t num_bytes) {
  uint64_t reservation_limit = kAddressSpaceLimit; // (4)
  uint64_t old_count = reserved_address_space_
    .load(std::memory_order_relaxed);
  while (true) {
    if (old_count > reservation_limit) return false;
    // (2)
    if (reservation_limit - old_count < num_bytes) return false;
    if (reserved_address_space_.compare_exchange_weak(
            // (5)
            old_count, old_count + num_bytes, std::memory_order_acq_rel)) {
      return true;
    }
  }
}

If, like us, you are not used to low-level C++ code, this might be a bit mind-bending, but here is the gist, going backwards:

  1. ReserveAddressSpace returns false at some point, triggering our dreaded error.
  2. This happens in the reservation_limit - old_count < num_bytes branch.
  3. num_bytes is the result of GetReservationSize: for 64-bit machines, it is hardcoded to 10 GiB (see kFullGuardSize) if guard regions are enabled (they are, see kUseGuardRegions).
  4. reservation_limit is hardcoded to ~1 TiB (see kAddressSpaceLimit, which is 1 TiB + 4 GiB).
  5. old_count is the total address space reserved so far; the compare_exchange_weak atomically bumps reserved_address_space_ by num_bytes (conceptually, a simple reserved_address_space_ += num_bytes).

We solved it! Now the ~100-memories limit makes a lot of sense: each time a Memory is created, 10 GiB of address space are reserved out of a total pool of roughly 1 TiB. As soon as we hit about 100 instances, that pool is exhausted.
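The back-of-the-envelope math checks out, using the constants from the V8 snippet above:

const kOneGiB = 1024 ** 3;
const kFullGuardSize = 10 * kOneGiB;       // address space reserved per Memory on 64-bit
const kAddressSpaceLimit = 0x10100000000;  // 1 TiB + 4 GiB

// How many 10 GiB reservations fit in the pool?
console.log(Math.floor(kAddressSpaceLimit / kFullGuardSize)); // 102

That is within a couple of reservations of the ~100 instances we managed to create before the error; presumably the engine has already reserved a bit of address space for other backing stores by the time our code runs.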

But Why?

The short version is “because security”. Those 10 GiB of address space are not memory that V8 actually commits; they are just an address range used to map “virtual” addresses within the given Memory to actual pointers on the underlying machine. By reserving such a large range, V8 can “sandwich” said pointers between two large guard regions of inaccessible memory. See this diagram, also in backing-store.cc:

#if V8_TARGET_ARCH_64_BIT
base::AddressRegion GetGuardedRegion(
  void* buffer_start,
  size_t byte_length
) {
  // Guard regions always look like this:
  // |xxx(2GiB)xxx|.......(4GiB)..xxxxx|xxxxxx(4GiB)xxxxxx|
  //              ^ buffer_start
  //                              ^ byte_length
  // ^ negative guard region           ^ positive guard region

  // ...
}
#endif

In this way, they mitigate the impact of a hypothetical bug in their WebAssembly-to-native compiler. A pointer miscalculation will be more likely to hit one of those guard regions and trigger a kernel trap, instead of accessing some data from other parts of the process memory. If you want to know more, here is a nice design doc that goes into more detail.
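One way to convince yourself that this is about address space rather than actual memory consumption is to watch the process’s resident memory while creating Memory instances (a Node-only sketch, since process.memoryUsage is not available in browsers):

const before = process.memoryUsage().rss;
const memories = [...Array(50)].map(() => new WebAssembly.Memory({ initial: 1 }));
const after = process.memoryUsage().rss;

// 50 one-page Memories commit only ~3 MiB of actual pages...
console.log(((after - before) / 2 ** 20).toFixed(1), 'MiB resident');
// ...even though, before the fix, they reserved 50 × 10 GiB = 500 GiB of address space.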

Happy Ending

When we encountered this limitation, we ended up working around it by… just not instantiating lots of modules 😂 We promptly filed an issue in V8’s tracker and, as of V8 9.6.142 (included in Chrome 96), the 1 TiB limit is gone. We can now easily instantiate over 10K memories without breaking a sweat. However, we still think quite hard before including more independent modules in WebContainer, as there are other costs associated with doing so, e.g. the extra overhead of the bundled Rust runtime.
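As a quick sanity check, a snippet like the one below, which used to blow up at around 100 instances, now completes without throwing on V8 9.6.142 or later:

const memories = [...Array(10000)].map(() => new WebAssembly.Memory({ initial: 1 }));
console.log(memories.length); // 10000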

Lastly, this discussion has revolved around Chrome and V8, but we did test other browser engines, even if our support for them is a work in progress. Both Firefox and Safari behaved better in this respect, allowing over 1000 instances to be created and even (in the case of Firefox) returning a somewhat friendlier warning.[3] Kudos!

PS: If you’re interested in solving bleeding-edge WASM & browser challenges like this one, we’re hiring!


  • [1] Module and Instance objects also take some memory, but we did not find them to contribute much to the picture. The latter are probably negligible in this context, since they are mostly just a few pointers to the underlying module, memory and exported functions. Module objects are presumably as big as the compiled binary code they wrap; however, browsers likely share that compiled code across instantiations, which makes them even less relevant for our investigation.
  • [2] Note that the original example also created an implicit Memory instance. This is triggered by the (memory 1) section of the textual module we showed above.
  • [3] When going over the limit, Firefox prints:
    WebAssembly.Memory failed to reserve a large virtual memory region.
    This may be due to low configured virtual memory limits on this system
    
Roberto Vidal
Engineer at StackBlitz. Talk to me about #rustlang