Raw Wasm: Hand-crafted WebAssembly Demos

WebAssembly Summit 2021

Ben Smith @binjimint
Over the past ~3 years, I've made the following WebAssembly demos:
Doomfire - May 2019 - (398 bytes)
Metaball - May 2019 - (452 bytes)
Raytrace - May 2019 - (1486 bytes)
Snake - June 2019 - (1976 bytes)
Maze - October 2019 - (2047 bytes)
Chip 8 - September 2020 - (1020 bytes)
Dino - December 2020 - (2020 bytes)
Inflate - January 2021 - (991 bytes)

How?

  • vim
  • python scripts
  • wat2wasm (tool in wabt)
  • browser

That's it :-)

My rules:

  • Use as little JS as possible
  • Make it as small as possible
  • Try something new and have fun!
What we're making this time: a match-3 game!

How our match-3 game works:

  • 8x8 grid of 8 different emojis 😀🤩🥵🥶🤠😱😈💩
  • 3 adjacent, identical emojis in a row or column are removed
  • Emojis fall down to fill in the holes
  • You can swap two emojis horizontally or vertically
  • If there are no swaps left, the game is over

Let's dive into the code!

Step 1: HTML, JS

    body {
      position: absolute;
      display: flex;
      flex-direction: column;
      background-color: #fff;
      margin: 0;
      width: 100%;
      height: 100%;
    }

Body is a flexbox that fills the screen.

    canvas {
      object-fit: contain;
      width: 100%;
      height: 100%;
      image-rendering: pixelated;
      image-rendering: crisp-edges;
    }
    
Maintain the aspect ratio, and keep pixels pixelated!
The canvas element itself is 150x150 pixels, but it is stretched to fill the available space (preserving its aspect ratio).
Step 2: Loading Wasm

    const w = 150, h = 150;

    (async function start() {
      const response = await fetch('match3.wasm');
      const moduleBytes = await response.arrayBuffer();
      const {module, instance} =
        await WebAssembly.instantiate(moduleBytes);
      const exports = instance.exports;
      const buffer = exports.mem.buffer;
      const canvasData = new Uint8Array(buffer, 0x10000, w*h*4);

      // ...
    
  • Fetch the Wasm module and instantiate it
  • Extract the exports, one of which is a WebAssembly.Memory object
  • Create a 150*150*4-byte view of Wasm memory, starting at 0x10000, to use for canvas data

      // ...
      const canvas = document.querySelector('canvas');
      const context = canvas.getContext('2d');
      const imageData = context.createImageData(w, h);

      (function update() {
        requestAnimationFrame(update);
        exports.run();
        imageData.data.set(canvasData);
        context.putImageData(imageData, 0, 0);
      })();
    })();
    
  • Create a CanvasRenderingContext2D
  • Create an ImageData object to blit into the canvas
  • Create an update function that requestAnimationFrame calls at 60 FPS* (*usually; it follows the display's refresh rate)
  • Run the per-frame Wasm function
  • Copy the Wasm memory into the ImageData, and draw it to the canvas

    ;; Memory map:
    ;;
    ;; [0x10000 .. 0x25f90)  150x150xRGBA data (4 bytes/pixel)
    (memory (export "mem") 3)

    (func (export "run")
    )
    
  • Add a comment to the top, describing the memory layout
  • Create and export a Memory object of 64Ki * 3 = 192Ki bytes
  • Create and export the per-frame function
Step 3: Clearing the Screen

  (func $clear-screen (param $color i32)
    (local $i i32)
    (loop $loop
      ;; mem[0x10000 + i] = color
      (i32.store offset=0x10000
        (local.get $i) (local.get $color))
      ;; i += 4
      (local.set $i
        (i32.add (local.get $i) (i32.const 4)))
      ;; loop if i < 90000
      (br_if $loop
        (i32.lt_s (local.get $i) (i32.const 90000)))
    )
  )
  
  • Define a function called $clear-screen, with one i32 parameter
  • Define an i32 local variable, $i (Wasm locals start at 0)
  • Loop over all pixels, with i going from 0 to 150*150*4 = 90000
  • Write $color to memory at (0x10000 + i)
  • Increment the address by 4, and loop again while i < 90000

  (func (export "run")
    (call $clear-screen
      (i32.const 0xff_00_00_ff))  ;; ABGR format
  )
  
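Clear the screen to red. Wasm stores are little-endian, so 0xff_00_00_ff lands in memory as the bytes ff 00 00 ff, which the canvas reads as R, G, B, A.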
Step 4: Drawing a Pixel

  (func $put-pixel
    (param $x i32) (param $y i32) (param $color i32)
    ;; mem[0x10000 + (y * 150 + x) * 4] = color
    (i32.store offset=0x10000
      (i32.mul
        (i32.add
          (i32.mul (local.get $y) (i32.const 150))
          (local.get $x))
        (i32.const 4))
      (local.get $color))
  )
  
  • Define the $put-pixel function with 3 i32 params
  • Multiply $y by 150
  • Add $x
  • Multiply the result by 4 (the size of each pixel)
  • Write $color to this address (with a 0x10000 offset)

  (func (export "run")
    (call $put-pixel
      (i32.const 100) (i32.const 100)
      (i32.const 0xff_00_00_ff))
  )
  
Draw a red pixel at (100,100)
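To see the address math and byte order outside Wasm, here is a small JavaScript sketch that mirrors $clear-screen and $put-pixel on a plain ArrayBuffer standing in for Wasm memory. The names `mem`, `clearScreen`, and `putPixel` are hypothetical, not part of the demo. Passing `true` to `DataView.setUint32` makes the store little-endian, like Wasm's i32.store, which is why 0xff_00_00_ff reads back as red.

```javascript
// Hypothetical JS mirror of $clear-screen and $put-pixel.
// Memory map from the slides: 150x150 RGBA pixels at [0x10000, 0x25f90).
const W = 150, H = 150, CANVAS = 0x10000;
const mem = new DataView(new ArrayBuffer(3 * 65536)); // 3 Wasm pages

function clearScreen(color) {
  // Little-endian store: 0xAABBGGRR lands as the bytes RR GG BB AA.
  for (let i = 0; i < W * H * 4; i += 4) {
    mem.setUint32(CANVAS + i, color, true);
  }
}

function putPixel(x, y, color) {
  // address = 0x10000 + (y * 150 + x) * 4
  mem.setUint32(CANVAS + (y * W + x) * 4, color, true);
}

clearScreen(0xff0000ff);        // red everywhere
putPixel(100, 100, 0xffff0000); // one blue pixel
const addr = CANVAS + (100 * W + 100) * 4;
console.log([0, 1, 2, 3].map((k) => mem.getUint8(addr + k))); // R=0, G=0, B=255, A=255
```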
Step 5: Mouse Input

      const input = new Uint8Array(exports.mem.buffer, 0x0000, 3);

      function mouseEventHandler(event) {
        // ...
        input[0] = event.offsetX;
        input[1] = event.offsetY;
        input[2] = event.buttons;
      }

      canvas.addEventListener('mousemove', mouseEventHandler);
      canvas.addEventListener('mousedown', mouseEventHandler);
      canvas.addEventListener('mouseup', mouseEventHandler);
  
  • Create a Uint8Array view of the first 3 bytes of Wasm memory
  • Write the mouse info into these bytes
  • Add this handler for mousemove, mousedown, and mouseup events

  ;; [0x0..0x1)  X mouse position
  ;; [0x1..0x2)  Y mouse position
  ;; [0x2..0x3)  mouse buttons
  (func (export "run")
    ;; ...
    (call $put-pixel
      (i32.load8_u (i32.const 0))    ;; X
      (i32.load8_u (i32.const 1))    ;; Y
      (select
        (i32.const 0xff_00_00_ff)    ;; Red
        (i32.const 0xff_ff_00_00)    ;; Blue
        (i32.load8_u (i32.const 2))) ;; Buttons
    )
  )
  
  • Update the memory map
  • Read the x, y, and buttons with i32.load8_u
  • Use select to pick red if a button is pressed, blue otherwise (select returns its first operand if the condition is non-zero, else its second)
Step 6: Filling a Rectangle

  function fillRect(x, y, w, h, color) {
    var i, j;
    for (j = 0; j < h; j++) {
      for (i = 0; i < w; i++) {
        putPixel(x + i, y + j, color);
      }
    }
  }
  
How you might write fillRect in JavaScript

  (func $fill-rect (param $x i32) (param $y i32)
                   (param $w i32) (param $h i32)
                   (param $color i32)
    (local $i i32) (local $j i32)
    (loop $y
      (local.set $i (i32.const 0))
      (loop $x
        (call $put-pixel
          (i32.add (local.get $x) (local.get $i))
          (i32.add (local.get $y) (local.get $j))
          (local.get $color))
        (local.set $i (i32.add (local.get $i) (i32.const 1)))
        (br_if $x (i32.lt_s (local.get $i) (local.get $w))))
      (local.set $j (i32.add (local.get $j) (i32.const 1)))
      (br_if $y (i32.lt_s (local.get $j) (local.get $h)))))
  
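  • Define the $fill-rect function, with 5 i32 parameters
  • Declare locals i and j; Wasm locals start at 0, and i is reset to 0 at the start of each row
  • Loop j from 0 to h
  • Loop i from 0 to w
  • Put a pixel at (x+i, y+j)
  • Like a do-while, each loop body runs at least once, so this assumes w and h are at least 1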
Step 7: Drawing a Sprite

  (func $draw-sprite (param $x i32) (param $y i32)
                     (param $w i32) (param $h i32)
                     (param $src i32)
    ;; ...
  )
  
Start with $fill-rect, but change the $color parameter to $src

  ;; ...
      ;; put-pixel(x + i, y + j, mem[src + (w * j + i) * 4])
      (call $put-pixel
        (i32.add (local.get $x) (local.get $i))
        (i32.add (local.get $y) (local.get $j))
        (i32.load
          (i32.add
            (local.get $src)
            (i32.mul
              (i32.add
                (i32.mul (local.get $w) (local.get $j))
                (local.get $i))
              (i32.const 4)))))
  ;; ...
  
Calculate the pixel color:
  • Get the memory offset (w * j + i) * 4
  • Add $src
  • Load the i32 color at this address

  ;; Sprite Data  16x16x4 = 1024 bytes
  (data (i32.const 0x100)
    "\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00"
    "\00\00\00\00\00\00\00\00\df\71\26\ff\df\71\26\ff"
    "\df\71\26\ff\df\71\26\ff\00\00\00\00\00\00\00\00"
    "\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00"
    "\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00"
    "\df\71\26\ff\df\71\26\ff\fb\f2\36\ff\fb\f2\36\ff"
    "\fb\f2\36\ff\fb\f2\36\ff\df\71\26\ff\df\71\26\ff"
    "\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00"
    "\00\00\00\00\00\00\00\00\00\00\00\00\df\71\26\ff"
    "\fb\f2\36\ff\fb\f2\36\ff\fb\f2\36\ff\fb\f2\36\ff"
    "\fb\f2\36\ff\fb\f2\36\ff\fb\f2\36\ff\fb\f2\36\ff"
    ...
  )
  
Store 1024 bytes of sprite data at 0x100
Step 8-9: Masking and Clipping Pixels

  (func $put-pixel (param $x i32) (param $y i32)
                   (param $color i32)
    ;; return if the x/y coordinate is out of bounds
    (br_if 0
      (i32.or
        (i32.ge_u (local.get $x) (i32.const 150))
        (i32.ge_u (local.get $y) (i32.const 150))))
    ...
  )
  
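  • br_if 0 at the top level of a function is like if (...) return;
  • Check if $x >= 150 || $y >= 150
  • i32.ge_u is an unsigned comparison, so negative coordinates wrap to huge values and are rejected by the same check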

  (func $draw-sprite ...

        ;; pixel = mem[src + (w * j + i) * 4]
        (local.set $pixel (i32.load ...))

        ;; if (pixel != 0)
        (if (local.get $pixel)
          (then
            (call $put-pixel ...)))

  )
  
Load the pixel, but only draw it if it is non-zero.
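Steps 8-9 can be sketched in JavaScript as follows, assuming the 4-bytes-per-pixel sprite format from Step 7 and a hypothetical 3x1 test sprite at 0x100. `x >>> 0` reinterprets a number as an unsigned 32-bit integer, which gives the same one-comparison bounds check as i32.ge_u. The helper names are hypothetical.

```javascript
// Sketch of Steps 8-9: clip out-of-bounds pixels and skip zero (transparent) ones.
const W = 150, H = 150, CANVAS = 0x10000;
const mem = new DataView(new ArrayBuffer(3 * 65536)); // stand-in for Wasm memory

function putPixel(x, y, color) {
  // Unsigned compare, like i32.ge_u: -1 >>> 0 is 4294967295,
  // so one test rejects both negative and too-large coordinates.
  if ((x >>> 0) >= W || (y >>> 0) >= H) return;
  mem.setUint32(CANVAS + (y * W + x) * 4, color, true);
}

function drawSprite(x, y, w, h, src) {
  for (let j = 0; j < h; j++) {
    for (let i = 0; i < w; i++) {
      const pixel = mem.getUint32(src + (w * j + i) * 4, true);
      if (pixel !== 0) putPixel(x + i, y + j, pixel); // 0 means transparent
    }
  }
}

// Hypothetical 3x1 sprite at 0x100: red, transparent, red.
mem.setUint32(0x100, 0xff0000ff, true);
mem.setUint32(0x108, 0xff0000ff, true);
drawSprite(148, 0, 3, 1, 0x100); // the third pixel would land at x = 150 and is clipped
```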

⏩ Step 10: Draw Sprites With Palette

What's different?

  • Each sprite pixel is now 1 byte
  • The size of one sprite goes from 1024 bytes to 256
  • A 16-color palette is included for 64 bytes
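A rough JavaScript sketch of the palette lookup: each sprite byte selects one 4-byte RGBA entry from the 64-byte palette. For clarity it expands a whole sprite up front; the demo presumably does the lookup per pixel while drawing, and `expandSprite` is a hypothetical name.

```javascript
// Expand a palettized sprite (1 byte per pixel) into RGBA (4 bytes per pixel).
function expandSprite(indices, palette) {
  const rgba = new Uint8Array(indices.length * 4);
  for (let p = 0; p < indices.length; p++) {
    const c = indices[p] * 4;                    // offset of this palette entry
    rgba.set(palette.subarray(c, c + 4), p * 4); // copy its R, G, B, A
  }
  return rgba;
}

const palette = new Uint8Array(16 * 4);       // 16 colors x 4 bytes = 64 bytes
palette.set([0xdf, 0x71, 0x26, 0xff], 1 * 4); // color 1: the orange from Step 7
const sprite = new Uint8Array(16 * 16);       // 256 bytes, all color 0
sprite[5] = 1;
const rgba = expandSprite(sprite, palette);   // 1024 bytes, the Step 7 format
```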
Step 11: Scaling Sprites

  (func $draw-sprite (param $x i32) (param $y i32)
                     (param $src i32)
                     (param $sw i32) (param $sh i32)
                     (param $dw i32) (param $dh i32)
    (local $dx f32)
    (local $dy f32)

    ;; dx = sw / dw
    (local.set $dx
      (f32.div (f32.convert_i32_s (local.get $sw))
               (f32.convert_i32_s (local.get $dw))))
    ;; dy = sh / dh
    (local.set $dy
      (f32.div (f32.convert_i32_s (local.get $sh))
               (f32.convert_i32_s (local.get $dh))))
  
  • Rename $w to $sw and $h to $sh, for the source width and height
  • Add $dw and $dh for the destination width and height
  • Define two f32 (floating-point) locals
  • Calculate the scale factors, converting from i32 to f32

      ;; pixel = mem[src + sw * int(j * dy) + int(i * dx)]
      (local.set $pixel
        (i32.load8_u
          (i32.add
            (local.get $src)
            (i32.add
              (i32.mul
                (local.get $sw)
                (i32.trunc_f32_s
                  (f32.mul (f32.convert_i32_s (local.get $j))
                           (local.get $dy))))
              (i32.trunc_f32_s
                (f32.mul (f32.convert_i32_s (local.get $i))
                         (local.get $dx)))))))
  
Use the scale factors when reading each source pixel (nearest-neighbor sampling)
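The index math above is nearest-neighbor scaling. A minimal JavaScript sketch, assuming one palette-index byte per source pixel (the Step 10 format); `scaleSprite` is a hypothetical helper that writes into a new array instead of drawing to the screen. JavaScript uses f64 where the Wasm code uses f32, which only matters for ratios that aren't exact in f32.

```javascript
// Nearest-neighbor scaling with the same index math as the scaled $draw-sprite.
function scaleSprite(src, sw, sh, dw, dh) {
  const dx = sw / dw; // source pixels per destination pixel, horizontally
  const dy = sh / dh; // source pixels per destination pixel, vertically
  const out = new Uint8Array(dw * dh);
  for (let j = 0; j < dh; j++) {
    for (let i = 0; i < dw; i++) {
      // src[sw * trunc(j * dy) + trunc(i * dx)], truncating like i32.trunc_f32_s
      out[j * dw + i] = src[sw * Math.trunc(j * dy) + Math.trunc(i * dx)];
    }
  }
  return out;
}

const big = scaleSprite(Uint8Array.of(1, 2, 3, 4), 2, 2, 4, 4);
console.log(Array.from(big).join(",")); // 1,1,2,2,1,1,2,2,3,3,4,4,3,3,4,4
```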
How to represent the grid?

Using 1 byte per cell

  • We could have 64-element array of bytes
  • Each byte is a value between 1 and 8, one for each emoji

Using 1 bit per cell instead

  • For each emoji, we have an i64 value representing the grid
  • Each bit is 1 if that emoji is in that cell
Using 1 bit per cell makes some algorithms smaller (we'll see this later).
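A small JavaScript sketch of converting between the two representations, using BigInt for the 64-bit values. Assumptions not taken from the slides: cell index = row * 8 + column, 0 means an empty cell, and `toBitboards`/`emojiAt` are hypothetical helpers. Because each cell holds at most one emoji, the eight boards never overlap, and OR-ing them together gives every occupied cell.

```javascript
// 64 bytes (values 1..8, 0 = empty) -> eight 64-bit boards, one per emoji.
function toBitboards(cells) {
  const boards = new Array(8).fill(0n);
  for (let i = 0; i < 64; i++) {
    if (cells[i] !== 0) boards[cells[i] - 1] |= 1n << BigInt(i);
  }
  return boards;
}

// Which emoji (1..8) is in cell i? 0 if none of the boards has bit i set.
function emojiAt(boards, i) {
  for (let e = 0; e < 8; e++) {
    if (((boards[e] >> BigInt(i)) & 1n) === 1n) return e + 1;
  }
  return 0;
}

// Example: column x holds emoji x + 1.
const cells = Array.from({ length: 64 }, (_, i) => (i % 8) + 1);
const boards = toBitboards(cells);
```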
Step 12-13: Draw the Grid
  1. Initialize $grid with the 64-bit grid
  2. If $grid == 0, then we're done
  3. Find the index of the lowest bit set in $grid
  4. Draw an emoji at that index
  5. Clear the lowest bit in $grid, and go to step 2

  (func $draw-grid (param $grid i64) (param $gfx-src i32)
    (local $i i32)
    (loop $loop
      ;; Exit the function if $grid is zero
      (br_if 1 (i64.eqz (local.get $grid)))
      ;; Get the index of the lowest set bit
      (local.set $i (i32.wrap_i64 (i64.ctz (local.get $grid))))
      ;; Draw the cell at that index
      (call $draw-cell ...)
      ;; Clear the lowest set bit: grid &= grid - 1
      (local.set $grid
        (i64.and (local.get $grid)
                 (i64.sub (local.get $grid) (i64.const 1))))
      (br $loop)))
  
  • Define $draw-grid, with $grid and $gfx-src
  • br_if 1 here exits the function, past the loop
  • i64.eqz checks if a value is equal to zero
  • i64.ctz is count trailing zeroes, which gets the index of the lowest set bit
  • i32.wrap_i64 converts an i64 to i32
  • Clear the lowest set bit using the trick b & (b-1): subtracting 1 flips the lowest set bit and the zeros below it, so the AND clears just that bit
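The same loop in JavaScript, with BigInt standing in for i64. JavaScript has no ctz for BigInt, so this sketch isolates the lowest set bit with `grid & -grid` and reads its position from the binary string; `setBitIndices` is a hypothetical helper that collects the indices instead of drawing.

```javascript
// Collect the index of every set bit, lowest first, like the $draw-grid loop.
function setBitIndices(grid) {
  const indices = [];
  while (grid !== 0n) {                          // i64.eqz: stop when no bits remain
    const lowest = grid & -grid;                 // isolate the lowest set bit
    indices.push(lowest.toString(2).length - 1); // its position, like i64.ctz
    grid &= grid - 1n;                           // clear the lowest set bit
  }
  return indices;
}

console.log(setBitIndices(0b1010010n)); // [ 1, 4, 6 ]
```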
Step 15-18: Animation

Linear interpolation


  (func $ilerp (param $a i32) (param $b i32) (param $t f32)
               (result i32)
    ;; return a + (b - a) * t
    (i32.add
      (local.get $a)
      (i32.trunc_f32_s
        (f32.mul
          (f32.convert_i32_s
            (i32.sub (local.get $b) (local.get $a)))
          (local.get $t))))
  )
  
  • Define the $ilerp function, which interpolates between $a and $b by the factor $t
  • It returns a value of type i32
  • A function returns its last expression (the value its body leaves on the stack)

Ease out cubic


  (func $ease-out-cubic (param $t f32) (result f32)
    ;; return t * (3 + t * (t - 3))
    (f32.mul
      (local.get $t)
      (f32.add
        (f32.const 3)
        (f32.mul
          (local.get $t)
          (f32.sub (local.get $t) (f32.const 3)))))
  )
  ...
  (call $ilerp (i32.const 10) (i32.const 30)
               (call $ease-out-cubic (f32.const 0.5)))
  
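Define the $ease-out-cubic function: t * (3 + t * (t - 3)) is 1 - (1 - t)^3 expanded, so it starts fast and eases to a stop at t = 1.
Example: interpolate between 10 and 30 using $ease-out-cubic. At t = 0.5 the eased value is 0.875, so the result is 10 + trunc(20 * 0.875) = 27.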
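A quick JavaScript check of the two functions over a whole animation. JavaScript numbers are f64 while the Wasm code uses f32, so results could differ in the last bit for some inputs; the sample values below are exact in both.

```javascript
// JS versions of $ilerp and $ease-out-cubic.
function ilerp(a, b, t) {
  // a + (b - a) * t, truncated toward zero like i32.trunc_f32_s
  return a + Math.trunc((b - a) * t);
}

function easeOutCubic(t) {
  // t * (3 + t * (t - 3)) == 1 - (1 - t)^3
  return t * (3 + t * (t - 3));
}

const frames = [0, 0.25, 0.5, 0.75, 1].map((t) => ilerp(10, 30, easeOutCubic(t)));
console.log(frames); // [ 10, 21, 27, 29, 30 ]: big steps first, then settling
```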

  ;; struct Cell { s8 x, y, w, h; };
  ;; [0x3200..0x3300)  current offset  Cell[64]
  ;; [0x3300..0x3400)  start offset    Cell[64]
  ;; [0x3400..0x3500)  end offset      Cell[64]
  ;; [0x3500..0x3600)  time [0..1)     f32[64]
  ...
  ;; t = t[i]
  (local.set $t (f32.load offset=0x3500 (local.get $t-addr)))
  ;; current[i] = ilerp(start[i], end[i], easeOutCubic(t))
  (i32.store8 offset=0x3200
    (local.get $i-addr)
    (call $ilerp
      (i32.load8_s offset=0x3300 (local.get $i-addr))
      (i32.load8_s offset=0x3400 (local.get $i-addr))
      (call $ease-out-cubic (local.get $t))))
  
  • Data layout: each cell has an x, y, width, and height offset
  • Load the $t value for each cell from memory
  • Load the starting x/y/w/h value (depending on $i-addr)
  • Load the ending x/y/w/h value
  • Interpolate from start to end, using $t
  • Store the interpolated value as the "current" value
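A JavaScript sketch of the per-cell update against the same memory map, using a DataView in place of Wasm memory. The slides don't show how $t-addr and $i-addr are computed, so this assumes $t-addr = cell * 4 (one f32 per cell) and $i-addr = cell * 4 + k, where k picks x, y, w, or h within the 4-byte Cell; `updateCell` is hypothetical, and advancing t each frame is left out.

```javascript
// Memory map from the slide: Cell[64] arrays of s8 x, y, w, h, plus f32 t[64].
const CURRENT = 0x3200, START = 0x3300, END = 0x3400, TIME = 0x3500;
const mem = new DataView(new ArrayBuffer(65536));

const ilerp = (a, b, t) => a + Math.trunc((b - a) * t);
const easeOutCubic = (t) => t * (3 + t * (t - 3));

function updateCell(cell) {
  const t = mem.getFloat32(TIME + cell * 4, true); // assumed $t-addr = cell * 4
  const e = easeOutCubic(t);
  for (let k = 0; k < 4; k++) {                    // k: 0=x, 1=y, 2=w, 3=h
    const start = mem.getInt8(START + cell * 4 + k);
    const end = mem.getInt8(END + cell * 4 + k);
    mem.setInt8(CURRENT + cell * 4 + k, ilerp(start, end, e));
  }
}

// Cell 3 slides from x offset -16 to 0, halfway through its animation.
mem.setInt8(START + 3 * 4, -16);
mem.setFloat32(TIME + 3 * 4, 0.5, true);
updateCell(3);
console.log(mem.getInt8(CURRENT + 3 * 4)); // -16 + trunc(16 * 0.875) = -2
```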

⏩ Step 19: Dragging an Emoji

⏩ Step 20: Clamping to 4 Adjacent Cells

⏩ Step 21: Swap Animation

⏩ Step 24: Swap Cells After Drag

Step 25: Checking For Matches
Remember when I said that using 1 bit per cell makes some algorithms smaller?
  1. Initialize the result to i64 0
  2. Create a bit pattern to match against
  3. Skip to the next position if the pattern isn't valid there
  4. Use i64.and to check if all 3 cells match
  5. If the pattern matches, use i64.or to add it to the result
  6. Shift the pattern left by 1 (and the valid mask right by 1), and go to step 3

    (if (i32.and
          (i32.wrap_i64
            (i64.and (local.get $valid) (i64.const 1)))
          (i64.eq
            (i64.and (local.get $grid) (local.get $pattern))
            (local.get $pattern)))
      (then
        (local.set $result
          (i64.or (local.get $result) (local.get $pattern)))))

    (local.set $pattern
      (i64.shl (local.get $pattern) (i64.const 1)))

    (local.set $valid
      (i64.shr_u (local.get $valid) (i64.const 1)))
  
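  • If $valid & 1 is non-zero (the pattern fits at this position)...
  • And $grid & $pattern == $pattern (all 3 cells are set)...
  • Then $pattern matches, so add it to $result
  • Shift $pattern to the left by 1
  • Shift $valid to the right by 1, so its lowest bit describes the next position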

  (i32  ;; ........    ........                 pattern
        ;; ........    x.......
        ;; ........    x.......
        ;; xxx.....    x.......
         0x00000007  0x00010101)
  (i64  ;;    xxxxxx..            ........      valid mask
        ;;    xxxxxx..            ........
        ;;    xxxxxx..            xxxxxxxx
        ;;    xxxxxx..            xxxxxxxx
        ;;    xxxxxx..            xxxxxxxx
        ;;    xxxxxx..            xxxxxxxx
        ;;    xxxxxx..            xxxxxxxx
        ;;    xxxxxx..            xxxxxxxx
        0x3f3f3f3f3f3f3f3f  0x0000ffffffffffff)
  

⏩ Step 26: Swap Back If No Match

Step 27: Move Emojis Down After Swap
  1. Start from the bottom and work up
  2. For each empty cell, find the first non-empty above it, if any
  3. If found, then swap the two cells

    ;; Get the index of the lowest set bit
    (local.set $i (i64.ctz (local.get $empty)))
    ;; Find the next cell above that is not empty:
    ;; invert the empty pattern and mask it with a column,
    ;; shifted by i.
    (local.set $above-bits
      (i64.and
        (i64.xor (local.get $empty) (i64.const -1))
        (i64.shl (i64.const 0x0101010101010101) (local.get $i))))
    ;; Now find the lowest set bit
    (local.set $above-idx (i64.ctz (local.get $above-bits)))
    ;; If there is a cell above this one...
    (if (i64.ne (local.get $above-bits) (i64.const 0))
      (then
        ;; Move the cell above down...
  
  • Get the index of the lowest set bit in $empty
  • Keep only the non-empty bits that are above this cell
  • Wasm doesn't (currently) have i64.not, so use i64.xor with -1 instead
  • Shift the vertical column mask, 0x0101010101010101, left by $i
  • Then find the lowest bit set in $above-bits
  • If there is a non-empty cell above this one, swap the cells at $i and $above-idx
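A JavaScript sketch of the whole drop pass, assuming eight BigInt bitboards (one per emoji) with bit 0 at the bottom-left, so "above" means a higher bit index. Instead of swapping cells it moves the emoji bit down and marks its old cell empty, which has the same effect when one of the two cells is empty. `ctz` and `dropCells` are hypothetical helpers.

```javascript
const FULL = (1n << 64n) - 1n;
const COLUMN = 0x0101010101010101n; // one bit per row in column 0

function ctz(x) {
  // Count trailing zeros of a non-zero BigInt, like i64.ctz.
  let n = 0n;
  while ((x & 1n) === 0n) { x >>= 1n; n++; }
  return n;
}

function dropCells(grids) {
  const occupied = grids.reduce((acc, g) => acc | g, 0n);
  let empty = ~occupied & FULL;
  while (empty !== 0n) {
    const i = ctz(empty);                        // lowest unprocessed empty cell
    // ~empty is the non-empty cells (i64.xor with -1 in Wasm)
    const above = ~empty & (COLUMN << i) & FULL;
    if (above !== 0n) {
      const j = ctz(above);                      // first non-empty cell above i
      for (let k = 0; k < grids.length; k++) {
        if (((grids[k] >> j) & 1n) === 1n) {
          grids[k] = (grids[k] & ~(1n << j)) | (1n << i); // move it down to i
        }
      }
      empty |= 1n << j;                          // cell j is now empty
    }
    empty &= ~(1n << i);                         // cell i is done
  }
  return grids;
}
```

Each pass removes the lowest empty bit and can only add a higher one, so the loop runs at most 64 times.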

⏩ Step 28: Randomize Board at Start

⏩ Step 29: Check Matches After Dropping

⏩ Step 30: Animate Match Removal

Step 31: Checking For Game Over
Remember when I said that using 1 bit per cell makes some algorithms smaller? 😀

  (i32  ;; ..x.....    .x......    x.......    xx......  
        ;; xx......    x.x.....    .xx.....    ..x.....  
         0x00000403  0x00000205  0x00000106  0x00000304  
        ;; x.x.....    .xx.....    ........    ........ 
        ;; .x......    x.......    xx.x....    x.xx.... 
         0x00000502  0x00000601  0x0000000b  0x0000000d )

  (i64 0x003f3f3f3f3f3f3f   ;; ........      xxxxx...
       0x003f3f3f3f3f3f3f   ;; xxxxxx..      xxxxx...
       0x003f3f3f3f3f3f3f   ;; xxxxxx..      xxxxx...
       0x003f3f3f3f3f3f3f   ;; xxxxxx..      xxxxx...
       0x003f3f3f3f3f3f3f   ;; xxxxxx..      xxxxx...
       0x003f3f3f3f3f3f3f   ;; xxxxxx..      xxxxx...
       0x1f1f1f1f1f1f1f1f   ;; xxxxxx..      xxxxx...
       0x1f1f1f1f1f1f1f1f ) ;; xxxxxx.. *6   xxxxx... *2
  
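Top: the horizontal move patterns, each three cells where a single swap would complete a row of three. Bottom: their valid masks; the first applies to the six two-row patterns, the second to the two one-row patterns.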
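A JavaScript sketch of the horizontal half of the check, using the patterns and masks from the slide with BigInt in place of i64. The vertical patterns aren't shown here, so this only answers "is there a horizontal move?"; `hasHorizontalMove` is a hypothetical name.

```javascript
const FULL = (1n << 64n) - 1n;
const TWO_ROW_VALID = 0x003f3f3f3f3f3f3fn; // start x <= 5, start row <= 6
const ONE_ROW_VALID = 0x1f1f1f1f1f1f1f1fn; // start x <= 4, any row
const HORIZONTAL_MOVES = [
  [0x403n, TWO_ROW_VALID], [0x205n, TWO_ROW_VALID],
  [0x106n, TWO_ROW_VALID], [0x304n, TWO_ROW_VALID],
  [0x502n, TWO_ROW_VALID], [0x601n, TWO_ROW_VALID],
  [0xbn, ONE_ROW_VALID], [0xdn, ONE_ROW_VALID],
];

// True if one swap could make a horizontal run of three in this emoji's grid.
function hasHorizontalMove(grid) {
  for (const [startPattern, startValid] of HORIZONTAL_MOVES) {
    let pattern = startPattern, valid = startValid;
    for (let i = 0; i < 64; i++) {
      if ((valid & 1n) === 1n && (grid & pattern) === pattern) return true;
      pattern = (pattern << 1n) & FULL;
      valid >>= 1n;
    }
  }
  return false;
}
```

To detect game over you'd run this (and a vertical counterpart) on each of the eight emoji grids; if none of them has a move, the game is over.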

Done!

Right?

What's left?

  • Implementing score
  • Title screen / Game Over screen?
  • Combo multiplier?
  • Optimizing size (currently at 4817 bytes!)

Resources

These Slides
github.com/binji/wasmsummit2021-talk
Raw Wasm
github.com/binji/raw-wasm
Wabt (WebAssembly Binary Toolkit)
github.com/webassembly/wabt
Twitter
@binjimint
Thanks!