Serialization

Noun serialization converts a noun into a single atom by a uniquely-defined procedure. Urbit's basic noun serialization process, called “jamming”, supports internal references to repeated subnouns, which yields a minor compression effect.

Serialization Arms

The main tools from /sys/hoon for noun serialization are:

++cue, unpack a jammed noun
++jam, pack a noun into an atom
++mat, length-encode an atom
++rub, length-decode an atom

++jam and ++cue are critically important for noun communication operations, including the %lick vane, the %khan vane, and noun channels in %eyre.

`++cue`

It is more straightforward to see how to decode a noun than to encode it, so let's start there.

++  cue                                                 ::  unpack
  ~/  %cue
  |=  a=@
  ^-  *
  =+  b=0
  =+  m=`(map @ *)`~
  =<  q
  |-  ^-  [p=@ q=* r=(map @ *)]
  ?:  =(0 (cut 0 [b 1] a))
    =+  c=(rub +(b) a)
    [+(p.c) q.c (~(put by m) b q.c)]
  =+  c=(add 2 b)
  ?:  =(0 (cut 0 [+(b) 1] a))
    =+  u=$(b c)
    =+  v=$(b (add p.u c), m r.u)
    =+  w=[q.u q.v]
    [(add 2 (add p.u p.v)) w (~(put by r.v) b w)]
  =+  d=(rub c a)
  [(add 2 p.d) (need (~(get by m) q.d)) m]

The above gate accepts an atom a, which is a “blob” or as-yet-undefined value. Pin a cursor b at 0, which runs from low to high bits (LSB first). The empty map m will map from cursor to noun. (All cursor-to-noun mappings will be saved here.)

The trap produces a 3-tuple where p is the number of bits consumed (which the caller adds to its cursor to find the following position); q is the noun; and r is the cache. The product itself will be q.

If the bit at cursor b is 0b0, then the noun is a direct atom. Pin the result of ++rub at the next bit, +(b), as c; p is then +(p.c) (the tag bit plus the bits ++rub consumed), q is the atom q.c, and r is the cache with q.c stored at b. (See ++rub discussion below for the atom expansion details.)

Otherwise, the noun is either a cell or a saved reference, so pin c as the position just past the two tag bits and check the second bit, at +(b). If it is 0b0, then we have a cell and must decode its head and tail in turn. If it is 0b1, then we have a saved reference and retrieve it from the cache (last branch).

If the second bit is 0b0, then the noun is a cell. Expand the head at c as u, via $(b c), then expand the tail at (add p.u c) with the cache r.u as v. w is the cell of q.u and q.v. p is the lengths of u and v plus 2. q is w, and r is the cache r.v with w inserted at b.

If the second bit is 0b1, then pin the result of ++rub at c as d. q.d is the cursor position of a previously decoded noun, which is looked up in m and produced as q; p is p.d plus 2, and the cache is unchanged.
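The branches described above can be traced in a short Python sketch. It is a rough transliteration of the Hoon, not the runtime's implementation. As assumptions of this sketch, nouns are modeled as Python ints and 2-tuples, the cache is a mutable dict rather than a map threaded through the recursion, and plain recursion limits it to small nouns.

```python
def rub(a, b):
    """Length-decode the atom at bit offset a of blob b.

    Returns (bits consumed, atom), mirroring p and q of ++rub.
    """
    c = 0
    while (b >> (a + c)) & 1 == 0:       # count 0 bits up to the 1 terminator
        c += 1
        if c > b.bit_length():
            raise ValueError("ran off the end of the blob")
    if c == 0:
        return 1, 0                      # a lone 1 bit encodes the atom 0
    d = a + c + 1                        # first bit after the terminator
    # e is the atom's bit length; its top bit is implicit, hence 2^(c-1)
    e = (1 << (c - 1)) + ((b >> d) & ((1 << (c - 1)) - 1))
    q = (b >> (d + c - 1)) & ((1 << e) - 1)
    return 2 * c + e, q


def cue(a):
    """Decode a jammed atom into nested Python ints and 2-tuples."""
    m = {}                               # cache: cursor position -> noun

    def go(b):
        """Return (bits consumed, noun) for the noun starting at bit b."""
        if (a >> b) & 1 == 0:            # first tag bit 0: atom
            p, q = rub(b + 1, a)
            m[b] = q
            return p + 1, q
        c = b + 2                        # position just past both tag bits
        if (a >> (b + 1)) & 1 == 0:      # second tag bit 0: cell
            pu, qu = go(c)               # head
            pv, qv = go(c + pu)          # tail starts where the head ended
            w = (qu, qv)
            m[b] = w
            return 2 + pu + pv, w
        pd, qd = rub(c, a)               # second tag bit 1: saved reference
        return 2 + pd, m[qd]             # qd is the cursor of the earlier noun

    return go(0)[1]
```

For instance, `cue(0b1100)` produces `1` and `cue(0b10_1001)` produces `(0, 0)`, matching the ++jam examples given in this document.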

`++rub`

++  rub                                                 ::  length-decode
  ~/  %rub
  |=  [a=@ b=@]
  ^-  [p=@ q=@]
  =+  ^=  c
      =+  [c=0 m=(met 0 b)]
      |-  ?<  (gth c m)
      ?.  =(0 (cut 0 [(add a c) 1] b))
        c
      $(c +(c))
  ?:  =(0 c)
    [1 0]
  =+  d=(add a +(c))
  =+  e=(add (bex (dec c)) (cut 0 [d (dec c)] b))
  [(add (add c c) e) (cut 0 [(add d (dec c)) e] b)]

++rub extracts a self-measuring atom from an atomic blob. It accepts a cell of a bit-position cursor a and an atomic blob b, and produces the number of bits p by which to advance the cursor along with the decoded atom q.

c counts a unary sequence of 0b0 bits terminated by a 0b1 bit. If c is 0, then the cursor advancement p is 1 and the encoded atom q is 0. Otherwise, c is the number of bits needed to express the number of bits in q.

d is the cursor a advanced past the c bits and the terminator bit.

We pin e, the number of bits in q. This is encoded as a c-1-length sequence of bits at d, which is added to $2^{c-1}$. p (the number of bits consumed) is c+c+e. The packaged atom q is the e-length bitfield at a+c+c.
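As a worked check of this arithmetic, take the blob 0b1111.1000 (the jam of 0b111 from the examples in this document) with the cursor a = 1, just past the atom tag bit. Bit positions count from the LSB, and the notation bits[i, n] (introduced here only for illustration) means the n-bit field starting at bit i, as `(cut 0 [i n] b)` does.

```latex
\begin{aligned}
c &= 2 && \text{(bits 1 and 2 are 0, bit 3 is the terminating 1)} \\
d &= a + c + 1 = 4 \\
e &= 2^{c-1} + \operatorname{bits}[d,\, c-1] = 2 + \operatorname{bits}[4,\, 1] = 2 + 1 = 3 \\
p &= c + c + e = 7 \\
q &= \operatorname{bits}[a + c + c,\, e] = \operatorname{bits}[5,\, 3] = \texttt{0b111} = 7
\end{aligned}
```

The seven bits consumed plus the one tag bit account for all eight bits of the blob.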

`++jam`

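The basic idea of ++jam is to produce a serial noun (in order of head/tail). This requires a recursive examination of the noun being encoded:

One bit marks atom (0b0) or cell/reference (0b1).
Next entry marks the bit length of the value for an atom, or is a further bit that is 0b0 for a cell or 0b1 for a reference.
Then the actual value (the atom's bits, the head and tail of the cell, or the length-encoded cursor of the referenced noun).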

Some readers may prefer to examine the Python noun.py implementation to supplement the Hoon definition; see particularly mat().
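The encoding steps can also be sketched directly in Python. This is not noun.py itself but a minimal illustration under stated assumptions: nouns are modeled as Python ints and 2-tuples, the cache maps each noun to the cursor where it was first written, and a repeated atom is only replaced by a reference when the reference would be shorter.

```python
def mat(a):
    """Length-encode atom a; returns (bit length, encoded bits)."""
    if a == 0:
        return 1, 1                          # a lone 1 bit encodes 0
    b = a.bit_length()                       # bits in the value
    c = b.bit_length()                       # bits needed to express b
    low = b & ((1 << (c - 1)) - 1)           # b without its implicit top bit
    body = low | (a << (c - 1))              # length bits, then value bits
    # c zero bits and a 1 terminator (that is, 2^c), then the body above it
    return 2 * c + b, (1 << c) | (body << (c + 1))


def jam(noun):
    """Encode nested ints and 2-tuples as a single jammed atom."""
    m = {}                                   # cache: noun -> cursor position

    def go(a, b):
        """Return (bit length, bits) for noun a written at cursor b."""
        c = m.get(a)
        if c is None:                        # first time this noun is seen
            m[a] = b
            if isinstance(a, int):
                p, q = mat(a)
                return p + 1, q << 1         # tag bit 0, then the atom
            pd, qd = go(a[0], b + 2)         # head after the two tag bits
            pe, qe = go(a[1], b + 2 + pd)    # tail after the head
            return 2 + pd + pe, 0b01 | ((qd | (qe << pd)) << 2)
        if isinstance(a, int) and a.bit_length() <= c.bit_length():
            p, q = mat(a)                    # atom no longer than a reference
            return p + 1, q << 1
        p, q = mat(c)                        # reference to the earlier cursor
        return p + 2, 0b11 | (q << 2)

    return go(noun, 0)[1]
```

Since Hoon's `[0 1 2]` is the cell `[0 [1 2]]`, it is written `(0, (1, 2))` in this model; `bin(jam((0, 1)))` gives `'0b11001001'`, in agreement with the examples that follow.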

Examples:

> `@ub`(jam ~)
0b10
::  start at LSB, so `0` for atom, `1` for length, `0` for value (head-trimmed
::  zero, really `0b010`)

> `@ub`(jam 1)
0b1100
::  start at LSB, so `0` for atom, `01` for length, `1` for value

> `@ub`(jam [0 0])
0b10.1001
::  start at LSB, so `01` for cell, then `0` for head atom, length `1`,
::  value `0`, repeat

> `@ub`(jam [0 1])
0b1100.1001
::  start at LSB, so `01` for cell, then `0` for head atom, length `1`,
::  value `0`, repeat with value `1`

> `@ub`(jam [1 0])
0b1011.0001

> `@ub`(jam 0b111)
0b1111.1000

> `@ub`(jam [0 1 2])
0b1.0010.0011.0001.1001

To cue a jamfile:

> =jammed-file .^(@ %cx %/jammed/jam)
> (cue jammed-file)
...

This cues as a noun, so it should be ;; or clammed to a particular mold.

`eval` and newt encoding

Newt encoding is a variation of this: it is a jammed noun with a short header. Newt encoding is used by urbit eval.

The format for a newt-encoded noun is:

V.BBBB.JJJJ.JJJJ...

V version
B jam size in bytes (little endian)
J jammed noun (little endian)
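A minimal Python sketch of this framing follows. It reads the diagram literally, taking V as one byte and BBBB as a four-byte little-endian size; the function names and the default version value of 0 are assumptions of this sketch rather than anything stated above.

```python
import struct


def newt_encode(jammed, version=0):
    """Frame a jammed atom as V.BBBB.JJJJ... (version value is assumed)."""
    body = jammed.to_bytes((jammed.bit_length() + 7) // 8, "little")
    return bytes([version]) + struct.pack("<I", len(body)) + body


def newt_decode(frame):
    """Split a frame into (version, jammed atom), checking the size field."""
    version = frame[0]
    (size,) = struct.unpack_from("<I", frame, 1)   # four bytes after V
    body = frame[5:5 + size]
    if len(body) != size:
        raise ValueError("frame is shorter than its size field claims")
    return version, int.from_bytes(body, "little")
```

With the jam of 1 (0b1100, a single byte 0x0c), `newt_encode(0b1100)` yields the six bytes `00 01 00 00 00 0c`, and `newt_decode` recovers `(0, 12)` from them.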

eval supports several options for processing Hoon nouns as input to or output from conn.c:

-j, --jam: output result as a jammed noun.
-c, --cue: read input as a jammed noun.
-n, --newt: write output / read input as a newt-encoded jammed noun, when paired with -j or -c respectively.
-k: treat the input as the jammed noun input of a %fyrd request to conn.c; if the result is a goof, pretty-print it to stderr instead of returning it.

In the Vere runtime, these are used by vere/newt.c; see particularly:

u3_newt_send() transmits a jammed noun (using u3s_jam_xeno(), for instance) to a task buffer for libuv. (Recall that libuv is the main event loop driver for the king process.)
u3_newt_read() pulls out the jammed noun from the buffer.

Serialization is also supported by serial.c functions, prefixed u3s.

Khan and Lick both use newt.c.
