Serialization
Noun serialization refers to a uniquely-defined technique for converting a noun into a single atom. The basic noun serialization process in Urbit, called “jamming”, includes supporting internal references so that there is a minor compression effect included.
Serialization Arms
The main tools from /sys/hoon
for noun serialization are:
+cue
, unpack a jammed noun+jam
, pack a jammed noun+mat
, length-encode a noun+rub
, length-decode a noun
+jam
and +cue
are critically important for noun communication operations, including the %lick
vane, the %khan
vane, and noun channels in %eyre
.
+cue
+cue
It is more straightforward to see how to decode a noun than to encode it, so let's start there.
++ cue :: unpack
~/ %cue
|= a=@
^- *
=+ b=0
=+ m=`(map @ *)`~
=< q
|- ^- [p=@ q=* r=(map @ *)]
?: =(0 (cut 0 [b 1] a))
=+ c=(rub +(b) a)
[+(p.c) q.c (~(put by m) b q.c)]
=+ c=(add 2 b)
?: =(0 (cut 0 [+(b) 1] a))
=+ u=$(b c)
=+ v=$(b (add p.u c), m r.u)
=+ w=[q.u q.v]
[(add 2 (add p.u p.v)) w (~(put by r.v) b w)]
=+ d=(rub c a)
[(add 2 p.d) (need (~(get by m) q.d)) m]
The above gate accepts an atom .b
, which is a “blob” or as-yet-undefined value. Pin a cursor to run from low to high (LSB) at 0. The empty map .m
will map from cursor to noun. (All cursor-to-noun mappings will be saved here.)
The trap produces a 3-tuple where .p
is the following cursor position; .q
is the noun; and .r
is the cache. The product itself will be .q
.
A data cursor is pinned at .c
, while the third bit is pinned at .a
. If the first bit at .a
is 0b0
, then the noun is a direct atom. Pin the expansion of the second bit at .a
as .c
; .p
is now the cursor after .c
, .q
is the atom from .e
, and .r
is an updated cache. (See +rub
discussion below for the atom expansion details.)
Otherwise, we have a cell, so we have to expand the second bit at .a
. If it is 0b0
, then the cell needs to be decoded by decoding the noun at . If it is 0b1
, then we have a saved reference and retrieve it from the cache (last branch).
If the second bit at .a
is 0b1
, then the noun is a saved reference. In that case, expand at .c
(the head), as $(b c)
and pin the cursor after as .v
. +w
is the cell of q.u
and q.v
. .p
is the lengths of .u
and .v
plus 2. .q
is +w
, with .r
is the cache with +w
inserted.
+rub
+rub
++ rub :: length-decode
~/ %rub
|= [a=@ b=@]
^- [p=@ q=@]
=+ ^= c
=+ [c=0 m=(met 0 b)]
|- ?< (gth c m)
?. =(0 (cut 0 [(add a c) 1] b))
c
$(c +(c))
?: =(0 c)
[1 0]
=+ d=(add a +(c))
=+ e=(add (bex (dec c)) (cut 0 [d (dec c)] b))
[(add (add c c) e) (cut 0 [(add d (dec c)) e] b)]
+rub
extracts a self-measuring atom from an atomic blob, which accepts a cell of bit position cursor .a
and atomic blob .b
. +rub
produces a number of bits to advance the cursor .p
and the encoded atom .q
.
.c
is a unary sequance of 0b1
bits followed by a 0b0
bit. If .c
is 0b0
, then the cursor advancement .p
is 0b1
and the encoded atom .q
is 0b0
. Otherwise, .c
is the number of bits needed to express the number of bits in .q
.
The cursor .a
is advanced to include .c
and the terminator bit.
We pin .e
, the number of bits in .q
. This is encoded as a c-1
-length sequence of bits following .a
, which is added to $2^{c-1}$. .p
(the number of bits consumed) is c+c+e
. The packaged atom .q
is the .e
-length bitfield at a+c+c
.
+jam
+jam
The basic idea of +jam
is to produce a serial noun (in order of head/tail). This requires a recursive examination of the encoded noun:
One bit marks cell or atom.
Next entry marks bit length of value for atom,
0
if cell, or1
if reference.Then the actual value (itself a cell or atom).
Some readers may prefer to examine the Python noun.py
implementation to supplement the Hoon definition; see particularly mat()
.
Examples:
> `@ub`(jam ~)
0b10
:: start at LSB, so `0` for atom, `1` for length, `0` for value (head-trimmed
:: zero, really `0b010`)
> `@ub`(jam 1)
0b1100
:: start at LSB, so `0` for atom, `01` for length, `1` for value
> `@ub`(jam [0 0])
0b10.1001
:: start at LSB, so `01` for cell, then `0` for head atom, length `1`,
:: value `0`, repeat
> `@ub`(jam [0 1])
0b1100.1001
:: start at LSB, so `01` for cell, then `0` for head atom, length `1`,
:: value `0`, repeat with value `1`
> `@ub`(jam [1 0])
0b1011.0001
> `@ub`(jam 0b111)
0b1111.1000
> `@ub`(jam [0 1 2])
0b1.0010.0011.0001.1001
To cue a jamfile:
> =jammed-file .^(@ %cx %/jammed/jam)
> (cue jammed-file)
...
This cues as a noun, so it should be ;;
or clammed to a particular mold.
+eval
and newt encoding
+eval
and newt encodingNewt encoding is a variation of this: it is a jammed noun with a short header. Newt encoding is used by urbit eval
.
The format for a newt-encoded noun is:
V.BBBB.JJJJ.JJJJ...
V
versionB
jam size in bytes (little endian)J
jammed noun (little endian)
+eval
supports several options for processing Hoon nouns as input to or output from conn.c
:
-j
,--jam
: output result as a jammed noun.-c
,--cue
: read input as a jammed noun.-n
,--newt
: write output / read input as a newt-encoded jammed noun, when paired with-j
or-c
respectively.-k
: treat the input as the jammed noun input of a%fyrd
request toconn.c
; if the result is a$goof
, pretty-print it tostderr
instead of returning it.
In the Vere runtime, these are used by vere/newt.c
; see particularly:
u3_newt_send()
transmits a jammed noun (usingu3s_jam_xeno()
, for instance) to a task buffer forlibuv
. (Recall thatlibuv
is the main event loop driver for the king process.)u3_newt_read()
pulls out the jammed noun from the buffer.
Serialization is also supported by serial.c
functions, prefixed u3s
.
Khan and Lick both use newt.c
.
Last updated