6. Vere I: u3 and the Serf

We need some C background to fluently read the runtime source. If you are not familiar with C, here are some good resources:

Runtime Structure and Responsibilities

If the vision of Urbit is to implement [2 [0 3] 0 2] as a frozen lifecycle function, then it needs some scaffolding on any real system. Real computers have memories, chipset architectures, operating system conventions, and other affordances (and limitations). Conventionally, an operating system takes explicit care of such niceties, which is one reason why calling Urbit an “OS” has been controversial. The runtime interpreter is designed to take Nock as a specification and produce a practical computer of it.

Today, there are two primary Nock executable runtimes: Vere and Sword (née Ares). (Jaque, the JVM experiment, and King Haskell have fallen by the wayside.)

  • Vere is written in C and is the standard runtime for Arvo.
  • Sword is written in Rust and aims to solve some deep theoretical challenges to producing efficient Nock on a contemporary chipset. Sword is under development by Zorp, formerly with contributions from Tlon Corporation and the Urbit Foundation.

We will take Vere as the normative runtime for Core Academy.

As we mentioned last time in the boot sequence lesson, the runtime spawns the king (king.c) and indirectly the serf (serf.c) processes. These both run for the lifetime of the Urbit process.

There are two competing frames for how to structure the Urbit process: king/serf and urth/mars.

King v. serf separates the Nock and Arvo material from the I/O and event log material. It has the advantage that (per the whitepaper), “The serf only ever talks to the king, while the king talks with both the serf and Unix.”

The king process is in charge of:

  • IPC
  • Event log
  • Unix effects including I/O
  • Stateless Nock interpreter

The serf process is the Nock runtime and is responsible for:

  • Nock virtual machine (tracking current state of Arvo as a noun and ++pokeing it with nouns)
  • Bytecode interpretation
  • Jet dashboard
  • Snapshotting
  • Noun allocation for Arvo

The Mars/Urth split reframes the worker process so that it includes the event log with the current serf responsibility (“Mars”), thus enabling online event log management and truncation.

The Structure of Vere’s Source

Vere is provided in the urbit/vere repo. It is built from the pkg/ directory and contains the following top-level folders:

.
├── c3
├── ent
├── noun
├── ur
├── urcrypt
└── vere

  • /c3 contains the types and definitions to enable the c3 logical system.

    c3 is the set of C conventions which Vere enforces. These include well-specified integer types, tooling for loobeans (instead of booleans), and motes (#defines for short Urbit words). “The C3 style uses Hoon style TLV variable names, with a quasi Hungarian syntax.” There are no Urbit-specific requirements for C3, which could otherwise just be a general-purpose C discipline.

    Like aura bitwidth markers, c3 documents programmer intent but does not generally enforce it. Most of the parts of c3 are simply lapidary terms for C99 types.

    • Scan the files in /c3.

  • /ent provides entropy for the runtime. Entropy is derived from /dev/urandom, a special file that provides pseudorandom numbers derived from system noise. /dev/urandom produces machine randomness as close to true randomness as possible, using sources such as network latency and keystroke latency to seed the cryptographically secure pseudo-random number generator (CSPRNG).

  • /noun is the gorilla, containing u3 (the noun library) and the jets. We'll go into it in detail with the system architecture in a moment in Section u3.

  • /ur is, like /ent, a single-purpose library, in this case for bitstreams and serialization.

  • /urcrypt is a C library to standardize cryptographic calls across a number of libraries.

    This library is a dependency for both Vere and Sword (née Ares), and is in the process of being moved into a standalone repo.

  • /vere contains the runtime architecture itself, the king and the serf and related tooling, as independent from u3.

file          purpose
auto.c        I/O drivers
benchmarks.c  performance tests
dawn.c        key validation for bootstrapping
disk.c        database reads and writes for event log
foil.c        file synching
king.c        main runtime loop
lord.c        manage IPC between king and serf
main.c        setup and entrypoint for runtime execution
mars.c        Mars event log replay (see Mars/Urth split above)
newt.c        noun blob messages
pier.c        manage pier (files on host OS disk)
save.c        save events to pier
serf.c        the serf itself
time.c        Unix/Urbit time operations
vere.h        shared Vere-specific structs
ward.c        lifecycle management for structures
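
To make the c3 conventions described above concrete, here is a minimal sketch of what such a discipline looks like in plain C. It is not copied from /c3: every ex_ and EX_ name is a hypothetical stand-in, and the sketch assumes the Hoon conventions that a loobean "yes" is 0 and "no" is 1, and that a mote stores its first character in the low byte.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed-width aliases in the spirit of c3 (hypothetical names). */
typedef uint32_t ex_w;   /* a 32-bit word */
typedef uint8_t  ex_o;   /* a loobean */

/* Loobeans: yes is 0 and no is 1, the reverse of C truth values. */
#define EX_YES ((ex_o)0)
#define EX_NO  ((ex_o)1)

/* Convert between loobeans and C truth values. */
#define EX_IS_YES(x)  (EX_YES == (x))
#define EX_FROM_C(x)  ((x) ? EX_YES : EX_NO)

/* A mote packs up to four ASCII characters into one word,
   with the first character in the low byte. */
#define EX_MOTE4(a, b, c, d) \
  ( (ex_w)(a) | ((ex_w)(b) << 8) | ((ex_w)(c) << 16) | ((ex_w)(d) << 24) )

/* Example predicate that answers with a loobean rather than a C bool. */
ex_o ex_is_even(ex_w n)
{
  return EX_FROM_C(0 == (n & 1));
}
```

The point of the conversion macros is that a loobean must never be tested directly in an `if`, since "yes" is zero; code in this style always goes through an explicit check.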

u3

Nouns

A noun is either an atom or a cell. However, we have to decide what this implementation looks like in a language like C, which prefers arrays and pointers. u3 is the noun library, which features Urbit-specific memory operations, tracing and profiling tools, and so forth.

A u3_noun is a 32-bit word (c3_w, i.e. uint32_t). The top two bits indicate what kind of value the noun is and thus how to approach it:

Bit 31  Bit 30  Meaning
1       1       Indirect cell (pom)
1       0       Indirect atom (pug)
0       any     Direct atom (cat)

An indirect noun is a dog. For indirect nouns, bits 29–0 are a word pointer into the loom. In addition, 0xffff.ffff is u3_none, which is “not a noun”.
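
The tag-bit layout above can be sketched in a few lines of C. This is a minimal illustration under the layout stated in the text, not Vere's code: every ex_ and EX_ name is hypothetical, and the real runtime has its own checks for these cases in /noun.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_NONE 0xffffffffu   /* "not a noun", per the text */

typedef enum { EX_CAT, EX_PUG, EX_POM, EX_NOT_A_NOUN } ex_kind;

/* Classify a 32-bit noun word by its top two bits. */
ex_kind ex_classify(uint32_t som)
{
  if ( EX_NONE == som )           return EX_NOT_A_NOUN;
  if ( 0 == (som >> 31) )         return EX_CAT;  /* bit 31 clear: direct atom */
  if ( 0 == ((som >> 30) & 1) )   return EX_PUG;  /* bits 31..30 = 10 */
  return EX_POM;                                  /* bits 31..30 = 11 */
}

/* A dog is any indirect noun (pug or pom). */
bool ex_is_dog(uint32_t som)
{
  ex_kind kin = ex_classify(som);
  return (EX_PUG == kin) || (EX_POM == kin);
}

/* For a dog, bits 29..0 are a word offset into the loom. */
uint32_t ex_offset(uint32_t som)
{
  return som & 0x3fffffffu;
}
```

Note that the sentinel has to be tested first, since its top two bits would otherwise make it look like an indirect cell.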

A common pattern is to extract values from a noun into C-typed values, carry out the manipulation, and then wrap them back into the noun. Furthermore, the value from an arbitrary atom may in fact be a bignum, and so GMP is used to manage these values.

  • Examine /noun/jets/a/add.c, in particular u3qa_add.
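
The extract, compute, and re-wrap pattern can be sketched for the simplest case, where both operands are direct atoms. This is a simplified stand-in and not the body of u3qa_add: the ex_ names are hypothetical, and where the real jet would fall through to a bignum path (GMP, per the text), this sketch just reports that the fast path does not apply.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_CAT_MAX 0x7fffffffu   /* largest value with bit 31 clear */

/* A direct atom is any word with bit 31 clear. */
bool ex_is_cat(uint32_t a)
{
  return a <= EX_CAT_MAX;
}

/* Try the single-word fast path for addition.
   Returns true and writes *out when both operands and the sum are
   direct atoms; returns false (leaving *out alone) when the caller
   would need the bignum path instead. */
bool ex_add_cats(uint32_t a, uint32_t b, uint32_t* out)
{
  if ( !ex_is_cat(a) || !ex_is_cat(b) ) {
    return false;
  }
  uint32_t c = a + b;   /* cannot wrap: both operands are below 2^31 */

  if ( !ex_is_cat(c) ) {
    return false;       /* sum needs more than 31 bits */
  }
  *out = c;
  return true;
}
```

When reading the real jet, look for the same shape: a cheap test for direct atoms first, and the general arbitrary-precision path only when that test fails.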

One of the painful parts of working with u3 is the reference counting system. Reference counting tracks the number of pointers to an object in memory so that the memory can be freed at the appropriate time. Since C doesn't provide reference counting support in the language, we must manually track these counts and free the value only when the refcount goes to zero. The relevant functions are u3k to gain a refcount and u3z to lose one.

There are also two different protocols for reference counting, used by different parts of the system:

  • transfer semantics relinquishes a refcount of any sent values. Most functions behave this way, which means that you don't have to think about de-allocating values if they've been sent elsewhere.
  • retain semantics holds onto the refcount even if the value is sent elsewhere. The functions which use retain semantics tend to inspect or query nouns rather than make or modify nouns.

The u3 convention is that, unless otherwise specified, all functions have transfer semantics, with the exception of the prefixes u3r, u3x, u3z, u3q, and u3w. Also, within jet directories a through f (but not g), internal functions retain (for historical reasons).

  • Compare u3ka_add and u3qa_add.
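
The difference between the two protocols is easier to see on a toy object than on real nouns. The sketch below uses a hypothetical refcounted box (ex_box) with ex_gain and ex_lose playing the roles that u3k and u3z play in the text. It is an illustration of the calling conventions only and says nothing about how the loom actually stores counts.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* A toy refcounted value (hypothetical; not a u3 noun). */
typedef struct { uint32_t use; uint64_t val; } ex_box;

static int ex_live = 0;   /* number of boxes currently allocated */

ex_box* ex_make(uint64_t val)
{
  ex_box* box = malloc(sizeof(*box));
  if ( NULL == box ) abort();
  box->use = 1;
  box->val = val;
  ex_live++;
  return box;
}

/* Gain a reference (the role u3k plays in the text). */
ex_box* ex_gain(ex_box* box) { box->use++; return box; }

/* Lose a reference, freeing at zero (the role u3z plays). */
void ex_lose(ex_box* box)
{
  if ( 0 == --box->use ) { free(box); ex_live--; }
}

int ex_live_count(void) { return ex_live; }

/* Retain semantics: the caller keeps its references to a and b. */
ex_box* ex_add_retain(ex_box* a, ex_box* b)
{
  return ex_make(a->val + b->val);
}

/* Transfer semantics: the callee consumes one reference to each argument. */
ex_box* ex_add_transfer(ex_box* a, ex_box* b)
{
  ex_box* c = ex_add_retain(a, b);
  ex_lose(a);
  ex_lose(b);
  return c;
}
```

A caller that wants to keep using a value after passing it to a transfer function gains a reference first, as in `ex_add_transfer(ex_gain(a), b)`. The transfer wrapper is just the retain function plus two calls to lose, which is the relationship to look for when comparing u3ka_add with u3qa_add.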

u3 is designed to make some guarantees for the programmer. It's not Urbit itself, but it's designed to be an implementation platform for Urbit. Thus:

  • Every event is logged internally before it enters u3.
  • A permanent state noun maintains a single reference.
  • Any event can be aborted without damaging the permanent state (“solid state”).
  • We snapshot the permanent state and can prune logs.
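
A toy model may help fix these guarantees in mind. In the sketch below, an integer stands in for the permanent state noun and a small array stands in for the event log; the transition function and all ex_ names are hypothetical. The only things it is meant to show are the ordering (log, then compute, then commit) and that an aborted event leaves the state as it was.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EX_LOG_MAX 16

typedef struct {
  int64_t state;              /* stand-in for the permanent state noun */
  int64_t log[EX_LOG_MAX];    /* stand-in for the event log */
  size_t  len;
} ex_pier;

/* Hypothetical transition: add the event to the state.
   A negative total models a computation that aborts. */
static bool ex_compute(int64_t state, int64_t event, int64_t* next)
{
  if ( state + event < 0 ) {
    return false;
  }
  *next = state + event;
  return true;
}

/* Apply one event: log it, compute on the side, commit only on success. */
bool ex_apply(ex_pier* pir, int64_t event)
{
  if ( pir->len >= EX_LOG_MAX ) {
    return false;
  }
  pir->log[pir->len++] = event;     /* 1. the event is logged first */

  int64_t next;
  if ( !ex_compute(pir->state, event, &next) ) {
    return false;                   /* 2. abort: state is untouched */
  }
  pir->state = next;                /* 3. commit the new state */
  return true;
}
```

Because the new state is built in a separate variable and only assigned at the end, there is no point at which a failure can leave the state half-updated.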

We will discuss the specifics of the memory model next week in ca06 when we discuss the loom and the road model.

  • “Land of Nouns”; note particularly the section u3: reference protocols, labeled THIS IS THE MOST CRITICAL SECTION IN THE `u3` DOCUMENTATION. Read that if nothing else.

Library

The contents of /noun constitute the u3 noun library. Functions are organized by file and prefix into certain namespaces by operation. Because u3 is a library, we can't cleanly separate it into serf/king components, although certain modules do have close identification with one or the other.

prefix     purpose                      .h           .c
u3a_       allocation                   allocate.h   allocate.c
u3e_       persistence                  events.h     events.c
u3h_       hashtables                   hashtable.h  hashtable.c
u3i_       noun construction            imprison.h   imprison.c
u3j_       jet control                  jets.h       jets.c
u3l_       logging                      log.h        log.c
u3m_       system management            manage.h     manage.c
u3n_       nock computation             nock.h       nock.c
u3o_       command-line options         options.h    options.c
u3r_       noun access (error returns)  retrieve.h   retrieve.c
u3s_       noun serialization           serial.h     serial.c
u3t_       profiling                    trace.h      trace.c
u3u_       urth (memory management)     urth.h       urth.c
u3v_       arvo                         vortex.h     vortex.c
u3x_       noun access (error crashes)  xtract.h     xtract.c
u3z_       memoization                  zave.h       zave.c
u3k[a-g]   jets (transfer, C args)      jets/k.h     jets/[a-g]/*.c
u3q[a-g]   jets (retain, C args)        jets/q.h     jets/[a-g]/*.c
u3w[a-g]   jets (retain, nock core)     jets/w.h     jets/[a-g]/*.c

  • u3a defines memory allocation functions. These are used throughout, but we'll discuss them a bit more when we talk about the king. You will quickly run into reference counting features, like u3k (u3a_gain()) to gain a refcount and u3z (u3a_lose()) to lose one.
  • u3e manages the loom.
  • u3h provides fast custom hashing for the runtime.
  • u3i puts a value (expected to be a c3 type) into a noun. (Look at this one now.)
  • u3l supports logging.
  • u3m manages the system: boots u3, makes a pier, handles crashes, etc.
  • u3n implements the Nock bytecode interpreter.
  • u3o parses the manifold command-line options of Urbit and writes them into globals.
  • u3r extracts a value from a noun, with a u3_weak on failure. (Look at this one now.)
  • u3s implements noun serialization (++jam and ++cue).
  • u3t provides tracing for crashes.
  • u3u offers memory management tooling (deduplication and memory mapping).
  • u3v supports Arvo interaction.
  • u3x extracts a value from a noun, with a crash on failure.
  • u3z supports ~+ siglus rune memoization.

If you work much in Vere, you will get used to seeing these. There are basically two broad categories of functions: single-use functions (like starting a pier, u3m_pier) and utility functions (like writing a value to a noun, u3i_word).
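
The contrast between the u3r style (report failure to the caller) and the u3x style (crash on failure) can be sketched with a toy noun. Everything here is hypothetical: ex_noun is not the u3 representation, and the "crash" is modeled with setjmp/longjmp purely so that the sketch can catch and observe it.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_NONE 0xffffffffu   /* sentinel for "no result" */

/* A toy noun: either a cell of two small values or an atom. */
typedef struct { bool is_cell; uint32_t head; uint32_t tail; } ex_noun;

/* Error-return style: hand back a sentinel if the noun is not a cell. */
uint32_t ex_r_head(const ex_noun* som)
{
  return som->is_cell ? som->head : EX_NONE;
}

static jmp_buf ex_bail_buf;   /* where a bail lands (assumption of this sketch) */

/* Crash style: bail out instead of returning if the noun is not a cell. */
uint32_t ex_x_head(const ex_noun* som)
{
  if ( !som->is_cell ) {
    longjmp(ex_bail_buf, 1);
  }
  return som->head;
}

/* Run ex_x_head under a bail handler; returns false if it bailed. */
bool ex_try_x_head(const ex_noun* som, uint32_t* out)
{
  if ( 0 != setjmp(ex_bail_buf) ) {
    return false;
  }
  *out = ex_x_head(som);
  return true;
}
```

The error-return style makes every caller check the result, while the crash style keeps the calling code short and relies on an outer handler. That difference is a good thing to keep in mind when choosing between the two families in a jet.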

The Serf

The serf process is the Nock runtime and is responsible for:

  • Nock virtual machine (tracking current state of Arvo as a noun and ++pokeing it with nouns)
  • Bytecode interpretation
  • Jet dashboard

If you examine /vere/serf.c, you can get a feel for how it is organized. See e.g. u3_serf_work and callees.

Arvo Noun Management

  • /noun/vortex.c, e.g. u3v_peek, u3v_wish, and u3v_poke_sure.

Nock Bytecode Interpreter (u3n)

  • /noun/nock.c, e.g. u3n_nock_on, u3n_slam_on (calling convention for gates).

The end result of the Hoon compilation process is Nock code as a noun. This noun is evaluated by the runtime, but it is not actually directly run as such. Instead, the runtime builds an efficient bytecode stream and executes that instead to complete the calculation.

The Nock bytecode for any expression can be obtained using the %xray raw hint.

> ~>  %xray  =+(2 [- +(-)])
{[litb 2] snol head swap head bump ault halt}
[2 3]
> ~>  %xray  =+(2 [(add - -) +(-)])
{[litb 2] snol [fask 4095] [kicb 1] snoc head swap [fabk 6] swap [fabk 6] auto musm [kicb 0] swap head bump ault halt}
[4 3]

The Nock bytecode is defined in the OPCODES macro in /noun/nock.c and evaluated by _n_burn in that same file. The OPCODES #define uses the X macro, which is a bit of C deep lore.
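
For readers who have not met the X macro before, here is a minimal sketch of the technique on a three-instruction toy machine. It is not the OPCODES table from nock.c: the EX_ names are hypothetical, the mnemonics are borrowed from the %xray output above, and the semantics given to them here (a single accumulator instead of a stack) are a simplifying assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One list of opcodes, written once. Each use site defines X differently. */
#define EX_OPCODES    \
  X(EX_HALT, "halt")  \
  X(EX_LITB, "litb")  \
  X(EX_BUMP, "bump")

/* Expansion 1: an enum of opcode numbers. */
typedef enum {
#define X(sym, name) sym,
  EX_OPCODES
#undef X
  EX_OP_COUNT
} ex_op;

/* Expansion 2: a parallel table of printable names. */
static const char* const ex_op_names[EX_OP_COUNT] = {
#define X(sym, name) name,
  EX_OPCODES
#undef X
};

const char* ex_op_name(ex_op op)
{
  return ( op < EX_OP_COUNT ) ? ex_op_names[op] : "unknown";
}

/* A toy interpreter loop over a byte program. */
uint32_t ex_burn(const uint8_t* pog)
{
  uint32_t top = 0;   /* single accumulator */
  size_t   ip  = 0;   /* instruction pointer */

  for ( ;; ) {
    switch ( pog[ip++] ) {
      case EX_LITB: top = pog[ip++]; break;  /* load one-byte literal */
      case EX_BUMP: top++;           break;  /* increment */
      case EX_HALT:
      default:      return top;              /* stop on halt or unknown byte */
    }
  }
}
```

The benefit is that the enum and the name table cannot drift apart: adding an opcode means adding one line to EX_OPCODES, and both expansions pick it up.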

As a consequence of the architecture of Vere today, we see a lot of expensive call overhead. For instance, when you wrap an %xray hint around a core, you don't get the core itself—instead you get the formula that invokes the code.

> ~>  %xray  (met 3 (jam .))
{[fask 1023] [kicb 3] snol head swap tail [lilb 3] swap tail [fask 1023] [kicb 2] snol head swap tail musm [kicb 1] auto musm [ticb 0] halt}
984.339

Since many things are computed in virtual Nock (++mock), we gain bail handling, stack traces, and bounded computation at the price of slow virtualization.

One objective of Sword (née Ares) is to improve on Nock bytecode generation through subject knowledge analysis. This is being implemented in Vere as well.

Jet Dashboard (u3j)

As we summarized when first introducing jets in ca00, the runtime manages jets, including re-running them when playing back the event log history.

The jet dashboard is the system in the runtime that registers, validates, and runs jets: specific pieces of Nock code reimplemented in C for performance.

The jet dashboard maintains three jet state systems:

  1. cold state results from the logical execution history of the pier and consists of nouns. cold jet state registers jets as they are found. cold state ignores restarts.
  2. hot state is the global jet dashboard and describes the actual set of jets loaded into the pier for the current running process. Calls to hot state result from Nock Nine invocations of a core and an axis. hot state is thus tied to process restart.
  3. warm lists dependencies between cold and hot state. warm state can be cleared at any time and is cleared on restart.

The jet dashboard (u3j, /noun/jets.c) will not be explored in detail in Core Academy, but we do want to look at a couple of actual jets.

Jets

  • Examine /noun/jets/b/lent.c, /noun/jets/b/turn.c, /noun/jets/c/turn.c, /noun/jets/e/rs.c, /noun/jets/e/slaw.c.

Many Urbit contributors may find jet composition to be their first serious encounter with the runtime. On the bright side, jetting is a fairly constrained and well-understood space. However, it has a complex interface for unpacking calls and nouns, including reference counting requirements.

  • u3w functions are the main entry point (as identified in /noun/jets/tree.c). These unpack and sanity-check the sample, then call either u3q or u3k variants of the jet. The unpacking axes are hard-coded in /noun/xtract.h.

  • By convention, u3q and u3w functions have retain semantics.

  • u3k functions have transfer semantics, so they are responsible for freeing their arguments with u3z after the computation completes.

  • u3_none (0xffff.ffff) is NOT the same as u3_nul. A jet that returns u3_none punts the value back to the Hoon/Nock version.

  • “Writing Jets”

  • ~timluc-miptev, “Jets in the Urbit Runtime”
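
The punt convention from the list above can be sketched as follows. This is a toy, not the jet dashboard: the ex_ functions are hypothetical, a 64-bit addition stands in for running the interpreted Nock formula, and the jet is made to decline whenever its answer would not fit in a direct atom.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_NONE    0xffffffffu   /* "not a noun": the jet declines */
#define EX_NUL     0u            /* the atom 0: an ordinary, valid result */
#define EX_CAT_MAX 0x7fffffffu   /* largest direct atom */

/* Hypothetical jet: doubles a direct atom, punting when the
   result would not fit in a direct atom. */
uint32_t ex_jet_double(uint32_t a)
{
  if ( a > (EX_CAT_MAX / 2) ) {
    return EX_NONE;
  }
  return a * 2;
}

/* Stand-in for the interpreted formula: always produces an answer. */
uint64_t ex_nock_double(uint64_t a)
{
  return a + a;
}

/* Try the jet first; fall back to the slow path only on a punt. */
uint64_t ex_run_double(uint32_t a, bool* jetted)
{
  uint32_t pro = ex_jet_double(a);

  if ( EX_NONE != pro ) {
    *jetted = true;
    return pro;
  }
  *jetted = false;
  return ex_nock_double(a);
}
```

Doubling 0 yields 0, which the driver treats as a real answer from the jet. Only the sentinel triggers the fallback, which is why confusing the two values in a jet produces wrong results rather than a clean punt.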

Snapshotting

We'll cover snapshotting in the next lesson, ca06.