The Loom
Vere's main memory model is called the loom. (Presumably this is from the roads shuttling back and forth, and perhaps mixing a metaphor.) A contiguous block of memory, formerly fixed at 2GB but now specified by the `--loom` runtime flag, is allocated for the loom. This is the noun arena, and to work on it we need to use special `u3`-specific allocators (`u3a`).
One standard contemporary memory model afforded by an operating system has a heap for manual dynamic memory allocation (C `malloc()`) and a stack for local variable data (last-in-first-out, C `alloca()` but also implicit). The heap grows from the bottom up and the stack from the top down. (The alternative model today is to use SunOS-style `mmap()` to allow virtual memory paging anywhere in memory.)
```
0           brk                                          ffff
|  heap      |                                  stack       |
|------------#################################+++++++++++++|
|                                             |             |
0                                             sp         ffff
```
- `brk` is the `brk()` System V call, which marked the limit of the heap arena.
- `sp` is the stack pointer.
`u3` differs from this model in one particular: by permitting the heap and stack to point either way, we can efficiently nest pairs of stack and heap. We call such a pair (and their free memory arena) a road. The outermost road is the surface road, and inner roads are created in alternating directions when dependent calculations are embarked upon.
When a new inner road is created, it switches direction from the outer road. This puts its heap up against the outer road's stack, and its stack up against the outer road's heap. But when the road is terminated, its stack is freed while its heap becomes part of the outer road's stack.
A conventional heap-low-stack-high road is a north road:
```
0            rut    hat                                 ffff
|             |      |                                     |
|~~~~~~~~~~~~-------##########################+++++++$~~~~~|
|                                             |      |     |
0                                            cap    mat ffff
```
while a reversed heap-high-stack-low road is a south road:
```
0            mat    cap                                 ffff
|             |      |                                     |
|~~~~~~~~~~~~$++++++##########################--------~~~~~|
|                                             |      |     |
0                                            hat    rut ffff
```
- `cap` is the top of the stack.
- `mat` is the bottom of the stack (`ffff` in the example surface road).
- `rut` is the bottom of the heap arena (not `0` because of immutable storage).
- `hat` is the top of the heap arena.
- `~` is deep storage (immutable).
- `-` is durable storage (heap).
- `+` is temporary storage (stack).
- `$` is the allocation frame.
- `#` is free memory.
The motivation for the road model is that you need not update refcounts in senior memory. This diminishes the downsides of reference counting. (The upside of refcounting is deterministic finalization: automatic memory management comes from the properties of the computation itself rather than from external preemptive events. Refcounting gives eager determinism.)
When we need to process an event or perform any kind of complicated calculation, we process it using an inner road. Because the roads alternate direction, any data from an inner heap that needs to be preserved must be copied back out. The advantages, though, are:
- The surface road is left read-only by `u3` and thus clean; when snapshotting, clean pages are kept clean.
- An inner calculation can be aborted without affecting the surface.
- The surface is not fragmented, because inner results are copied in only when necessary.
Vere ends up in operation with nested roads during a computation:
```
0                                                      ffff
|-------                                   ++++++++++++|     surface road
|~~~~~~~$+++++++++            ---------~~~~~~~~~~~~~~~~|     inner road 1
|~~~~~~~~~~~~~~~~~---------        +++++$~~~~~~~~~~~~~~|     inner road 2
|                         ##&##                        |     free memory
```
The Vere interpreter runs in a road, and you can check whether you're on the surface road or in an inner road. (Most of the time you should just assume you're on an inner road.) This distinction matters for functions such as `c3_assert()`, which produces an exception with a stack trace on an inner road but kills the process on the surface road.
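A minimal sketch of such a check, assuming the `u3R` (current road) and `u3H` (home road) globals from the `u3` headers:

```c
/* Sketch: are we on the surface road?  The surface road is the
** home road embedded in u3H; u3R points at the current road.
*/
static c3_o
_on_surface_road(void)
{
  return ( u3R == &(u3H->rod_u) ) ? c3y : c3n;
}
```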
The current road is `u3R`, a global. Within Arvo, a new road is currently begun in the following cases:
- every event
- every read from the namespace (the Arvo scry gate into a vane doesn't, but userspace `.^` dotket does)
- every call to `++mink` (and `++mock`, etc.)
Any work where you only want to keep a portion of the result is a good candidate for a road.
The loom is organized into pages, each 16 KB in size. There is a guard page (`&` in the diagram above) in the middle of free memory to make sure that the stack and heap do not overwrite each other. The guard page is adjusted in `u3e_ward()` when necessary:
When a fault is detected in the guard page, the guard page is recentered in the free space of the current road. If the guard page cannot be recentered, then memory exhaustion has occurred.
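A sketch of that recentering logic, with hypothetical helper and variable names; the real version is `u3e_ward()` in `noun/events.c`, and page alignment is simplified here (a north road is assumed, with `hat` below `cap`):

```c
#include <stdint.h>
#include <sys/mman.h>

#define PAG_SIZ 16384  /* one loom page: 16 KB */

/* Recenter the guard page in the free region between the heap
** limit (hat) and stack limit (cap) of the current road.
** Returns -1 when no new position exists: memory is exhausted.
*/
static int
_ward_recenter(uint8_t* hat_y, uint8_t* cap_y, uint8_t** gar_y)
{
  /* midpoint of the free region, rounded down to a page boundary */
  uint8_t* mid_y = hat_y + (((cap_y - hat_y) / 2) & ~(intptr_t)(PAG_SIZ - 1));

  if ( mid_y == *gar_y ) {
    return -1;  /* cannot recenter: memory exhaustion */
  }

  mprotect(*gar_y, PAG_SIZ, PROT_READ | PROT_WRITE);  /* release old guard */
  mprotect(mid_y,  PAG_SIZ, PROT_NONE);               /* arm the new one   */
  *gar_y = mid_y;
  return 0;
}
```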
There is hope of adding both a raw hint to suggest that the system use a new road, and a bump-allocation mode to permit turning off refcounting on inner roads when practical (cf. #6805).
`u3a` Allocator
To work with `u3` memory, use the `u3a` memory allocation functions:

- `u3a_malloc()`
- `u3a_free()`
- `u3a_realloc()`
You should never call `malloc()` in the loom (but can, of course, in the Vere layer above `u3`).
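For example, a trivial sketch (the `u3a` functions have `malloc()`-style signatures; error handling elided):

```c
/* Sketch: a scratch byte buffer allocated on the loom rather
** than the C heap.
*/
c3_y* buf_y = u3a_malloc(42);

memset(buf_y, 0, 42);
/* ... use buf_y ... */
u3a_free(buf_y);
```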
Of course, we don't always know how large our atom will be. Therefore, the standard way of building large atoms is to allocate a block of raw space with `u3i_slab_init()`, then chop off the end with `u3i_slab_malt()` (which does the measuring itself) or `u3i_slab_mint()` if you've measured it yourself.
Keep in mind that atoms do not retain leading zeros.
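Sketched usage, assuming the slab API as declared in `noun/imprison.h` (the helper name is hypothetical; error handling elided):

```c
#include <string.h>

/* Sketch: build an atom from an arbitrary byte buffer via a slab.
*/
static u3_atom
_atom_from_bytes(const c3_y* dat_y, c3_d len_d)
{
  u3i_slab sab_u;

  u3i_slab_init(&sab_u, 3, len_d);    /* raw space: len_d bloq-3 (byte) units */
  memcpy(sab_u.buf_y, dat_y, len_d);  /* fill the backing buffer              */
  return u3i_slab_malt(&sab_u);       /* measure the length and mint the atom */
}
```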
The reference counters introduced last time, `u3a_gain()` = `u3k()` and `u3a_lose()` = `u3z()`, are also part of `u3a`. However, other than these you typically use `u3a` indirectly through `u3i` and `u3r`/`u3x`.
Some details of the allocator are in flux right now: “As an experiment, [~master-morzod has] rewritten the serf to a) stop allocating events, effects, and IPC messages on the home road, and b) keep the Arvo kernel on an inner road for as long as possible (i.e. until we need to save/pack/meld/&c.).”
- “Land of Nouns”, section “`u3`: the road model”
- “API overview by prefix”, section “`u3a`: allocation functions”
The King (Urth)
The king process is in charge of:
- IPC
- Event log
- Unix effects including I/O
- Stateless Nock interpreter (the “ghost ship” ivory pill material from `ca05`)
(We discussed the serf/Mars in `ca05`.)
Event Log & Snapshotting
The event log is the ordered list of all of the Arvo events (completed moves) resulting in the present state of Arvo. Events are handled at two levels: the event log, which is written consistently at each event, and the snapshot of the loom, which allows rapid recovery of the current state.
In practice, event logs become large and unwieldy over time. Periodically a snapshot of the permanent state is taken so the entire event log needn't be replayed on reboot. You're still able to rebuild your state down to the last keystroke because of persistence: in the context of storing data in a computer system, persistence means that data is stored in a non-volatile manner and that input must be recorded before the output result is performed. Thus, every event must be written to disk (persisted) before the event's effects actually take place.
The snapshot of the loom allows the last few events from the event log to be replayed to recover the present state.
In fact, with current usage patterns (~2023.10.12), there's a problem:
Every few minutes, the runtime applies a patch to its on-disk snapshot. This pauses the process, so the size is important. In particular, for many large ships (~nibset-napwyn, ~wicdev-wisryt, ~natnex-ronret), this is around 600MB. If the maximum acceptable pause is 1 second, this requires a disk which can handle 600MB/s of throughput (or maybe twice that, because it writes the patchfile, then applies it?), which is extremely high. (#6805)
`noun/error.h`
Before we look at the particulars of the event log and snapshotting code, let's take a brief detour to see how assertion errors are handled.
```c
#define u3_assert(x)                          \
  do {                                        \
    if (!(x)) {                               \
      fflush(stderr);                         \
      fprintf(stderr, "\rAssertion '%s' "     \
                      "failed in %s:%d\r\n",  \
              #x, __FILE__, __LINE__);        \
      u3m_bail(c3__oops);                     \
      abort();                                \
    }                                         \
  } while(0)
```
An error here then triggers `u3m_bail()`, the primary crash handler.
- Review `u3m_bail()` in `noun/manage.c`.
Event Log
What is an event on disk? Urbit maintains an LMDB transactional database for key–value pairs, with a `META` table for metadata and an `EVENTS` table for event number–event data pairs, sequentially ordered.
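For a feel of the on-disk layout, here is a sketch that walks the `EVENTS` table with the stock LMDB C API. The table names come from the text above; treat the integer-key detail and everything else as assumptions rather than Vere's actual code (which lives in `vere/db/lmdb.c`):

```c
#include <lmdb.h>
#include <stdio.h>

/* Sketch: print the event number and size of every event in a
** pier's log directory (error handling elided).
*/
void
list_events(const char* dir_c)
{
  MDB_env*    env_u;
  MDB_txn*    txn_u;
  MDB_dbi     dbi_u;
  MDB_cursor* cur_u;
  MDB_val     key_u, val_u;

  mdb_env_create(&env_u);
  mdb_env_set_maxdbs(env_u, 2);                /* META and EVENTS */
  mdb_env_open(env_u, dir_c, MDB_RDONLY, 0664);
  mdb_txn_begin(env_u, 0, MDB_RDONLY, &txn_u);
  mdb_dbi_open(txn_u, "EVENTS", MDB_INTEGERKEY, &dbi_u);
  mdb_cursor_open(txn_u, dbi_u, &cur_u);

  while ( 0 == mdb_cursor_get(cur_u, &key_u, &val_u, MDB_NEXT) ) {
    printf("event %llu: %zu bytes\n",
           *(unsigned long long*)key_u.mv_data, val_u.mv_size);
  }

  mdb_cursor_close(cur_u);
  mdb_txn_abort(txn_u);
  mdb_env_close(env_u);
}
```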
An event is a `u3_fact`, a `struct` including the timestamp and event `ovum`:

```c
/* u3_fact: completed event
*/
  typedef struct _u3_fact {
    c3_d             eve_d;               //  event number
    c3_l             mug_l;               //  kernel mug after
    u3_noun          job;                 //  (pair date ovum)
    struct _u3_fact* nex_u;               //  next in queue
  } u3_fact;
```
Event log replay thus refers to retrieving the sequence of events from the pier's database instance and playing back each sequential event.
- Examine `u3_mars_play()` for details of how playback works:
  - `_mars_play_batch()`
  - `_mars_poke_play()`
  - `u3v_poke_raw()`, and we're back to a conventional Arvo poke as we saw in `ca05`.
The introduction of epochs will enable finer-grained system recovery when necessary:
Historically, Vere has stored a single event log and snapshot. To make replay across different binary versions more convenient and less error-prone, an improved design is the "epoch" system. In the epoch system, Vere breaks up the event log into "epochs", where an epoch represents a snapshot and some events after that snapshot. An epoch lives in its own folder, named after the first event in that epoch. In addition to storing a snapshot and a log of events, each epoch folder also stores a version file indicating which version of Vere originally ran these events; this makes replay across different binary versions much easier, especially in the case of a jet mismatch in an old binary.
Snapshots
Replay is how Vere computes the state of a ship's Arvo instance from the event log after a ship reboots. In order to avoid replaying the entire event log, Replay takes a snapshot of the current state of the ship approximately once every ten minutes. Then when a ship reboots, Replay loads the most recent snapshot and replays events from the event log up to the most recent event.
`u3e_save()` saves the loom (snapshots); it is often called via `u3m_save()`.
Demand Paging
Demand paging refers to the ability to load only needed pages of memory into RAM, leaving other pages on disk, to reduce memory use.
Pages are marked as clean (`PROT_READ`), dirty (`PROT_READ|PROT_WRITE`), or guard (`PROT_NONE`). The access pattern assumes all pages are accessed from the outside inwards (another advantage of the loom model).
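As a toy illustration, the three states correspond to `mprotect()` calls like the following (the real state tracking lives in the loom's pager in `noun/events.c`; names here are hypothetical):

```c
#include <sys/mman.h>

#define PAG_SIZ 16384  /* one loom page: 16 KB */

void mark_clean(void* pag_v) { mprotect(pag_v, PAG_SIZ, PROT_READ); }
void mark_dirty(void* pag_v) { mprotect(pag_v, PAG_SIZ, PROT_READ | PROT_WRITE); }
void mark_guard(void* pag_v) { mprotect(pag_v, PAG_SIZ, PROT_NONE); }
```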
- `urbit/vere` #402 (merged)
- `urbit/vere` #410 (merged)
Unix I/O
The king is responsible for the I/O operations of the communicating vanes: Ames, Behn, Clay, Dill, Eyre, Iris, Khan, Lick. (The other two vanes, Gall and Jael, are landlocked and only interact within Urbit.)
We will cover the I/O drivers in a later lesson (`ca11`) after we have covered the major vanes which need to interface with the host OS.
IPC
The Urbit runtime has two categories of IPC:
- King/serf interprocess communication
- Vane-driven interprocess communication:
  - `%khan`/`conn.c`-based sockets
  - `%lick`-based communications
In general, POSIX IPC is “a mechanism that allows processes to communicate with each other and synchronize their actions.” This can be done by sharing memory directly between the processes or by passing messages. Vere does a little of both: the loom is the shared memory arena, and sometimes messages are used.
For instance, in Vere's pier management (mainly `vere/pier.c`), the lord coordinates the king and the serf through messages. Like Arvo, the king and the serf thus need to have the right API shape to connect to each other. The lord coordinates using `writ` to pass a value from the king to the serf, and `plea` to pass a value from the serf to the king. (These are rather like Arvo passes and gifts, but can be initiated from either side rather than just the pass/give pattern.) Whimsically, this is defined in Hoon inside a C comment in `vere/lord.c`:
```hoon
|%
::  +writ: from king to serf
::
+$  writ
  $%  $:  %live
          $%  [%cram eve=@]
              [%exit cod=@]
              [%save eve=@]
              [%meld ~]
              [%pack ~]
      ==  ==
      [%peek mil=@ sam=*]
      [%play eve=@ lit=(list ?((pair @da ovum) *))]
      [%work mil=@ job=(pair @da ovum)]
  ==
::  +plea: from serf to king
::
+$  plea
  $%  [%live ~]
      [%ripe [pro=%1 hon=@ nok=@] eve=@ mug=@]
      [%slog pri=@ tank]
      [%flog cord]
      $:  %peek
          $%  [%done dat=(unit (cask))]
              [%bail dud=goof]
      ==  ==
      $:  %play
          $%  [%done mug=@]
              [%bail eve=@ mug=@ dud=goof]
      ==  ==
      $:  %work
          $%  [%done eve=@ mug=@ fec=(list ovum)]
              [%swap eve=@ mug=@ job=(pair @da ovum) fec=(list ovum)]
              [%bail lud=(list goof)]
      ==  ==
  ==
--
```
The procedure for IPC needs to establish communications, which is a follow-on from the king starting the serf. (Vere is single-threaded but runs two processes.)
`+$  writ`, from king to serf:
- `%live` is a request to start up the serf.
- `%peek` is a request for data from Arvo in the serf.
- `%play` is a request to the serf to play an event (in an event playback).
- `%work` is a request to the serf to carry out a computation in Arvo.
`+$  plea`, from serf to king:
- `%live` tells if the serf is alive.
- `%ripe` tracks the serf startup state.
- `%slog` is an output request.
- `%flog` is a debug output request.
- `%peek` is a response to the king with a scry result.
- `%play` is a response to an event playback.
- `%work` is a response to an injected `ovum` event.
Other parts of IPC depend on the atom-framing implementations in `vere/newt.c`, another key part of king–serf IPC. `newt.c` produces noun blobs that have a five-byte header and a variable-length payload. The header has a one-byte version tag, typically `0x0`, followed by a four-byte little-endian message byte count. The payload is the `++jam`med noun. This is used by `urbit eval`, for instance, which computes `stdin` against the ivory pill ghost ship:
$ echo "(add 1 41)" | urbit evalloom: mapped 2048MBlite: arvo formula 2a2274c9lite: core 4bb376f0lite: final state 4bb376f0eval (run):42
`eval` supports several options for processing Hoon nouns as input to or output from `conn.c`:
- `-j`, `--jam`: output the result as a jammed noun
- `-c`, `--cue`: read the input as a jammed noun
- `-n`, `--newt`: write output / read input as a newt-encoded jammed noun, when paired with `-j` or `-c` respectively
- `-k`: treat the input as the jammed noun input of a `%fyrd` request to `conn.c`; if the result is a `goof`, pretty-print it to `stderr` instead of returning it
In `vere/newt.c`, see particularly:
- `u3_newt_send()` transmits a jammed noun (produced with `u3s_jam_xeno()`, for instance) to a task buffer for `libuv`. (Recall that `libuv` is the main event loop driver for the king process.)
- `u3_newt_read()` pulls the jammed noun out of the buffer.
Khan and Lick both use `newt.c`.
- Demonstrate invoking a statement at the CLI with the `urbit` executable.
Example: `|meld` Trace
- Let's walk through the lifecycle of a command-line-initiated `./zod/.run meld`:
  - `vere/main.c`
  - `noun/urth.c`
- Compare the lifecycle of `|meld`:
  - `/gen/hood/meld`
  - `/lib/helm/kiln`
  - `/lib/hood/kiln`
  - `/sys/clay`
  - `vere/lord.c`
  - `vere/serf.c`
We will discuss Khan and Lick in detail during the next lesson, but here's a quick recap of their functionality.
Khan
Khan is the "control plane" and thread-runner vane. Its main purpose is to allow external applications to run threads via a Unix Socket and receive the result.
A socket is an endpoint for data communications. In the runtime, it is implemented by `conn.c`, the runtime counterpart to `%khan`.
Lick
Lick manages IPC ports, and the communication between Urbit applications and POSIX applications via these ports. Other vanes and applications ask Lick to open an IPC port, notify it when something is connected or disconnected, and transfer data between itself and the Unix application.
Debugging the Runtime
To conclude today's material, we would like to briefly demonstrate several debugging principles with Vere.
printf
`fprintf`-based output should be done using `fprintf()` to `stderr`. Use both `\n` and `\r` to achieve line feed (move the cursor down one line) and carriage return (move it to the left). You can also use `u3l_log()`, which does not require `\r\n` but should not be used in cases where the I/O drivers have not yet been initialized or can no longer be relied upon, e.g. during crash or shutdown.
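For example (names here are hypothetical):

```c
/* both carriage return and line feed, since the terminal may be raw */
fprintf(stderr, "my-driver: state %d\r\n", val_i);
```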
gdb
For C, make heavy use of `gdb`. `lldb` is far worse than `gdb` for debugging Urbit, so it's worth developing on a Linux box even if that means `ssh`ing into a server. (~wicdev-wisryt)
`gdb` works best when you build with debugging symbols.
```
bazel build :urbit --compilation_mode=dbg
- OR -
bazel build :urbit --copt=-DU3_CPU_DEBUG
- OR -
bazel build :urbit --copt=-DU3_MEM_DEBUG
```
When using GDB, before attaching to the process you should set the following:
```
set follow-fork-mode child
handle SIGSEGV nostop noprint
```
If you are debugging jets or the serf, then you want to attach to the serf at `urbit-worker`.
```
gdb --args ./bazel-bin/pkg/vere/urbit zod
- OR -
gdb attach <PID>

break jet_file:jet_name
```
Valgrind
Valgrind↗ is a memory profiling tool for diagnosing memory usage and memory leaks. As a memory tracking tool, Valgrind uses several times more memory than the native application requires.
A possible expedient to make it viable:
- Decrease the loom size with `--loom` (e.g. `--loom 29` or something similarly constrained).
`urbit/urbit` #5161 has some discussion about using Valgrind. (The memory leak in question was ultimately resolved by #5614, which presents an instructive insight.)
Changing the Serf
The process call to invoke the serf is hard-coded in the king. To run Sword (née Ares) as a serf, for instance, you need to change the `protocol` entry in `vere/lord.c:u3_lord_init()` to the Sword executable as a full-path string literal.
Exercises
- Turn on `LORD_TRACE_CUE` and boot a new fake ship.
- Examine the new `/ted/runtime-version` thread which is in the latest release (412 K). How does it work? Trace the types and values back through to see how the values are recovered from the runtime.