Urbit Docs
  • What is Urbit?
  • Get on Urbit
  • Build on Urbit
    • Contents
    • Environment Setup
    • Hoon School
      • 1. Hoon Syntax
      • 2. Azimuth (Urbit ID)
      • 3. Gates (Functions)
      • 4. Molds (Types)
      • 5. Cores
      • 6. Trees and Addressing
      • 7. Libraries
      • 8. Testing Code
      • 9. Text Processing I
      • 10. Cores and Doors
      • 11. Data Structures
      • 12. Type Checking
      • 13. Conditional Logic
      • 14. Subject-Oriented Programming
      • 15. Text Processing II
      • 16. Functional Programming
      • 17. Text Processing III
      • 18. Generic and Variant Cores
      • 19. Mathematics
    • App School I
      • 1. Arvo
      • 2. The Agent Core
      • 3. Imports and Aliases
      • 4. Lifecycle
      • 5. Cards
      • 6. Pokes
      • 7. Structures and Marks
      • 8. Subscriptions
      • 9. Vanes
      • 10. Scries
      • 11. Failure
      • 12. Next Steps
      • Appendix: Types
    • App School II (Full-Stack)
      • 1. Types
      • 2. Agent
      • 3. JSON
      • 4. Marks
      • 5. Eyre
      • 6. React app setup
      • 7. React app logic
      • 8. Desk and glob
      • 9. Summary
    • Core Academy
      • 1. Evaluating Nock
      • 2. Building Hoon
      • 3. The Core Stack
      • 4. Arvo I: The Main Sequence
      • 5. Arvo II: The Boot Sequence
      • 6. Vere I: u3 and the Serf
      • 7. Vere II: The Loom
      • 8. Vanes I: Behn, Dill, Kahn, Lick
      • 9. Vanes II: Ames
      • 10. Vanes III: Eyre, Iris
      • 11. Vanes IV: Clay
      • 12. Vanes V: Gall and Userspace
      • 13. Vanes VI: Khan, Lick
      • 14. Vanes VII: Jael, Azimuth
    • Runtime
      • U3
      • Conn.c Guide
      • How to Write a Jet
      • API Overview by Prefix
      • C in Urbit
      • Cryptography
      • Land of Nouns
    • Tools
      • Useful Links
      • JS Libraries
        • HTTP API
      • Docs App
        • File Format
        • Index File
        • Suggested Structure
    • Userspace
      • Command-Line App Tutorial
      • Remote Scry
      • Unit Tests
      • Software Distribution
        • Software Distribution Guide
        • Docket File
        • Glob
      • Examples
        • Building a CLI App
        • Debugging Wrapper
        • Host a Website
        • Serving a JS Game
        • Ship Monitoring
        • Styled Text
  • Urbit ID
    • What is Urbit ID?
    • Azimuth Data Flow
    • Life and Rift
    • Urbit HD Wallet
    • Advanced Azimuth Tools
    • Custom Roller Tutorial
    • Azimuth.eth Reference
    • Ecliptic.eth Reference
    • Layer 2
      • L2 Actions
      • L2 Rollers
      • L2 Roller HTTP RPC-API
      • L2 Transaction Format
  • Urbit OS
    • What is Urbit OS?
    • Base
      • Hood
      • Threads
        • Basics Tutorial
          • Bind
          • Fundamentals
          • Input
          • Output
          • Summary
        • HTTP API Guide
        • Spider API Reference
        • Strandio Reference
        • Examples
          • Child Thread
          • Fetch JSON
          • Gall
            • Poke Thread
            • Start Thread
            • Stop Thread
            • Take Facts
            • Take Result
          • Main-loop
          • Poke Agent
          • Scry
          • Take Fact
    • Kernel
      • Arvo
        • Cryptography
        • Move Trace
        • Scries
        • Subscriptions
      • Ames
        • Ames API Reference
        • Ames Cryptography
        • Ames Data Types
        • Ames Scry Reference
      • Behn
        • Behn API Reference
        • Behn Examples
        • Behn Scry Reference
      • Clay
        • Clay API Reference
        • Clay Architecture
        • Clay Data Types
        • Clay Examples
        • Clay Scry Reference
        • Filesystem Hierarchy
        • Marks
          • Mark Examples
          • Using Marks
          • Writing Marks
        • Using Clay
      • Dill
        • Dill API Reference
        • Dill Data Types
        • Dill Scry Reference
      • Eyre
        • EAuth
        • Eyre Data Types
        • Eyre External API
        • Eyre Internal API
        • Eyre Scry Reference
        • Low-Level Eyre Guide
        • Noun channels
      • Gall
        • Gall API Reference
        • Gall Data Types
        • Gall Scry Reference
      • Iris
        • Iris API Reference
        • Iris Data Types
        • Iris Example
      • Jael
        • Jael API Reference
        • Jael Data Types
        • Jael Examples
        • Jael Scry Reference
      • Khan
        • Khan API Reference
        • Khan Data Types
        • Khan Example
      • Lick
        • Lick API Reference
        • Lick Guide
        • Lick Examples
        • Lick Scry Reference
  • Hoon
    • Why Hoon?
    • Advanced Types
    • Arvo
    • Auras
    • Basic Types
    • Cheat Sheet
    • Cryptography
    • Examples
      • ABC Blocks
      • Competitive Programming
      • Emirp
      • Gleichniszahlenreihe
      • Islands
      • Luhn Number
      • Minimum Path Sum
      • Phone Letters
      • Restore IP
      • Rhonda Numbers
      • Roman Numerals
      • Solitaire Cipher
      • Water Towers
    • Generators
    • Hoon Errors
    • Hoon Style Guide
    • Implementing an Aura
    • Irregular forms
    • JSON
    • Limbs and wings
      • Limbs
      • Wings
    • Mips (Maps of Maps)
    • Parsing Text
    • Runes
      • | bar · Cores
      • $ buc · Structures
      • % cen · Calls
      • : col · Cells
      • . dot · Nock
      • / fas · Imports
      • ^ ket · Casts
      • + lus · Arms
      • ; mic · Make
      • ~ sig · Hints
      • = tis · Subject
      • ? wut · Conditionals
      • ! zap · Wild
      • Constants (Atoms and Strings)
      • --, == · Terminators
    • Sail (HTML)
    • Serialization
    • Sets
    • Standard Library
      • 1a: Basic Arithmetic
      • 1b: Tree Addressing
      • 1c: Molds and Mold-Builders
      • 2a: Unit Logic
      • 2b: List Logic
      • 2c: Bit Arithmetic
      • 2d: Bit Logic
      • 2e: Insecure Hashing
      • 2f: Noun Ordering
      • 2g: Unsigned Powers
      • 2h: Set Logic
      • 2i: Map Logic
      • 2j: Jar and Jug Logic
      • 2k: Queue Logic
      • 2l: Container from Container
      • 2m: Container from Noun
      • 2n: Functional Hacks
      • 2o: Normalizing Containers
      • 2p: Serialization
      • 2q: Molds and Mold-Builders
      • 3a: Modular and Signed Ints
      • 3b: Floating Point
      • 3c: Urbit Time
      • 3d: SHA Hash Family
      • 3e: AES encryption (Removed)
      • 3f: Scrambling
      • 3g: Molds and Mold-Builders
      • 4a: Exotic Bases
      • 4b: Text Processing
      • 4c: Tank Printer
      • 4d: Parsing (Tracing)
      • 4e: Parsing (Combinators)
      • 4f: Parsing (Rule-Builders)
      • 4g: Parsing (Outside Caller)
      • 4h: Parsing (ASCII Glyphs)
      • 4i: Parsing (Useful Idioms)
      • 4j: Parsing (Bases and Base Digits)
      • 4k: Atom Printing
      • 4l: Atom Parsing
      • 4m: Formatting Functions
      • 4n: Virtualization
      • 4o: Molds
      • 5a: Compiler Utilities
      • 5b: Macro Expansion
      • 5c: Compiler Backend & Prettyprinter
      • 5d: Parser
      • 5e: Molds and mold builders
      • 5f: Profiling support
    • Strings
    • The Engine Pattern
    • Udon (Markdown-esque)
    • Vases
    • Zuse
      • 2d(1-5): To JSON, Wains
      • 2d(6): From JSON
      • 2d(7): From JSON (unit)
      • 2e(2-3): Print & Parse JSON
      • 2m: Ordered Maps
  • Nock
    • What is Nock?
    • Decrement
    • Definition
    • Fast Hints and Jets
    • Implementations
    • Specification
  • User Manual
    • Contents
    • Running Urbit
      • Cloud Hosting
      • Home Servers
      • Runtime Reference
      • Self-hosting S3 Storage with MinIO
    • Urbit ID
      • Bridge Troubleshooting
      • Creating an Invite Pool
      • Get an Urbit ID
      • Guide to Factory Resets
      • HD Wallet (Master Ticket)
      • Layer 2 for planets
      • Layer 2 for stars
      • Proxies
      • Using Bridge
    • Urbit OS
      • Basics
      • Configuring S3 Storage
      • Dojo Tools
      • Filesystem
      • Shell
      • Ship Troubleshooting
      • Star and Galaxy Operations
      • Updates
Powered by GitBook

GitHub

  • Urbit ID
  • Urbit OS
  • Runtime

Resources

  • YouTube
  • Whitepaper
  • Awesome Urbit

Contact

  • X
  • Email
  • Gather
On this page
  • Revision Control
  • A Typed Filesystem
  • Marks
Edit on GitHub
  1. Urbit OS
  2. Kernel
  3. Clay

Clay Architecture

Clay is the primary filesystem for the Arvo operating system, which is the core of an urbit. The architecture of Clay is intrinsically connected with Arvo, but for this section we assume no knowledge of either Arvo or Urbit. We will point out only those features of Arvo that are necessary for an understanding of Clay, and we will do so only when they arise.

The first relevant feature of Arvo is that it is a deterministic system where input and output are defined as a series of events and effects. The state of Arvo is simply a pure function of its event log. None of the effects from an event are emitted until the event is entered in the log and persisted, either to disk or another trusted source of persistence, such as a Kafka cluster. Consequently, Arvo is a single-level store: everything in its state is persistent.

In a more traditional OS, everything in RAM can be erased at any time by power failure, and is always erased on reboot. Thus, a primary purpose of a filesystem is to ensure files persist across power failures and reboots. In Arvo, both power failures and reboots are special cases of suspending computation, which is done safely since our event log is already persistent. Therefore, Clay is not needed in Arvo for persistence. Why, then, do we have a filesystem? There are two answers to this question.

First, Clay provides a filesystem tree, which is a convenient user interface for some applications. Unix has the useful concept of virtual filesystems, which are used for everything from direct access to devices, to random number generators, to the /proc tree. It is easy and intuitive to read from and write to a filesystem tree.

Second, Clay has a distributed revision-control system baked into it. Traditional filesystems are not revision controlled, so userspace software -- such as git -- is written on top of them to do so. Clay natively provides the same functionality as modern DVCSes, and more.

Clay has two other unique properties that we'll cover later on: it supports typed data and is referentially transparent.

Revision Control

Every urbit has one or more desks, which are independently revision-controlled branches. Each desk contains its own mark definitions, apps, and so forth.

Traditionally, an Urbit ship has at least a %base desk, and usually a %landscape desk, among others. The %base desk has the kernel and base system software. The %landscape desk has software pertaining to the home screen. The %base desk is a fork of the %base desk of whichever ship you download system updates from - typically your sponsor, but theoretically may be any ship.

A desk is a series of numbered commits, the most recent of which represents the current state of the desk. A commit is composed of (1) an absolute time when it was created, (2) a list of zero or more parents, and (3) a map from paths to data.

Most commits have exactly one parent, but the initial commit on a desk may have zero parents, and merge commits have more than one parent.

The non-metadata is stored as a map of paths to data. It's worth noting that no constraints are put on this map, so, for example, both /a/b and /a/b/c could have data. This is impossible in a traditional Unix filesystem since it means that /a/b is both a file and a directory. Conventionally, the final element in the path is its mark -- much like a filename extension in Unix. Thus, /doc/readme.md in Unix is stored as /doc/readme/md in urbit.

The data is not stored directly in the map; rather, a hash of the data is stored, and we maintain a master blob store. Thus, if the same data is referred to in multiple commits (as, for example, when a file doesn't change between commits), only the hash is duplicated.

In the master blob store, we either store the data directly, or else we store a diff against another blob. The hash is dependent only on the data within and not on whether or not it's stored directly, so we may on occasion rearrange the contents of the blob store for performance reasons.

Recall that a desk is a series of numbered commits. Not every commit in a desk must be numbered. For example, if the %base desk has had 50 commits since %foo was forked from it, then a merge from %base to %foo will only add a single revision number, although the full commit history will be accessible by traversing the parentage of the individual commits.

We do guarantee that the first commit is numbered 1, commits are numbered consecutively after that (i.e. there are no "holes"), the topmost commit is always numbered, and every numbered commit is an ancestor of every later numbered commit.

There are three ways to refer to particular commits in the revision history. Firstly, one can use the revision number. Secondly, one can use any absolute time between the one numbered commit and the next (inclusive of the first, exclusive of the second). Thirdly, every desk has a map of labels to revision numbers. These labels may be used to refer to specific commits.

Additionally, Clay is a global filesystem, so data on other urbits is easily accessible the same way as data on our local urbit. In general, the path to a particular revision of a desk is /~urbit-name/desk-name/revision. Thus, to get /try/readme/md from revision 5 of the %base desk on ~sampel-sipnym, we refer to /~sampel-sipnym/base/5/try/readme/md. Clay's namespace is thus global and referentially transparent.

A Typed Filesystem

Since Clay is a general filesystem for storing data of arbitrary types, in order to revision control correctly it needs to be aware of types all the way through. Traditional revision control does an excellent job of handling source code, so for source code we act very similar to traditional revision control. The challenge is to handle other data similarly well.

For example, modern VCSs generally support "binary files", which are files for which the standard textual diffing, patching, and merging algorithms are not helpful. A "diff" of two binary files is just a pair of the files, "patching" this diff is just replacing the old file with the new one, and "merging" non-identical diffs is always a conflict, which can't even be helpfully annotated. Without knowing anything about the structure of a blob of data, this is the best we can do.

Often, though, "binary" files have some internal structure, and it is possible to create diff, patch, and merge algorithms that take advantage of this structure. An image may be the result of a base image with some set of operations applied. With algorithms aware of this set of operations, not only can revision control software save space by not having to save every revision of the image individually, these transformations can be made on parallel branches and merged at will.

Suppose Alice is tasked with touching up a picture, improving the color balance, adjusting the contrast, and so forth, while Bob has the job of cropping the picture to fit where it's needed and adding textual overlay. Without type-aware revision control, these changes must be made serially, requiring Alice and Bob to explicitly coordinate their efforts. With type-aware revision control, these operations may be performed in parallel, and then the two changesets can be merged programmatically.

Of course, even some kinds of text files may be better served by diff, patch, and merge algorithms aware of the structure of the files. Consider a file containing a pretty-printed JSON object. Small changes in the JSON object may result in rather significant changes in how the object is pretty-printed (for example, by addding an indentation level, splitting a single line into multiple lines).

A text file wrapped at 80 columns also reacts suboptimally with unadorned Hunt-McIlroy diffs. A single word inserted in a paragraph may push the final word or two of the line onto the next line, and the entire rest of the paragraph may be flagged as a change. Two diffs consisting of a single added word to different sentences may be flagged as a conflict. In general, prose should be diffed by sentence, not by line.

As far as we are aware, Clay is the first generalized, type-aware revision control system. We'll go into the workings of this system in some detail.

Marks

Central to a typed filesystem is the idea of file types. In Clay, we call these marks. See the Marks section for more details.

PreviousClay API ReferenceNextClay Data Types

Last updated 1 day ago