Writing Marks
Here we'll walk through a practical example of writing a mark file.
We'll create a mark for CSV (comma-separated values) files, a simple format for storing tabular data in a text file.
CSV files separate fields with commas and rows with line breaks. They look something like:
foo,bar,baz
blah,blah,blah
1,2,3
There is some complexity around special characters in fields and around line endings, but otherwise the only rule is that all rows must have the same number of fields. See RFC 4180 on the IETF website for details.
We'll represent such a structure in Hoon as a (list (list @t)), like:
[['foo' 'bar' 'baz' ~] ['blah' 'blah' 'blah' ~] ['1' '2' '3' ~] ~]
We could perhaps create the type with a $| rune to include row-length validation in the mold itself, but a (list (list @t)) is simpler for demonstrative purposes.
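The following Python sketch illustrates the target shape. Python is used here only as a neutral modeling language, and none of this is part of the mark. The sketch uses Python's standard csv module, which handles RFC 4180-style quoting, to turn the example text into a list of rows. That is the same shape as our (list (list @t)). It then checks the same-row-length rule. The function names are hypothetical.

```python
import csv
import io

def parse_rows(text):
    """Parse CSV text into a list of rows, each a list of field strings."""
    return [row for row in csv.reader(io.StringIO(text))]

def rows_consistent(rows):
    """True if every row has the same number of fields (also True when empty)."""
    return len({len(row) for row in rows}) <= 1

rows = parse_rows("foo,bar,baz\nblah,blah,blah\n1,2,3\n")
print(rows)                     # [['foo', 'bar', 'baz'], ['blah', 'blah', 'blah'], ['1', '2', '3']]
print(rows_consistent(rows))    # True
print(parse_rows('a,"b,c"\n'))  # [['a', 'b,c']]: a quoted comma stays inside the field
```

The quoted-field case is the kind of special-character handling that RFC 4180 covers and that a naive split on commas would get wrong.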
A simple mark
Let's begin with the simplest mark file:
|_  csv=(list (list @t))
++  grab
  |%
  ++  noun  (list (list @t))
  --
++  grow
  |%
  ++  noun  csv
  --
++  grad  %noun
--
The door takes a (list (list @t)) as its sample, and we've given it a face of csv so we can easily reference it. Its face could be anything; it needn't be the name of our mark. When we do something with data that has a CSV mark, such as converting it to another mark or creating a diff, this is where our data will reside.
Next we have the +grab arm of our door, which contains a core with arms for converting to our mark from other marks. We've given it one arm for the %noun mark, the most generic mark, which will take any $noun. Our +noun arm simply clams whatever it's given with the (list (list @t)) $mold.
Next is the +grow arm, which does the inverse of +grab, converting from our mark to another mark. We've also given it a +noun arm. This time it simply returns the door's sample, csv, which is of course already a $noun.
Note that the +noun arm is mandatory in +grab: Clay cannot build a mark core without it. Conversion arms for marks other than %noun are optional.
Finally we have the +grad arm. This arm specifies functions for revision control, such as creating diffs and patching files. In our case, rather than writing all those functions, we've delegated those tasks to the noun mark. We can do this because we've specified conversion routines to and from the noun mark in our +grow and +grab arms. When we modify a file with a csv mark, Clay will convert our data to a noun mark, execute the necessary +grad functions from the noun mark file, and then convert the result back to a CSV mark.
So now we have a valid CSV mark file. If we save it as csv.hoon in the /mar directory, we can store CSV data in Clay. This may be sufficient for some applications, but what if we want to import a CSV file from Unix or elsewhere? In the next section, we'll look at conversions to and from a MIME mark to address this.
MIME conversions
The $mime type represents raw data from Unix or elsewhere. For example, if a text file from Unix containing the word foo were converted to a $mime type in Urbit, it would look something like:
[/text/plain q=[p=3 q=7.303.014]]
/text/plain is its MIME type, and p.q is the byte-length of q.q, which is the data itself as an $atom.
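The atom 7.303.014 is the bytes of foo read as a little-endian number, which is how Hoon stores a cord. Below is a quick Python check of that arithmetic. The two helpers are hypothetical models of what an octs pair holds: the byte-length and the data as a number. They also show why the length is carried separately: an atom cannot remember trailing zero bytes.

```python
def to_octs(data: bytes):
    """Model an octs pair: (byte-length, data as a little-endian integer)."""
    return (len(data), int.from_bytes(data, "little"))

def from_octs(octs):
    """Recover the bytes; the stored length restores any trailing zero bytes."""
    length, atom = octs
    return atom.to_bytes(length, "little")

print(to_octs(b"foo"))          # (3, 7303014), i.e. 7.303.014 in Hoon notation
print(to_octs(b"foo\x00"))      # (4, 7303014): same number, different length
print(from_octs((4, 7303014)))  # b'foo\x00'
```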
The MIME mark is used by Clay to store and convert $mime data. It's an important mark for moving files from Unix to Urbit and vice versa. When you add a file to a desk you have mounted to Unix and |commit the change, Clay will first receive the file as a MIME mark. It will then convert it from MIME to whatever mark matches the file extension. For example, foo.txt will be converted from %mime to %txt. Additionally, data fetched by Iris over HTTP will come in as a $mime-data:http. This is an unvalidated form of $mime that you may wish to convert to a MIME mark and then to another mark. Likewise, some of Eyre's lower-level interfaces receive HTTP requests containing $mime-data:http.
With the nature of the MIME mark hopefully now clear, the reason we want conversion arms to and from MIME in our CSV mark is so we can import CSV files from Unix and export them back again.
Since a CSV file on Unix will just be a long string with ASCII or UTF-8 encoding, we can treat q.q in the $mime as a $cord. We can therefore write a parser to convert it to a (list (list @t)). For this purpose, here's a library, csv-utils.hoon, which you can view in full on the Examples page.
The library contains four functions:
+de-csv - Parse a CSV $cord to a (list (list @t)).
+en-csv - Encode a (list (list @t)) as a CSV $cord.
+validate - Check that all rows of a (list (list @t)) are the same length.
+csv-join - Ignore this for now; we'll use it later on.
The decoding and encoding arms use parsing functions from the Hoon standard library. You don't need to be familiar with parsing in Hoon for our purposes here, but you can have a look at the Parsing Guide in the Hoon documentation if you're interested. The important thing to note is that +de-csv takes a valid CSV-format @t and returns a (list (list @t)). +en-csv does the reverse: it takes a (list (list @t)) and returns a CSV-format @t.
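The real library is on the Examples page. As a rough, hypothetical model of the behavior the dojo session relies on, here is a Python sketch of the three functions. It deliberately ignores quoting, unlike a full RFC 4180 parser, and it skips empty lines. Like the dojo output, this model ends every row with a newline, including the last one.

```python
def de_csv(text):
    """Split a CSV string into rows of fields (no quoting support)."""
    return [line.split(",") for line in text.split("\n") if line != ""]

def en_csv(rows):
    """Join fields with commas and terminate every row with a newline."""
    return "".join(",".join(row) + "\n" for row in rows)

def validate(rows):
    """True if all rows have the same number of fields (True when empty)."""
    return all(len(row) == len(rows[0]) for row in rows)

rows = de_csv("foo,bar,baz\nblah,blah,blah\n1,2,3")
print(rows)                # [['foo', 'bar', 'baz'], ['blah', 'blah', 'blah'], ['1', '2', '3']]
print(repr(en_csv(rows)))  # 'foo,bar,baz\nblah,blah,blah\n1,2,3\n'
print(validate(rows))      # True
```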
Let's try the library in the dojo. After we've added it to /lib and run |commit, we can build the file:
> =csv-utils -build-file %/lib/csv-utils/hoon
...try decoding a CSV-format @t:
> (de-csv:csv-utils 'foo,bar,baz\0ablah,blah,blah\0a1,2,3')
~[<|foo bar baz|> <|blah blah blah|> <|1 2 3|>]
...and try encoding a (list (list @t)) as a CSV-format @t:
> (en-csv:csv-utils [['foo' 'bar' 'baz' ~] ['blah' 'blah' 'blah' ~] ['1' '2' '3' ~] ~])
'foo,bar,baz\0ablah,blah,blah\0a1,2,3\0a'
With that working, we can import our library into our CSV mark definition and add a +mime arm to both our +grab and +grow arms:
/+  *csv-utils
|_  csv=(list (list @t))
++  grab
  |%
  ++  mime  |=((pair mite octs) (de-csv q.q))
  ++  noun
    |=  n=*
    ^-  (list (list @t))
    =/  result  ((list (list @t)) n)
    ?>  (validate result)
    result
  --
++  grow
  |%
  ++  mime
    ?>  (validate csv)
    [/text/csv (as-octs:mimes:html (en-csv csv))]
  ++  noun
    ?>  (validate csv)
    csv
  --
++  grad  %noun
--
In +grab we've added a +mime arm to convert from a MIME mark to our CSV mark. It's a simple gate that takes a $mime, specified as (pair mite octs) to avoid a conflict with the arm name. It runs the data through the +de-csv function and returns a (list (list @t)) of the CSV data.
We've also added a +mime arm to +grow for converting from our CSV mark to a MIME mark. We encode our (list (list @t)) csv sample with our +en-csv function, then run that through as-octs:mimes:html to get an $octs, so it carries the byte-length. We also add the /text/csv MIME type so the result is a valid $mime.
Additionally, we've used the +validate function in a few places to make sure our CSV data has consistent row lengths.
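The two +mime arms together amount to the round trip sketched below in Python. This is only a model under stated assumptions. The tuple stands in for a $mime, and a bytes value stands in for the data atom. The helper names are hypothetical, and de_csv, en_csv and validate are simplified, quote-free versions rather than the library's code. As in the Hoon above, only the +grow direction validates.

```python
def de_csv(text):
    return [line.split(",") for line in text.split("\n") if line != ""]

def en_csv(rows):
    return "".join(",".join(row) + "\n" for row in rows)

def validate(rows):
    return all(len(row) == len(rows[0]) for row in rows)

def grow_mime(csv_rows):
    """Like +mime in +grow: validate, encode, then attach MIME type and length."""
    if not validate(csv_rows):
        raise ValueError("rows have inconsistent lengths")  # stands in for ?> failing
    data = en_csv(csv_rows).encode("utf-8")
    return (["text", "csv"], (len(data), data))  # (mite, octs) stand-in

def grab_mime(mime):
    """Like +mime in +grab: ignore the MIME type and decode the data (q.q)."""
    _mite, (_length, data) = mime
    return de_csv(data.decode("utf-8"))

m = grow_mime([["abc", "def"], ["ghi", "jkl"]])
print(m)             # (['text', 'csv'], (16, b'abc,def\nghi,jkl\n'))
print(grab_mime(m))  # [['abc', 'def'], ['ghi', 'jkl']]
```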
If we save the above mark file as csv.hoon in /mar and |commit %base, we should now be able to import CSV files into Urbit. Let's give it a go. In the root of our %base desk, let's add a file named foo.csv with the following contents:
foo,bar,baz
blah,blah,blah
1,2,3
If we now |commit %base, we should see it's been successfully added:
> |commit %base
>=
+ /~zod/base/4/foo/csv
And if we try reading the file with the -read thread:
> -read [%x our %base da+now /foo/csv]
~[<|foo bar baz|> <|blah blah blah|> <|1 2 3|>]
We can see our CSV mark has successfully converted our foo.csv file to a (list (list @t)) when it was imported.
Let's try the other direction now. We can create a new bar.csv file in the root of %base from the dojo like so:
> */bar/csv ~[['abc' 'def' ~] ['ghi' 'jkl' ~]]
+ /~zod/base/5/bar/csv
If we check it in a terminal on the Unix side, we can see it's been correctly encoded:
> cat zod/base/bar.csv
abc,def
ghi,jkl
So now our CSV mark lets us move data in and out of Urbit. In the next section, we'll look at the +grad arm in more detail.
+grad
So far we've just delegated +grad functions to the noun mark, but now we'll look at writing our own.
For demonstrative purposes, we can just poach the algorithms used in the +grad arm of the TXT mark and modify them to take our (list (list @t)) type instead of a $wain. It's not the most efficient algorithm for a CSV file, but it'll do the job.
Our diff format will be a (urge:clay (list @t)), and we'll use some +differ functions from zuse.hoon, such as +loss, +lusk and +lurk, to produce diffs and apply patches.
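The Python model below gives a feel for the urge format. Each hunk is either ("&", n), which keeps n rows unchanged, or ("|", removed, added), which replaces the removed rows with the added ones. As far as we understand the type, these mirror the two cases of an urge: [%& p=@ud] and [%| p=(list a) q=(list a)]. Python's difflib.SequenceMatcher stands in for the +loss/+lusk longest-common-subsequence step, so its exact hunks may differ from what zuse produces. The function names are hypothetical.

```python
from difflib import SequenceMatcher

def csv_diff(old, new):
    """Model of +diff: describe how to turn the `old` rows into the `new` rows."""
    a = [tuple(row) for row in old]  # difflib needs hashable items
    b = [tuple(row) for row in new]
    urge = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, a, b, autojunk=False).get_opcodes():
        if tag == "equal":
            urge.append(("&", i2 - i1))  # keep this many rows unchanged
        else:
            # replace, delete and insert all become "remove these rows, add those"
            urge.append(("|", [list(r) for r in old[i1:i2]], [list(r) for r in new[j1:j2]]))
    return urge

def csv_pact(old, urge):
    """Model of +pact: apply an urge-style diff to the `old` rows."""
    out, i = [], 0
    for hunk in urge:
        if hunk[0] == "&":
            out.extend(old[i:i + hunk[1]])
            i += hunk[1]
        else:
            _, removed, added = hunk
            if old[i:i + len(removed)] != removed:
                raise ValueError("diff does not apply to this file")
            out.extend(added)
            i += len(removed)
    return out

old = [["foo", "bar", "baz"], ["blah", "blah", "blah"], ["1", "2", "3"]]
new = [["foo", "bar", "baz"], ["x", "y", "z"], ["1", "2", "3"]]
d = csv_diff(old, new)
print(d)  # [('&', 1), ('|', [['blah', 'blah', 'blah']], [['x', 'y', 'z']]), ('&', 1)]
print(csv_pact(old, d) == new)  # True
```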
The csv-utils.hoon library we imported also contains a +csv-join function, which we'll use in the +join arm just to save space here.
Here's the new CSV mark definition:
/+  *csv-utils
|_  csv=(list (list @t))
++  grab
  |%
  ++  mime  |=((pair mite octs) (de-csv q.q))
  ++  noun
    |=  n=*
    ^-  (list (list @t))
    =/  result  ((list (list @t)) n)
    ?>  (validate result)
    result
  --
++  grow
  |%
  ++  mime
    ?>  (validate csv)
    [/text/csv (as-octs:mimes:html (en-csv csv))]
  ++  noun
    ?>  (validate csv)
    csv
  --
++  grad
  |%
  ++  form  %csv-diff
  ++  diff
    |=  bob=(list (list @t))
    ^-  (urge:clay (list @t))
    ?>  (validate csv)
    ?>  (validate bob)
    (lusk:differ csv bob (loss:differ csv bob))
  ++  pact
    |=  dif=(urge:clay (list @t))
    ^-  (list (list @t))
    =/  result  (lurk:differ csv dif)
    ?>  (validate result)
    result
  ++  join
    |=  $:  ali=(urge:clay (list @t))
            bob=(urge:clay (list @t))
        ==
    ^-  (unit (urge:clay (list @t)))
    (csv-join ali bob)
  ++  mash
    |=  $:  [ship desk (urge:clay (list @t))]
            [ship desk (urge:clay (list @t))]
        ==
    ^-  (urge:clay (list @t))
    ~|(%csv-mash !!)
  --
--
In our modified +grad arm, we've replaced the noun delegation with a core containing five arms: +form, +diff, +pact, +join and +mash. All of these arms are required for a valid +grad that isn't delegated to another mark. We'll now look at each in detail.
+form
++  form  %csv-diff
+form simply specifies the mark of the diff file that may be produced by the other +grad functions. If your diff is the same type as your mark, +form could simply name your own mark (for example %csv). In our case, however, the diff is a (urge:clay (list @t)) rather than a (list (list @t)), so we need a separate mark file for the diff itself.
Here's another mark file, which can be saved as csv-diff.hoon in /mar:
|_  dif=(urge:clay (list @t))
++  grab
  |%
  ++  noun  (urge:clay (list @t))
  --
++  grow
  |%
  ++  noun  dif
  --
++  grad  %noun
--
It's very bare-bones; we just need it for our CSV mark to work. In our CSV mark, we've specified it as %csv-diff in +form.
+diff
++  diff
  |=  bob=(list (list @t))
  ^-  (urge:clay (list @t))
  ?>  (validate csv)
  ?>  (validate bob)
  (lusk:differ csv bob (loss:differ csv bob))
This arm produces the diff of two CSV files. The first CSV file is given as the sample of the parent door, which, if you'll recall, we gave a face of csv. The second CSV file is given as the sample of the gate in +diff, which we've named bob here. We then produce the diff of these two files and return it as the type of the mark specified in +form. In our case, that is a (urge:clay (list @t)) for a %csv-diff. Clay uses +diff when a file is revised, so it doesn't have to store a whole new copy of the file each time it's modified.
+pact
++  pact
  |=  dif=(urge:clay (list @t))
  ^-  (list (list @t))
  =/  result  (lurk:differ csv dif)
  ?>  (validate result)
  result
+pact patches a CSV file with the given diff. Its gate takes a diff and applies it to the CSV given as the sample of the parent door, which we gave a face of csv. If the patch succeeds, it returns a new CSV file: a valid (list (list @t)). When we read a file that's been modified in Clay, Clay will apply all the diffs it has with +pact and return the resulting file.
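Reconstructing a file from its stored diffs can be modeled as folding +pact over them in order, starting from an earlier version. The Python sketch below does this. The hunk representation ("&", n) / ("|", removed, added) is a model of an urge, and the history list is made-up example data rather than anything Clay actually stores. Like our +pact, the model also rejects a result whose rows have inconsistent lengths.

```python
from functools import reduce

def pact(rows, urge):
    """Apply one urge-style diff to a list of rows (a model of +pact)."""
    out, i = [], 0
    for hunk in urge:
        if hunk[0] == "&":
            out.extend(rows[i:i + hunk[1]])  # keep unchanged rows
            i += hunk[1]
        else:
            _, removed, added = hunk
            if rows[i:i + len(removed)] != removed:
                raise ValueError("diff does not apply")
            out.extend(added)
            i += len(removed)
    if any(len(r) != len(out[0]) for r in out):  # like ?> (validate result)
        raise ValueError("patched rows have inconsistent lengths")
    return out

base = [["foo", "bar", "baz"], ["blah", "blah", "blah"], ["1", "2", "3"]]
history = [
    [("&", 1), ("|", [["blah", "blah", "blah"]], [["x", "y", "z"]]), ("&", 1)],
    [("&", 3), ("|", [], [["4", "5", "6"]])],  # append one row
]
latest = reduce(pact, history, base)
print(latest)  # [['foo', 'bar', 'baz'], ['x', 'y', 'z'], ['1', '2', '3'], ['4', '5', '6']]
```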
+join
++  join
  |=  $:  ali=(urge:clay (list @t))
          bob=(urge:clay (list @t))
      ==
  ^-  (unit (urge:clay (list @t)))
  (csv-join ali bob)
The +join arm merges two different diffs. It takes them both as the sample of its gate (which we've named ali and bob) and returns a new diff wrapped in a +unit, like (unit (urge:clay (list @t))). The +unit will be ~ if the merge failed due to a conflict. Clay uses this in some cases when desks are merged. If diff merges are not possible for your use case, you could just have it always return ~.
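Since csv-join's source isn't shown here, the following is only a hypothetical Python sketch of one conservative way to merge two urge-style diffs made against the same file. It accepts them when they touch disjoint rows. When they overlap, or edit at the same position, it returns None, standing in for ~. It uses the same ("&", n) / ("|", removed, added) hunk model, and it is not the algorithm used by csv-join or the TXT mark.

```python
def to_edits(urge):
    """Turn an urge into (start, end, removed, added) edits on base-row indices."""
    edits, i = [], 0
    for hunk in urge:
        if hunk[0] == "&":
            i += hunk[1]
        else:
            _, removed, added = hunk
            edits.append((i, i + len(removed), list(removed), list(added)))
            i += len(removed)
    return edits, i  # i = number of base rows the urge covers

def join(ali, bob):
    """Merge two diffs of the same base; return None on conflict."""
    edits_a, len_a = to_edits(ali)
    edits_b, len_b = to_edits(bob)
    if len_a != len_b:
        return None  # the diffs were not made against the same base
    merged = []
    for edit in sorted(edits_a + edits_b, key=lambda e: (e[0], e[1])):
        if merged and edit == merged[-1]:
            continue  # both sides made the identical change
        if merged and (edit[0] < merged[-1][1] or edit[0] == merged[-1][0]):
            return None  # overlapping or same-position edits conflict
        merged.append(edit)
    urge, i = [], 0
    for start, end, removed, added in merged:
        if start > i:
            urge.append(("&", start - i))  # unchanged rows between edits
        urge.append(("|", removed, added))
        i = end
    if i < len_a:
        urge.append(("&", len_a - i))
    return urge

ali = [("|", [["a"]], [["A"]]), ("&", 2)]  # change row 0
bob = [("&", 2), ("|", [["c"]], [["C"]])]  # change row 2
print(join(ali, bob))  # [('|', [['a']], [['A']]), ('&', 1), ('|', [['c']], [['C']])]
print(join(ali, [("|", [["a"]], [["B"]]), ("&", 2)]))  # None: both changed row 0
```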
+mash
++  mash
  |=  $:  [ship desk (urge:clay (list @t))]
          [ship desk (urge:clay (list @t))]
      ==
  ^-  (urge:clay (list @t))
  ~|(%csv-mash !!)
This is like +join, except it forces a diff merge even if there's a conflict. Rather than returning a +unit, it just returns the diff, a (urge:clay (list @t)) in our case. Also unlike +join, it takes the $ship and $desk each diff came from, as well as the diff itself.
The +mash arm is not used by Clay in its file revision operations, so it's safe to make it a dummy arm that crashes, as we've done here. If you were to use it, you would likely call it manually from an agent, thread or generator.
An example of its use is the TXT mark, which includes a proper +mash function that produces a diff with any conflicts annotated. How your own +mash handles conflicts would depend on your use case. If there are no conflicts between the two diffs, it should produce the same diff as the +join arm.
Conclusion
So there you have it: a fully functional mark for CSV files. A mark file can be as complex or as simple as you'd like; marks are very flexible depending on your use case. Additional conversion methods can always be added as they're needed. For example, with just a few lines of code we could add arms for converting CSV files to and from JSON or TXT.
In the next document, we'll look at building and using mark cores and mark conversion gates in our own code.