Working with artifacts

An Artifact is a typed view into a directory. Creating, accessing, and modifying an artifact’s attributes corresponds to creating, accessing, and modifying files and subdirectories. By default, simple objects (integers, floating-point numbers, strings, bytestrings, True, False, and None), lists, namespace-like objects, and multidimensional arrays are stored as CBOR files, and path-like objects (including other artifacts) are stored as symbolic links. Support for additional object and file types can be added on both a per-class and per-attribute basis.
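To make the attribute-to-file mapping concrete, here is a toy sketch in plain Python. It is not taken from Artisan’s source: the class name `DirView` is hypothetical, values are stored as JSON because the standard library has no CBOR codec, and path-like values become symbolic links.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Union


class DirView:
    """Toy directory-backed object (hypothetical; not Artisan's implementation)."""

    def __init__(self, root: Union[str, os.PathLike]) -> None:
        # Use object.__setattr__ so `_root` is an ordinary attribute, not a file.
        object.__setattr__(self, '_root', Path(root))
        self._root.mkdir(parents=True, exist_ok=True)

    def __setattr__(self, name: str, value: object) -> None:
        if isinstance(value, os.PathLike):
            # Path-like values are stored as symbolic links to an absolute target.
            link = self._root / name
            if link.is_symlink():
                link.unlink()
            link.symlink_to(Path(value).resolve())
        else:
            # Other values are serialized; JSON stands in for CBOR here.
            (self._root / f'{name}.json').write_text(json.dumps(value))

    def __getattr__(self, name: str) -> object:
        # Python calls __getattr__ only when normal lookup fails.
        if name.startswith('_'):
            raise AttributeError(name)  # never treat private names as files
        link = self._root / name
        if link.is_symlink():
            return Path(os.readlink(link))
        data_file = self._root / f'{name}.json'
        if data_file.exists():
            return json.loads(data_file.read_text())
        raise AttributeError(name)


with tempfile.TemporaryDirectory() as tmp:
    view = DirView(Path(tmp) / 'stats')
    view.words = ['a', 'the', 'zebra']  # creates stats/words.json
    view.book = Path(tmp)               # creates the symlink stats/book
    stored = sorted(p.name for p in (Path(tmp) / 'stats').iterdir())
    words_back = view.words
    book_back = view.book
    expected_book = Path(tmp).resolve()
```

Overriding `__setattr__` and `__getattr__` is enough for this toy; the behavior described above also covers namespaces, arrays, custom readers and writers, and access modes, none of which are modeled here.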

An example artifact type definition:

import re
from collections import Counter
from pathlib import Path
from typing import Protocol
from artisan import Artifact

class BookStats(Artifact):
    ''' A sorted list of the words in a book and
    a corresponding list of occurrence counts. '''

    class Spec(Protocol):
        book: str; 'The path to a text file.'

    def __init__(self, spec: Spec) -> None:
        text = Path(spec.book).read_text()
        counter = Counter(re.findall(r'\w+', text)) # raw string: '\w' is a regex escape, not a string escape
        self.words = sorted(counter.keys()) # creates `words.cbor`
        self.counts = [counter[w] for w in self.words] # creates `counts.cbor`

Creating artifacts

Instantiating an artifact ensures the corresponding files exist.

from os import listdir
from types import SimpleNamespace as Ns

stats = BookStats(Ns(book='The Grammar of Graphics.txt'))
Path(stats) # => Path('/home/tutorial-author/BookStats_0000')
listdir(stats) # => ['_meta_.json', 'words.cbor', 'counts.cbor']

Artisan tries to avoid rebuilding artifacts it has already created, so instantiating the same specification twice reuses the existing directory:

BookStats(Ns(book="Why's Poignant Guide.txt")) # creates "BookStats_0001/"
BookStats(Ns(book="Why's Poignant Guide.txt")) # uses the existing "BookStats_0001/"

When Artisan builds an artifact, it stores metadata in a file called _meta_.json. When an artifact is instantiated, Artisan will search for a directory with a matching _meta_.json file in the active context’s root artifact directory. The search is recursive, but when a directory with a _meta_.json file is found, its subdirectories will not be searched. If a match is found, that artifact will be returned. Otherwise, Artisan will invoke the active context’s artifact builder to create a new one. The default artifact builder calls __init__ and logs metadata to _meta_.json, but custom builders can be used to delegate work to remote servers or log additional metadata.

> cat BookStats_0000/_meta_.json
{
  "spec": {
    "type": "BookStats",
    "book": "The Grammar of Graphics.txt"
  },
  "events": [
    {"timestamp": "2020-09-12T05:45:55.372295", "type": "Start"},
    {"timestamp": "2020-09-12T05:45:55.518389", "type": "Success"}
  ]
}
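The lookup and build steps described above can be sketched as follows. This is a hypothetical illustration rather than Artisan’s code: `find_artifact`, `log_event`, and `find_or_build` are made-up names, specifications are plain dictionaries compared for equality, new directories are numbered like `BookStats_0000` (as in the examples), and the `build` callback stands in for calling `__init__`.

```python
import json
import os
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Callable, Optional


def find_artifact(root: Path, spec: dict) -> Optional[Path]:
    """Return a directory under `root` whose _meta_.json records `spec`, or None."""
    for dirpath, dirnames, filenames in os.walk(root):
        if '_meta_.json' in filenames:
            meta = json.loads((Path(dirpath) / '_meta_.json').read_text())
            if meta.get('spec') == spec:
                return Path(dirpath)
            # This directory is an artifact: prune it so os.walk skips its subdirectories.
            dirnames.clear()
        else:
            dirnames.sort()  # visit candidates in a deterministic order
    return None


def log_event(meta_path: Path, meta: dict, event_type: str) -> None:
    """Append a timestamped event and rewrite _meta_.json."""
    meta['events'].append({'timestamp': datetime.now().isoformat(), 'type': event_type})
    meta_path.write_text(json.dumps(meta, indent=2))


def find_or_build(root: Path, spec: dict, build: Callable[[Path], None]) -> Path:
    """Reuse a matching artifact if one exists; otherwise create and build a new one."""
    existing = find_artifact(root, spec)
    if existing is not None:
        return existing
    # Hypothetical naming scheme: <type>_0000, <type>_0001, ...
    index = 0
    while (root / f"{spec['type']}_{index:04d}").exists():
        index += 1
    path = root / f"{spec['type']}_{index:04d}"
    path.mkdir(parents=True)
    meta = {'spec': spec, 'events': []}
    log_event(path / '_meta_.json', meta, 'Start')
    build(path)  # stands in for running the artifact's __init__
    log_event(path / '_meta_.json', meta, 'Success')
    return path


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    built = []

    def build(path: Path) -> None:
        built.append(path.name)
        (path / 'words.json').write_text('[]')

    spec = {'type': 'BookStats', 'book': "Why's Poignant Guide.txt"}
    first = find_or_build(root, spec, build)   # builds BookStats_0000
    second = find_or_build(root, spec, build)  # found by the search; not rebuilt
    other = find_or_build(root, {'type': 'BookStats', 'book': 'Sapiens.txt'}, build)
    names = (first.name, second.name, other.name)
    events = [e['type'] for e in json.loads((first / '_meta_.json').read_text())['events']]
```

Clearing `dirnames` in place is how `os.walk` is told not to descend into a directory, which mirrors the rule that an artifact’s subdirectories are not searched.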

If the artifact’s specification has a _path_ attribute, the artifact will be located at that path.

stats = BookStats(Ns(_path_='stats/that-really-long-book', book='Sapiens.txt'))
stats._path_ # => Path('/home/tutorial-author/stats/that-really-long-book')

Paths can be provided as strings or path-like objects. The “~” character will be expanded to the home directory and the “@” character will be expanded to the root artifact directory.
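A sketch of the expansion rule, assuming that only a leading “~” or “@” is special and that “@” is followed by “/” or nothing. `expand_artifact_path` and its `root` parameter are hypothetical names, and the sketch leaves other relative paths untouched rather than guessing how Artisan resolves them.

```python
import os
from pathlib import Path
from typing import Union

PathArg = Union[str, os.PathLike]


def expand_artifact_path(path: PathArg, root: PathArg) -> Path:
    """Hypothetical helper: expand a leading '~' (home) or '@' (artifact root)."""
    text = os.fspath(path)
    if text == '@' or text.startswith('@/'):
        # '@' alone is the root itself; '@/x' is x inside the root.
        return Path(root) / text[2:]
    # expanduser handles '~' and '~/...' and leaves other paths unchanged.
    return Path(text).expanduser()
```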

Recovering existing artifacts

artisan.recover can be used to instantiate an artifact from a path rather than a specification:

import artisan

artisan.recover(BookStats, 'BookStats_0000') # => <Stats for "The Grammar of Graphics">

The @ operator works as a shorthand:

# These two expressions are equivalent:
artisan.recover(BookStats, 'BookStats_0000')
BookStats @ 'BookStats_0000'
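Python evaluates `BookStats @ 'BookStats_0000'` by looking up `__matmul__` on the type of `BookStats`, that is, on its metaclass. The sketch below shows that mechanism with hypothetical names (`ArtifactMeta`, `ToyArtifact`, and a stand-in `recover` that only records the path); it is not taken from Artisan’s source.

```python
from pathlib import Path
from typing import Any


def recover(cls: type, path: Any) -> Any:
    """Hypothetical stand-in for artisan.recover: bind an instance to `path`."""
    obj = cls.__new__(cls)  # skip __init__; the files are assumed to exist already
    obj._path_ = Path(path)
    return obj


class ArtifactMeta(type):
    def __matmul__(cls, path: Any) -> Any:
        # `SomeClass @ path` is evaluated as type(SomeClass).__matmul__(SomeClass, path).
        return recover(cls, path)


class ToyArtifact(metaclass=ArtifactMeta):
    def __init__(self, spec: object) -> None:
        raise RuntimeError('__init__ is not called when recovering')


a = recover(ToyArtifact, 'BookStats_0000')
b = ToyArtifact @ 'BookStats_0000'  # same result as the line above
```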

Reading and writing files

When an artifact is writing a file, Artisan tries calling every writer in the artifact type’s _writers_ list, in order, until one returns successfully. Writers should accept an extensionless path and a data object, write the data to the path, and return the appropriate file extension. To support concurrent reading and writing, files generated by writers are not moved into the artifact’s directory until after the writer returns. For performance, Artisan may skip writers whose data argument type does not match the object being stored.
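The writer protocol can be sketched like this. The names `store`, `write_text`, and `write_json` are hypothetical; “returns successfully” is read as “does not raise”; and staging happens in a temporary directory inside the destination so that `os.replace` can move the finished file into place (the rename is atomic on POSIX when both paths are on the same filesystem).

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable, List

Writer = Callable[[Path, object], str]


def write_text(path: Path, data: object) -> str:
    if not isinstance(data, str):
        raise TypeError('write_text only handles str')
    path.write_text(data)
    return '.txt'


def write_json(path: Path, data: object) -> str:
    path.write_text(json.dumps(data))  # raises TypeError for non-JSON data
    return '.json'


def store(directory: Path, name: str, data: object, writers: List[Writer]) -> Path:
    """Try each writer in order; move the first successful result into `directory`."""
    errors = []
    for writer in writers:
        # Stage privately so readers never see a half-written file.
        with tempfile.TemporaryDirectory(dir=directory) as staging:
            staged = Path(staging) / name  # extensionless path handed to the writer
            try:
                extension = writer(staged, data)
            except Exception as exc:  # this writer can't handle `data`; try the next
                errors.append(exc)
                continue
            final = directory / f'{name}{extension}'
            os.replace(staged, final)
            return final
    raise TypeError(f'no writer could store {name!r}: {errors}')


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    writers = [write_text, write_json]
    text_file = store(root, 'title', 'Sapiens', writers)       # handled by write_text
    list_file = store(root, 'counts', [3, 1, 4], writers)      # falls through to write_json
    listing = sorted(p.name for p in root.iterdir())
    counts_back = json.loads(list_file.read_text())
```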

When an artifact is reading a file, Artisan tries calling every reader in the artifact type’s _readers_ list, in order, until one returns successfully. Readers should be functions that accept a path and return an object representing the data stored at that path. For performance, Artisan may skip readers whose path argument type is annotated with an incompatible extension requirement, e.g. Annotated[Path, '.txt'] or Annotated[Path, '.jpg', '.jpeg']. The Annotated type constructor can be imported from the typing module in Python 3.9+ and the typing_extensions module in earlier versions.

from typing import Annotated
from .my_lib import AudioClip

def read_wav(path: Annotated[Path, '.wav']) -> AudioClip:
    return AudioClip.read(path)

def write_wav(path: Path, clip: AudioClip) -> str:
    clip.write(path)
    return '.wav'

class MusicBox(Artifact):
    _readers_ = [read_wav, *Artifact._readers_] # Try `read_wav`, then the default readers.
    _writers_ = [write_wav, *Artifact._writers_] # Try `write_wav`, then the default writers.
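One way to implement the extension check is to inspect each reader’s annotations with `typing.get_type_hints(..., include_extras=True)`, `typing.get_origin`, and `typing.get_args`. The sketch assumes Python 3.9+; `load`, `accepted_extensions`, `read_json_file`, and `read_text_file` are hypothetical names, and readers without extension metadata are always tried.

```python
import inspect
import json
import tempfile
from pathlib import Path
from typing import Annotated, Any, Callable, List, Optional, Tuple, get_args, get_origin, get_type_hints


def accepted_extensions(reader: Callable[..., Any]) -> Optional[Tuple[str, ...]]:
    """Extensions declared via Annotated on the reader's first parameter, or None."""
    hints = get_type_hints(reader, include_extras=True)
    first_param = next(iter(inspect.signature(reader).parameters))
    hint = hints.get(first_param)
    if get_origin(hint) is Annotated:
        return tuple(get_args(hint)[1:])  # drop the base type (Path)
    return None


def load(path: Path, readers: List[Callable[[Path], Any]]) -> Any:
    """Try readers in order, skipping those annotated for other extensions."""
    for reader in readers:
        extensions = accepted_extensions(reader)
        if extensions is not None and path.suffix not in extensions:
            continue
        try:
            return reader(path)
        except Exception:
            continue  # this reader failed; try the next one
    raise ValueError(f'no reader could load {path}')


def read_json_file(path: Annotated[Path, '.json']) -> Any:
    return json.loads(path.read_text())


def read_text_file(path: Annotated[Path, '.txt', '.md']) -> str:
    return path.read_text()


readers = [read_json_file, read_text_file]
with tempfile.TemporaryDirectory() as tmp:
    notes = Path(tmp) / 'notes.md'
    notes.write_text('[1, 2]')  # valid JSON, but the .md suffix routes it to read_text_file
    counts = Path(tmp) / 'counts.json'
    counts.write_text('[1, 2]')
    notes_value = load(notes, readers)
    counts_value = load(counts, readers)
```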

The default writers include

The default readers include

Files can also be written and read directly using an artifact’s _path_ attribute:

import pickle

class PickleJar(Artifact):
    def __init__(self, spec: object) -> None:
        (self._path_ / 'cucumber.pkl').write_bytes(pickle.dumps('🥒'))
        (self / 'pepper.pkl').write_bytes(pickle.dumps('🌶️')) # This also works.

Per-attribute support for file types can also be added by defining properties or other descriptors:

from PIL import Image

class PictureFrame(Artifact):
    @property
    def picture(self) -> Image.Image: # `Image` is a module; `Image.Image` is the image class.
        return Image.open(self / 'picture.png')
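Properties suit read-only fields; a reusable descriptor can handle both reading and writing. The `TextField` descriptor below is hypothetical (it is not part of Artisan) and only assumes the owning object has a `_path_` attribute, so it is demonstrated on a plain class rather than an `Artifact` subclass.

```python
import tempfile
from pathlib import Path
from typing import Any, Optional


class TextField:
    """Hypothetical descriptor storing an attribute as `<name>.txt` under `obj._path_`."""

    def __set_name__(self, owner: type, name: str) -> None:
        self.filename = f'{name}.txt'

    def __get__(self, obj: Optional[Any], owner: type) -> Any:
        if obj is None:
            return self  # accessed on the class itself
        return (Path(obj._path_) / self.filename).read_text()

    def __set__(self, obj: Any, value: str) -> None:
        (Path(obj._path_) / self.filename).write_text(value)


class Notebook:
    summary = TextField()  # data descriptor: takes priority over instance attributes

    def __init__(self, path: Path) -> None:
        self._path_ = path


with tempfile.TemporaryDirectory() as tmp:
    book = Notebook(Path(tmp))
    book.summary = 'Short and sweet.'  # writes summary.txt
    files = sorted(p.name for p in Path(tmp).iterdir())
    summary_back = book.summary        # reads summary.txt
```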

Attribute-access modes

Artifacts can be instantiated in “read-sync”, “read-async”, or “write” mode. In “read-sync” mode, attribute accesses will only return after the artifact has finished building. In “read-async” mode, attribute accesses will return as soon as a corresponding file or directory exists. In “write” mode, attribute accesses will return immediately, but a ProxyArtifactField will be returned if no corresponding file or directory is present. Additionally, in “write” mode, attribute assignments will create files.

Artifacts are instantiated in “read-sync” mode by default, but if the artifact’s specification has a _mode_ attribute, or if a mode argument is provided to artisan.recover, that mode will be used instead. Artifacts’ __init__ methods are always executed in “write” mode, so it is generally only necessary to specify _mode_ when “read-async” behavior is desired.
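The “read-async” rule (return once a matching file or directory exists) can be sketched as a polling loop. This is an illustration under stated assumptions, not Artisan’s waiting strategy: `wait_for_entry` is a hypothetical name, entries are matched by stem (so `words` matches `words.cbor` or a `words/` directory), and a timeout is added so the sketch cannot block forever.

```python
import tempfile
import threading
import time
from pathlib import Path


def wait_for_entry(directory: Path, name: str, timeout: float = 5.0, interval: float = 0.01) -> Path:
    """Poll until `directory` holds an entry whose stem is `name`; else raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        if directory.is_dir():
            for entry in directory.iterdir():
                if entry.stem == name:
                    return entry
        if time.monotonic() >= deadline:
            raise TimeoutError(f'{name!r} did not appear in {directory}')
        time.sleep(interval)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Simulate a builder that writes `words.cbor` (0x80 is an empty CBOR array) shortly later.
    writer = threading.Timer(0.05, lambda: (root / 'words.cbor').write_bytes(b'\x80'))
    writer.start()
    found = wait_for_entry(root, 'words', timeout=2.0)
    writer.join()
    found_name = found.name
    try:
        wait_for_entry(root, 'counts', timeout=0.05)
        timed_out = False
    except TimeoutError:
        timed_out = True
```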

# This raises an `AttributeError`:
read_sync_artifact = BookStats(Ns(book='Soonish.txt'))
read_sync_artifact.about # `about` doesn't exist.

# This works:
write_artifact = BookStats(Ns(book='Soonish.txt', _mode_='write'))
write_artifact.about # => <ProxyArtifactField object>
write_artifact.about.the_author = 'Seems like a pretty nice guy.'
write_artifact.about.favorite_pages.append('Page 1: The best so far')
write_artifact.about.favorite_pages.append('Page 2 is also quite good...')
listdir(write_artifact.about) # => ['the_author.cbor', 'favorite_pages.cbor']

Dynamic artifacts

Artisan provides the DynamicArtifact class to represent artifacts with attribute names known only at runtime. DynamicArtifact accepts a single type argument indicating the type of its instances’ attributes. Omitting this argument indicates that the attributes may be of any type.

from artisan import DynamicArtifact
photos = DynamicArtifact[Path] @ '/opt/paris_photos/' # All attributes are paths.

Dynamic artifacts support MutableMapping methods (__len__, __iter__, __contains__, etc.), and item-access syntax can be used to access their attributes. Key iteration is guaranteed to be alphabetical.
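Subclassing `collections.abc.MutableMapping` and defining `__getitem__`, `__setitem__`, `__delitem__`, `__iter__`, and `__len__` yields the remaining methods (`__contains__`, `keys`, `get`, and so on). A toy sketch with alphabetical iteration follows; `DirMapping` is a hypothetical name, values are stored as `<key>.json` files, and it is not taken from `DynamicArtifact`’s source.

```python
import json
import tempfile
from collections.abc import MutableMapping
from pathlib import Path
from typing import Any, Iterator


class DirMapping(MutableMapping):
    """Toy mapping from names to `<name>.json` files in a directory."""

    def __init__(self, root: Path) -> None:
        self._root = Path(root)
        self._root.mkdir(parents=True, exist_ok=True)

    def _file(self, key: str) -> Path:
        return self._root / f'{key}.json'

    def __getitem__(self, key: str) -> Any:
        try:
            return json.loads(self._file(key).read_text())
        except FileNotFoundError:
            raise KeyError(key) from None

    def __setitem__(self, key: str, value: Any) -> None:
        self._file(key).write_text(json.dumps(value))

    def __delitem__(self, key: str) -> None:
        try:
            self._file(key).unlink()
        except FileNotFoundError:
            raise KeyError(key) from None

    def __iter__(self) -> Iterator[str]:
        # Sorting the file stems gives alphabetical key order.
        return iter(sorted(p.stem for p in self._root.glob('*.json')))

    def __len__(self) -> int:
        return sum(1 for _ in self._root.glob('*.json'))


with tempfile.TemporaryDirectory() as tmp:
    photos = DirMapping(Path(tmp))
    photos['eiffel_tower'] = {'rating': 5}
    photos['arc_de_triomphe'] = {'rating': 4}
    keys = list(photos)
    has_louvre = 'louvre' in photos  # __contains__ comes from the Mapping mixin
    size = len(photos)
    del photos['arc_de_triomphe']
    keys_after = list(photos.keys())
```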

list(photos) # => ['eiffel_tower', 'that_dog_with_the_beret', ...]
photos['eiffel_tower'] # => Path('/opt/paris_photos/eiffel_tower.png')