Working with artifacts¶
An Artifact
is a typed view into a directory. Creating, accessing, and
modifying an artifact’s attributes corresponds to creating, accessing, and
modifying files and subdirectories. By default, simple objects (integers,
floating-point numbers, strings, bytestrings, True
, False
, and None
),
lists, namespaces-like objects, and multidimensional arrays are stored as CBOR files, and path-like objects (including other artifacts)
are stored as symbolic links. Support for additional object and file types can
be added on both a per-class and per-attribute basis.
An example artifact type definition:
import re
from collections import Counter
from pathlib import Path
from typing import Protocol
from artisan import Artifact
class BookStats(Artifact):
''' A sorted list of the words in a book and
a corresponding list of occurrence counts. '''
class Spec(Protocol):
book: str; 'The path to a text file.'
def __init__(self, spec: Spec) -> None:
text = Path(spec.book).read_text()
counter = Counter(re.findall('\w+', text))
self.words = sorted(counter.keys()) # creates `words.cbor`
self.counts = [counter[w] for w in self.words] # creates `counts.cbor`
Creating artifacts¶
Instantiating an artifact ensures the corresponding files exist.
from os import listdir
from types import SimpleNamespace as Ns
stats = BookStats(Ns(book='The Grammar of Graphics.txt'))
Path(stats) # => Path('/home/tutorial-author/BookStats_0000')
listdir(stats) # => ['_meta_.json', 'words.cbor', 'counts.cbor']
However, Artisan will try to avoid rebuilding artifacts it has already created.
BookStats(Ns(book="Why's Poignant Guide.txt")) # creates "BookStats_0001/"
BookStats(Ns(book="Why's Poignant Guide.txt")) # uses the existing "BookStats_0001/"
When Artisan builds an artifact, it stores metadata in a file called
_meta_.json
. When an artifact is instantiated, Artisan will search for a
directory with a matching _meta_.json
file in the active context’s root artifact directory. The search is recursive, but when a
directory with a _meta_.json
file is found, its subdirectories will not be
searched. If a match is found, that artifact will be returned. Otherwise,
Artisan will invoke the active context’s artifact builder to create a new one.
The default artifact builder calls __init__
and logs metadata to
_meta_.json
, but custom builders can be used to delegate work to remote
servers or log additional metadata.
> cat BookStats_0000/_meta_.json
{
"spec": {
"type": "BookStats",
"book": "The Grammar of Graphics.txt"
},
"events": [
{"timestamp": "2020-09-12T05:45:55.372295", "type": "Start"},
{"timestamp": "2020-09-12T05:45:55.518389", "type": "Success"}
]
}
If the artifact’s specification has a _path_
attribute, the artifact will be
located at that path.
stats = BookStats(Ns(_path_='stats/that-really-long-book', book='Sapiens.txt'))
stats._path_ # => Path('/home/tutorial-author/stats/that-really-long-book')
Paths can be provided as strings or path-like objects. The “~” character will be expanded to the home directory and the “@” character will be expanded to the root artifact directory.
Recovering existing artifacts¶
artisan.recover
can be used to instantiate an artifact from a path rather than
a specification:
artisan.recover(BookStats, 'BookStats_0000') # => <Stats for "The Grammar of Graphics">
The @
operator works as a shorthand:
# These two expressions are equivalent:
artisan.recover(BookStats, 'BookStats_0000')
BookStats @ 'BookStats_0000'
Reading and writing files¶
When an artifact is writing a file, Artisan tries calling every writer in the
artifact type’s _writers_
list, in order, until one returns successfully.
Writers should accept an extensionless path and a data object, write the data to
the path, and return the appropriate file extension. To support concurrent
reading and writing, files generated by writers are not moved into the
artifact’s directory until after the writer returns. For performance, Artisan
may skip writers whose data argument type does not match the object being
stored.
When an artifact is reading a file, Artisan tries calling every reader in the
artifact type’s _readers_
list, in order, until one returns successfully.
Readers should be functions that accept a path and return an object representing
the data stored at that path. For performance, Artisan may skip readers whose
path argument type is annotated with an incompatible extension requirement,
e.g. Annotated[Path, '.txt']
or Annotated[Path, '.jpg', '.jpeg']
. The
Annotated
type constructor can be imported from the typing
module in Python
3.9+ and the typing_extensions
module in earlier versions.
from typing import Annotated
from .my_lib import AudioClip
def read_wav(path: Annotated[Path, '.wav']) -> AudioClip:
return AudioClip.read(path)
def write_npy(path: Path, clip: AudioClip) -> str:
clip.write(path)
return '.wav'
class MusicBox(Artifact):
_readers_ = [read_wav, *Artifact._readers_] # Try `read_wav`, then the default readers.
_writers_ = [write_wav, *Artifact._writers_] # Try `write_wav`, then the default writers.
The default writers include
artisan.write_object_as_cbor
, which creates CBOR files, andartisan.write_path
, which creates symbolic links.
The default readers include
artisan.read_cbor_file
, which returns a simple object, a namespace, an extensiblePersistentList
, or an extensible, lazily loadedPersistentArray
,artisan.read_text_file
, which reads a text file usingPath.read_text
,artisan.read_json_file
, which reads a JSON file usingjson.load
,artisan.read_numpy_file
, which reads a NumPy array file or a NumPy archive usingnumpy.load
, andartisan.read_opaque_file
, which returns an unrecognized file’s path so it can be read using a function from an appropriate library.
Files can also be written and read directly using an artifacts’ _path_
attribute:
import pickle
class PickleJar(Artifact):
def __init__(self, spec: object) -> None:
(self._path_ / 'cucumber.pkl').write_bytes(pickle.dumps('🥒'))
(self / 'pepper.pkl').write_bytes(pickle.dumps('🌶️')) # This also works.
And per-attribute support for file types can be added by defining properties or other descriptors:
from PIL import Image
class PictureFrame(Artifact):
@property
def picture(self) -> Image:
return Image.open(self / 'picture.png')
Attribute-access modes¶
Artifacts can be instantiated in “read-sync”, “read-async”, or “write” mode. In
“read-sync” mode, attribute accesses will only return after the artifact has
finished building. In “read-async” mode, attribute accesses will return as soon
as a corresponding file or directory exists. In “write” mode, attribute accesses
will return immediately, but a ProxyArtifactField
will be returned if no
corresponding file or directory is present. Additionally, in “write” mode,
attribute assignments will create files.
Artifacts are instantiated in “read-sync” mode by default, but if the artifact’s
specification has a _mode_
attribute, or if a mode
argument is provided to
artisan.recover
, that mode will be used instead. Artifacts’ __init__
methods
are always executed in “write” mode, so it is generally only necessary to
specify _mode_
when “read-async” behavior is desired.
# This raises an `AttributeError`:
read_sync_artifact = BookStats(Ns(book='Soonish.txt'))
read_sync_artifact.about # `about` doesn't exist.
# This works:
write_artifact = BookStats(Ns(book='Soonish.txt', _mode_='write'))
write_artifact.about # => <ProxyArtifactField object>
write_artifact.about.the_author = 'Seems like a pretty nice guy.'
write_artifact.about.favorite_pages.append('Page 1: The best so far')
write_artifact.about.favorite_pages.append('Page 2 is also quite good...')
listdir(write_artifact.about) # => ['the_author.cbor', 'favorite_pages.cbor']
Dynamic artifacts¶
Artisan provides the DynamicArtifact
class to represent artifacts with
attribute names known only at runtime. DynamicArtifact
accepts a single type
argument indicating the type of its instances’ attributes. Omitting this
argument indicates that the attributes may be of any type.
from artisan import DynamicArtifact
photos = DynamicArtifact[Path] @ '/opt/paris_photos/' # All attributes are paths.
Dynamic artifacts support MutableMapping
methods (__len__
, __iter__
,
__contains__
, etc.), and item-access syntax can be used to access their
attributes. Key iteration is guaranteed to be alphabetical.
list(photos) # => ['eiffel_tower', 'that_dog_with_the_beret', ...]
photos['eiffle_tower'] # => Path('/opt/paris_photos/eiffel_tower.png')