hashdist.core.build_store — Build artifact store

Principles

The build store is the very core of HashDist: it produces build artifacts identified by hash IDs. It is important to have a clear picture of what the build store is, and is not, responsible for.

Nix takes a pure approach where an artifact hash is guaranteed to identify the resulting binaries (up to anything inherently random in the build process, like garbage left by compilers). In contrast, HashDist takes a much more lenient approach where the strictness is configurable. The primary goal of HashDist is to make life simpler by reliably triggering rebuilds when software components are updated, not total control of the environment (in which case Nix is likely the better option).

The only concern of the build store is managing the result of a build. The declared dependencies in the build spec are therefore not the same as the “package dependencies” found in higher-level distributions. For instance, if a pure Python package depends on NumPy, this should not be declared in the build spec, because NumPy is not needed during the build; indeed, the two installations can happen in parallel.

Artifact IDs

A HashDist artifact ID has the form name/hash, e.g., zlib/4niostz3iktlg67najtxuwwgss5vl6k4.

For the artifact paths on disk, a shortened form (4-char hash) is used to make things more friendly to the human user. If there is a collision, the length is simply increased for the one that comes later. Thus, the example above could be stored on disk as ~/.hit/opt/zlib/4nio, or ~/.hit/opt/zlib/1.2.7/4nios in the (rather unlikely) case of a collision. There is a symlink from the full ID to the shortened form. See also Discussion below.
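The collision rule can be sketched as follows. This is a minimal illustration, not HashDist's actual code: pick_short_name and the taken set are hypothetical names, and the real store would consult the directories on disk instead of an in-memory set.

```python
def pick_short_name(full_hash, taken, min_length=4):
    """Return the shortest prefix of full_hash (at least min_length
    characters) that is not already in `taken`.

    `taken` is a hypothetical stand-in for the short names already
    present under the artifact's directory on disk.
    """
    for length in range(min_length, len(full_hash) + 1):
        candidate = full_hash[:length]
        if candidate not in taken:
            return candidate
    raise ValueError("full hash %r is already taken" % (full_hash,))


full = "4niostz3iktlg67najtxuwwgss5vl6k4"
first = pick_short_name(full, taken=set())      # no collision
second = pick_short_name(full, taken={"4nio"})  # 4-char prefix is taken
```

With an empty store the 4-character prefix is used; once that prefix is taken, the later artifact gets one more character.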

Build specifications and inferring artifact IDs

The fundamental object of the build store is the JSON build specification. If you know the build spec, you know the artifact ID, since the latter is the hash of the former. The key is that both dependencies and sources are specified in terms of their hashes.

An example build spec:

{
    "name" : "<name of piece of software>",
    "description": "<what makes this build special>",
    "build": {
        "import" : [
            {"ref": "bash", "id": "virtual:bash"},
            {"ref": "make", "id": "virtual:gnu-make/3+"},
            {"ref": "zlib", "id": "zlib/1.2.7/fXHu+8dcqmREfXaz+ixMkh2LQbvIKlHf+rtl5HEfgmU"},
            {"ref": "unix", "id": "virtual:unix"},
            {"ref": "gcc", "id": "gcc/host-4.6.3/q0VSL7JmzH1P17meqITYc4kMbnIjIexrWPdlAlqPn3s", "before": ["virtual:unix"]}
        ],
        "commands" : [
            {"cmd": ["bash", "build.sh"]}
        ]
    },
    "sources" : [
        {"key": "git:c5ccca92c5f136833ad85614feb2aa4f5bd8b7c3"},
        {"key": "tar.bz2:RB1JbykVljxdvL07mN60y9V9BVCruWRky2FpK2QCCow", "target": "sources"},
        {"key": "files:5fcANXHsmjPpukSffBZF913JEnMwzcCoysn-RZEX7cM"}
    ]
}

name:
Should match [a-zA-Z0-9-_+]+.
version:
Should match [a-zA-Z0-9-_+]*.
build:
A job to run to perform the build. See hashdist.core.run_job for the documentation of this sub-document.
sources:
Sources to be unpacked for the build; for now, this is documented under ‘hit unpack-sources’.
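
Because the artifact ID is described as the hash of the build spec, two specs that differ in any hashed field get different IDs, while key order should not matter. The sketch below illustrates the idea only. The serialization (sorted-key JSON), the digest (SHA-256), the lowercase base32 encoding and the helper name sketch_artifact_id are all assumptions made for illustration, not HashDist's actual scheme.

```python
import base64
import hashlib
import json


def sketch_artifact_id(spec):
    """Illustrative only: derive a name/hash ID from a build spec dict."""
    # Serialize deterministically so that key order does not matter.
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).digest()
    # Lowercase base32 without padding (an assumed encoding).
    encoded = base64.b32encode(digest).decode("ascii").lower().rstrip("=")
    return "%s/%s" % (spec["name"], encoded)


git_key = "git:c5ccca92c5f136833ad85614feb2aa4f5bd8b7c3"
spec_a = {"name": "zlib", "sources": [{"key": git_key}]}
spec_b = {"sources": [{"key": git_key}], "name": "zlib"}  # same content, other key order
spec_c = {"name": "zlib", "sources": []}                  # different sources

id_a = sketch_artifact_id(spec_a)
id_b = sketch_artifact_id(spec_b)
id_c = sketch_artifact_id(spec_c)
```

Here id_a and id_b are equal, while id_c differs because its sources differ.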

The build environment

See hashdist.core.run_job for information about how the build job is executed. In addition, the following environment variables are set:

BUILD:
Set to the build directory. This is also the starting cwd of each build command. This directory may be removed after the build.
ARTIFACT:
The location of the final artifact. Usually this is the “install location” and should, e.g., be passed as the --prefix to ./configure-style scripts.

The build specification is available under $BUILD/build.json, and stdout and stderr are redirected to $BUILD/_hashdist/build.log. These two files will also be present in $ARTIFACT after the build.
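
As a hedged illustration of how a build command might consume these variables, the sketch below derives a configure-style command line and the log location from an environment mapping. The helpers configure_command and build_log_path and the example paths are hypothetical; only the BUILD and ARTIFACT variable names and the _hashdist/build.log location come from the text above.

```python
import os


def configure_command(env):
    """Return a ./configure invocation that installs into $ARTIFACT."""
    return ["./configure", "--prefix=" + env["ARTIFACT"]]


def build_log_path(env):
    """Location of the redirected stdout/stderr, as described above."""
    return os.path.join(env["BUILD"], "_hashdist", "build.log")


# Inside a real build job one would pass os.environ; a plain dict with
# made-up paths is used here so the sketch runs anywhere.
example_env = {"BUILD": "/tmp/build/zlib-4nio", "ARTIFACT": "/opt/zlib/4nio"}
cmd = configure_command(example_env)
log = build_log_path(example_env)
```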

Build artifact storage format

The presence of the ‘id’ file signals that the build is complete; the file contains the full 256-bit hash.
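
A completeness check based on this marker file could look like the sketch below. The helper name is_build_complete is hypothetical, and the file content written here is a placeholder, since the exact on-disk format is not specified above.

```python
import os
import tempfile


def is_build_complete(artifact_dir):
    """Treat the artifact as complete only if its 'id' file exists."""
    return os.path.isfile(os.path.join(artifact_dir, "id"))


with tempfile.TemporaryDirectory() as artifact_dir:
    before = is_build_complete(artifact_dir)
    # Simulate the final step of a build by writing the marker file.
    with open(os.path.join(artifact_dir, "id"), "w") as f:
        f.write("placeholder-for-full-hash\n")
    after = is_build_complete(artifact_dir)
```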

More TODO.

Reference

class hashdist.core.build_store.BuildSpec(build_spec)

Wraps the document corresponding to a build.json

The document is wrapped in order to a) signal that it has been canonicalized, and b) make the artifact ID available under the artifact_id attribute.

class hashdist.core.build_store.BuildStore(temp_build_dir, artifact_root, gc_roots_dir, logger, create_dirs=False)

Manages the directory of build artifacts; this is usually the entry point for kicking off builds as well.

Parameters:

temp_build_dir : str

Directory to use for temporary builds (these may be removed or linger depending on keep_build passed to ensure_present()).

artifact_root : str

Root of artifacts; this will be prepended to artifact_path_pattern with os.path.join. While this could be part of artifact_path_pattern, keeping it separate means that garbage collection will never remove contents outside of this directory.

gc_roots_dir : str

Directory of symlinks to symlinks to artifacts. Artifacts reached through these will not be collected in garbage collection.

logger : Logger

Methods

static create_from_config(config, logger, **kw)

Creates a BuildStore from the settings in the configuration.

Creates a symlink to an artifact (usually a ‘profile’)

The symlink can be placed anywhere the user wants to access it. In addition to the symlink being created, it is listed in gc_roots.

The symlink will be created atomically; any existing file or symlink at the target location will be overwritten.
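
One common way to get atomic, overwriting symlink creation on POSIX systems is to create the link under a temporary name and rename it over the destination. The sketch below shows that general technique. It is not taken from HashDist's source, atomic_symlink is a hypothetical name, and the gc_roots bookkeeping is left out.

```python
import os
import tempfile


def atomic_symlink(source, link_path):
    """Point link_path at source, replacing any existing file or symlink."""
    link_dir = os.path.dirname(os.path.abspath(link_path))
    # Create the new link under a unique temporary name in the same
    # directory, so the final rename stays on one filesystem.
    tmp_dir = tempfile.mkdtemp(dir=link_dir)
    tmp_link = os.path.join(tmp_dir, "link")
    try:
        os.symlink(source, tmp_link)
        # The rename replaces the destination in a single step.
        os.replace(tmp_link, link_path)
    finally:
        # Clean up the temporary link if the rename did not happen.
        if os.path.lexists(tmp_link):
            os.unlink(tmp_link)
        os.rmdir(tmp_dir)


with tempfile.TemporaryDirectory() as d:
    profile = os.path.join(d, "profile")
    atomic_symlink("/opt/old", profile)
    atomic_symlink("/opt/new", profile)  # overwrites the first link
    target = os.readlink(profile)
    leftovers = sorted(os.listdir(d))
```

Readers of the link see either the old target or the new one, never a missing link, because the replacement is a single rename.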

delete(artifact_id)

Deletes an artifact ID from the store. This is simply an rmtree, i.e., it is (at least currently) possible to delete an aborted build, a build in progress etc., as long as it is present in the right path.

This is the backend of the hit purge command.

Returns the path that was removed, or None if no path was present.

ensure_present(build_spec, config, extra_env=None, virtuals=None, keep_build='never', debug=False)

Builds an artifact (if it is not already present).

extra_env: dict (optional)
Extra environment variables to pass to the build environment. These are NOT hashed!

gc()

Run garbage collection, removing any unneeded artifacts.

For now, this ignores virtual dependencies. They are not used at the time of writing; this will have to be revisited if that changes.

make_artifact_dir(build_spec)

Makes a directory to put the result of the artifact build in. This does not register the artifact in the db (which should be done after the artifact is complete).

make_build_dir(build_spec)

Creates a temporary build directory.

This exists only to get a nicer name than mkdtemp would produce. The caller is responsible for removal.

resolve(artifact_id)

Given an artifact_id, resolve the short path for it, or return None if the artifact isn’t built.

hashdist.core.build_store.assert_safe_name(x)

Raises a ValueError if x does not match [a-zA-Z0-9-_+]+.

Returns x
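
The documented contract can be mirrored by a small stand-in, shown below. This is a sketch and not the library's implementation: the name assert_safe_name_sketch is hypothetical, and matching the whole string (rather than a prefix) is an assumption. The hyphen is escaped inside the character class to make its literal meaning explicit.

```python
import re

# Same character set as the documented pattern, with the hyphen escaped.
_SAFE_NAME_RE = re.compile(r"[a-zA-Z0-9\-_+]+")


def assert_safe_name_sketch(x):
    """Stand-in mirroring the documented contract of assert_safe_name."""
    if not _SAFE_NAME_RE.fullmatch(x):
        raise ValueError("unsafe name: %r" % (x,))
    return x


ok = assert_safe_name_sketch("gnu-make_3+")
try:
    assert_safe_name_sketch("zlib/1.2.7")  # '/' and '.' are not allowed
    rejected = False
except ValueError:
    rejected = True
```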

hashdist.core.build_store.canonicalize_build_spec(spec)

Puts the build spec in canonical form and performs basic validation.

See module documentation for information on the build specification.

Parameters:

spec : json-like

The build specification

Returns:

canonical_spec : json-like

Canonicalized and verified build spec

hashdist.core.build_store.shorten_artifact_id(artifact_id, length=12)

Shortens the hash part of the artifact_id to the desired length
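
A stand-in with the same signature is sketched below to show the intended effect. It assumes the hash is the last slash-separated component of the ID and that the result keeps the leading components unchanged; shorten_artifact_id_sketch is a hypothetical name, not the library function.

```python
def shorten_artifact_id_sketch(artifact_id, length=12):
    """Truncate the hash part of an artifact ID to `length` characters."""
    # Assumption: the hash is everything after the last '/'.
    name, digest = artifact_id.rsplit("/", 1)
    return "%s/%s" % (name, digest[:length])


full_id = "zlib/4niostz3iktlg67najtxuwwgss5vl6k4"
short = shorten_artifact_id_sketch(full_id)
shorter = shorten_artifact_id_sketch(full_id, length=4)
```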

hashdist.core.build_store.strip_comments(spec)

Strips a build spec (which should be in canonical format) of comments that should not affect the hash.

hashdist.core.build_store.unpack_sources(logger, source_cache, doc, target_dir)

Executes source unpacking from ‘sources’ section in build.json