hashdist.core.run_job — Job execution in controlled environment

Executes a set of commands in a controlled environment, determined by a JSON job specification. This is used as the “build” section of build.json, the “install” section of artifact.json, and so on.

The job spec need not completely specify the job environment, because it is usually a building block of other specs which may imply certain additional environment variables. E.g., during a build, $ARTIFACT and $BUILD are defined even if they are never mentioned here.

Job specification

The job spec is a document that contains what’s needed to set up a controlled environment and run the commands. The idea is to be able to reproduce a job run, and hash the job spec. Example:

{
    "import" : [
        {"ref": "BASH", "id": "virtual:bash"},
        {"ref": "MAKE", "id": "virtual:gnu-make/3+"},
        {"ref": "ZLIB", "id": "zlib/2d4kh7hw4uvml67q7npltyaau5xmn4pc"},
        {"ref": "UNIX", "id": "virtual:unix"},
        {"ref": "GCC", "id": "gcc/jonykztnjeqm7bxurpjuttsprphbooqt"}
    ],
    "commands" : [
        {"chdir": "src"},
        {"prepend_path": "FOOPATH", "value": "$ARTIFACT/bin"},
        {"set": "INCLUDE_FROB", "value": "0"},
        {"cmd": ["pkg-config", "--cflags", "foo"], "to_var": "CFLAGS"},
        {"cmd": ["./configure", "--prefix=$ARTIFACT", "--foo-setting=$FOO"]},
        {"cmd": ["bash", "$in0"],
         "inputs": [
             {"text": [
                 "[ \"$RUN_FOO\" != \"\" ] && ./foo",
                 "make",
                 "make install"
             ]}
         ]}
    ]
}

Job spec root node

The root node is also a command node, as described below, but has one extra allowed key:

import:

The artifacts needed in the environment for the run. After the job has run they have no effect (i.e., they do not affect garbage collection or the run-time dependencies of a build, for instance). The list is ordered, and earlier entries are imported before later ones.

  • id: The artifact ID. If the value is prefixed with "virtual:", the ID is a virtual ID, used so that the real one does not contribute to the hash. See the section on virtual imports below.
  • ref: A name used to inject information about this dependency into the environment. Above, $ZLIB_DIR will be the absolute path to the zlib artifact, and $ZLIB_ID will be the full artifact ID. This can be set to null in order to not set any environment variables for the artifact.
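
For instance, an import entry that deliberately sets no environment variables could look like this (a sketch, written as a Python literal where JSON null becomes None; the artifact ID is copied from the example above):

{"ref": None, "id": "zlib/2d4kh7hw4uvml67q7npltyaau5xmn4pc"}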

When executing, the environment is set up as follows:

  • The environment is cleared (os.environ has no effect)
  • The initial environment provided by the caller (e.g., BuildStore provides $ARTIFACT and $BUILD) is loaded
  • The import section is processed
  • The commands are executed (and may modify the environment)
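
A minimal sketch of this order in Python (the names initial_env, imports, env_vars_for and run_commands are hypothetical helpers for illustration, not actual hashdist internals):

env = {}                                 # os.environ is deliberately ignored
env.update(initial_env)                  # caller-provided, e.g. ARTIFACT and BUILD
for artifact in imports:                 # the "import" section, in list order
    env.update(env_vars_for(artifact))   # e.g. ZLIB_DIR and ZLIB_ID
run_commands(job_spec["commands"], env)  # commands may modify env further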

Command node

The command nodes essentially form a small scripting language, albeit one lacking any form of control flow. The purpose is to control the environment, and then quickly dispatch to a script in a real programming language.

Also, the overall flow of commands to set up the build environment is typically generated by a pipeline from a package definition, and generating a text script in a pipeline is no fun.

See example above for basic script structure. Rules:

  • Every item in the job is either a cmd, a commands, or a hit, i.e., those keys are mutually exclusive and define the node type.

  • commands: Push a new environment and current directory onto the stack, execute the sub-commands, and pop the stack.

  • cmd: The list is passed straight to subprocess.Popen() as is (after variable substitution). I.e., no quoting, no globbing.

  • hit: executes the hit tool in-process. It acts like cmd otherwise, e.g., to_var works.

  • chdir: Change the current directory, relative to the current one (equivalent to modifying the PWD environment variable)

  • set, prepend/append_path, prepend/append_flag: Change environment variables, inserting the value specified by the value key, using variable substitution as explained below. set simply overwrites the variable, while the others modify path/flag-style variables, using os.pathsep as the separator for prepend/append_path and a space for prepend/append_flag. NOTE: One can use nohash_value instead of value to keep the value from entering the hash of the build specification.

  • inputs specifies files that are dumped to temporary files and made available as $in0, $in1, and so on (see the example above). Each file has the form {typestr: value}, where typestr means:

    • text: value should be a list of strings which are joined by newlines
    • string: value is dumped verbatim to file
    • json: value is any JSON document, which is serialized to the file
  • stdout and stderr will be logged, except if to_var or append_to_file is present, in which case stdout is captured to an environment variable or redirected in append mode to a file, respectively. (In the former case, the resulting string undergoes strip(), and is then available to the following commands within the same scope.)

  • Variable substitution is performed in the following places: the cmd list, the value of set and the other environment-modifying commands, the chdir argument, and append_to_file. The syntax is $CFLAGS or ${CFLAGS}. \$ is an escape for $, \\ is an escape for \; other escapes are not currently supported, and \ will otherwise carry through unmodified.
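
To illustrate the scoping behaviour of commands and to_var, here is a sketch of a command list, written as a Python document rather than JSON (the package names and values are made up):

commands = [
    {"set": "FOO", "value": "outer"},
    {"commands": [
        # a new environment and directory scope is pushed here
        {"chdir": "src"},
        {"set": "FOO", "value": "inner"},
        # stdout of pkg-config is stripped and stored in $CFLAGS,
        # visible to the following commands in this scope
        {"cmd": ["pkg-config", "--cflags", "foo"], "to_var": "CFLAGS"},
        {"cmd": ["./configure", "--prefix=$ARTIFACT"]},
    ]},
    # the scope is popped: FOO is "outer" again, the directory is restored
    {"cmd": ["printenv", "FOO"]},
]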

For the hit tool, in addition to what is listed in hit --help, the following special command is available for interacting with the job runner:

  • hit logpipe HEADING LEVEL: Creates a new Unix FIFO and prints its name to standard output (it will be removed once the job terminates). The job runner will poll the pipe and print anything written to it nicely formatted to the log with the given heading and log level (the latter is one of DEBUG, INFO, WARNING, ERROR).
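
A hedged sketch of how this might be used from within a job spec (the heading and the message are made up; recall from above that to_var works for hit nodes):

[
    # hit runs in-process; the FIFO name it prints is captured in $LOG_FIFO
    {"hit": ["logpipe", "MYSTAGE", "INFO"], "to_var": "LOG_FIFO"},
    # anything written to the FIFO appears in the job log under MYSTAGE/INFO
    {"cmd": ["bash", "-c", "echo stage 1 done > $LOG_FIFO"]},
]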

Note

hit is not automatically available in the environment in general (in launched scripts etc.), for that, see hashdist.core.hit_recipe. hit logpipe is currently not supported outside of the job spec at all (this could be supported through RPC with the job runner, but the gain seems very slight).

Virtual imports

Sometimes it is not desirable for some imports to become part of the hash. For instance, if the cp tool is used in the job, one is normally prepared to trust that the result would not have been different had a newer version of the cp tool been used instead.

Virtual imports, such as virtual:unix in the example above, are used so that the hash depends on a user-defined string rather than the artifact contents. If a bug in cp is indeed discovered, one can change the user-defined string (e.g., virtual:unix/r2) in order to change the hash of the job spec.
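
The mapping from virtual IDs to real artifact IDs is supplied separately when the job is run (the virtuals parameter of run_job in the reference below), so only the virtual string enters the hash. A sketch, with made-up real artifact IDs:

virtuals = {
    "virtual:unix": "unix/4ke3ba7jsnvfccxyplrzrnfo4gmpnorl",          # made-up ID
    "virtual:gnu-make/3+": "make/ib7rp4bqw5h4figmcdehnsyho4nhz3gb",   # made-up ID
}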

Note

One should think about virtual dependencies merely as a tool that gives the user control (and responsibility) over when the hash should change. They are not the primary mechanism for providing software from the host; though software from the host will sometimes be specified as virtual dependencies.

Reference

class hashdist.core.run_job.CommandTreeExecution(logger, temp_dir=None, debug=False, debug_shell='/bin/bash')

Class for maintaining state (in particular logging pipes) while executing a script. Note that the environment is passed around as parameters instead.

Executing run() multiple times amounts to executing different variable scopes (but with the same logging pipes set up).

Parameters:

logger : Logger

temp_dir : str

A temporary directory on a local filesystem. Currently used for creating pipes with the “hit logpipe” command.

Methods

close()

Removes log FIFOs; should always be called when one is done

dump_inputs(inputs, node_pos)

Handles the ‘inputs’ attribute of a node by dumping to temporary files.

Returns:

A dict with environment variables that can be used to update env, containing $in0, $in1, and so on.

logged_check_call(args, env, stdout_to)

Similar to subprocess.check_call, but multiplexes input from stderr, stdout and any number of log FIFO pipes available to the called process into a single Logger instance. Optionally captures stdout instead of logging it.

run_hit(args, env, stdout_to=None)

Run hit in the same process.

INFO messages from the sub-command are not emitted unless the log level is DEBUG.

run_node(node, env, node_pos)

Executes a script node and its children

Parameters:

node : dict

A command node

env : dict

The environment (will be modified). The PWD variable tracks the working directory and should always be set on input.

node_pos : tuple

Tuple giving the “path” to this command node; e.g., (0, 1) for the second command in the first group.

hashdist.core.run_job.canonicalize_job_spec(job_spec)

Returns a copy of job_spec with default values filled in.

Also performs a tiny bit of validation.
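
A brief usage sketch (exactly which defaults are filled in is not specified here):

from hashdist.core.run_job import canonicalize_job_spec

spec = {"commands": [{"cmd": ["make"]}]}
canonical = canonicalize_job_spec(spec)
# canonical is a copy with default values filled in; spec itself is left
# untouched (presumably so that equivalent specs hash identically)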

hashdist.core.run_job.handle_imports(logger, build_store, artifact_dir, virtuals, job_spec)

Sets up environment variables for a job. This includes $<REF>_DIR and $<REF>_ID for each import (e.g., $ZLIB_DIR and $ZLIB_ID above), as well as $ARTIFACT, $HDIST_IMPORT, and $HDIST_IMPORT_PATHS.

Returns:

env : dict

Environment containing HDIST_IMPORT{,_PATHS} and variables for each import.

script : list

Instructions to execute; imports first and the job_spec commands afterwards.

hashdist.core.run_job.run_job(logger, build_store, job_spec, override_env, artifact_dir, virtuals, cwd, config, temp_dir=None, debug=False)

Runs a job in a controlled environment, according to rules documented above.

Parameters:

logger : Logger

build_store : BuildStore

BuildStore to find referenced artifacts in.

job_spec : document

See above

override_env : dict

Extra environment variables not present in job_spec, these will be added last and overwrite existing ones.

artifact_dir : str

The value $ARTIFACT should take after running the imports

virtuals : dict

Maps virtual artifact IDs to real artifact IDs.

cwd : str

The starting working directory of the job. Currently this cannot be changed (though a cd command may be implemented in the future if necessary)

config : dict

Configuration from hashdist.core.config. This will be serialized and put into the HDIST_CONFIG environment variable for use by hit.

temp_dir : str (optional)

A temporary directory for use by the job runner. Files will be left in the dir after execution.

debug : bool

Whether to run in debug mode.

Returns:

out_env : dict

The environment after the last command that was run (regardless of scoping/nesting). If the job spec is empty (no commands), this will be an empty dict.
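
A hedged sketch of a call (construction of logger, build_store and config is elided; the paths and the virtual mapping are made up for illustration):

from hashdist.core.run_job import run_job

out_env = run_job(
    logger, build_store,
    job_spec={"commands": [{"cmd": ["make"]}]},
    override_env={"NPROC": "4"},      # added last, overrides existing vars
    artifact_dir="/hdist/opt/zlib",   # becomes $ARTIFACT after the imports
    virtuals={"virtual:unix": "unix/4ke3ba7jsnvfccxyplrzrnfo4gmpnorl"},
    cwd="/tmp/build",
    config=config,                    # serialized into $HDIST_CONFIG
)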

hashdist.core.run_job.substitute(x, env)

Substitute environment variable into a string following the rules documented above.

Raises KeyError if a referenced variable is not present in env ($$ always raises KeyError)
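
For illustration (the expected results follow from the substitution rules documented above):

from hashdist.core.run_job import substitute

env = {"CFLAGS": "-O2"}
substitute("gcc $CFLAGS", env)     # -> "gcc -O2"
substitute("gcc ${CFLAGS}", env)   # -> "gcc -O2"
substitute(r"cost is \$10", env)   # -> "cost is $10" (escaped dollar)
substitute("$MISSING", env)        # raises KeyError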