hashdist.core.run_job — Job execution in controlled environment¶
Executes a set of commands in a controlled environment, determined by a JSON job specification. This is used as the “build” section of build.json, the “install” section of artifact.json, and so on.
The job spec may not completely specify the job environment because it is usually a building block of other specs which may imply certain additional environment variables. E.g., during a build, $ARTIFACT and $BUILD are defined even if they are never mentioned here.
Job specification¶
The job spec is a document that contains what’s needed to set up a controlled environment and run the commands. The idea is to be able to reproduce a job run, and hash the job spec. Example:
{
    "import" : [
        {"ref": "BASH", "id": "virtual:bash"},
        {"ref": "MAKE", "id": "virtual:gnu-make/3+"},
        {"ref": "ZLIB", "id": "zlib/2d4kh7hw4uvml67q7npltyaau5xmn4pc"},
        {"ref": "UNIX", "id": "virtual:unix"},
        {"ref": "GCC", "id": "gcc/jonykztnjeqm7bxurpjuttsprphbooqt"}
    ],
    "commands" : [
        {"chdir": "src"},
        {"prepend_path": "FOOPATH", "value": "$ARTIFACT/bin"},
        {"set": "INCLUDE_FROB", "value": "0"},
        {"cmd": ["pkg-config", "--cflags", "foo"], "to_var": "CFLAGS"},
        {"cmd": ["./configure", "--prefix=$ARTIFACT", "--foo-setting=$FOO"]},
        {"cmd": ["bash", "$in0"],
         "inputs": [
            {"text": [
                "[ \"$RUN_FOO\" != \"\" ] && ./foo",
                "make",
                "make install"
            ]}
         ]}
    ]
}
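Because the job spec is meant to be hashed for reproducibility, logically identical specs must serialize to identical byte strings. A minimal sketch of how such a hash could be computed (illustrative only; the `job_spec_hash` helper and the choice of SHA-256 are assumptions, not HashDist's actual hashing protocol):

```python
import hashlib
import json

def job_spec_hash(job_spec):
    # Serialize with sorted keys and fixed separators so that logically
    # identical specs always produce identical digests.
    canonical = json.dumps(job_spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

spec_a = {"import": [], "commands": [{"chdir": "src"}]}
spec_b = {"commands": [{"chdir": "src"}], "import": []}  # same spec, different key order
assert job_spec_hash(spec_a) == job_spec_hash(spec_b)
```

Key order in the JSON document does not affect the digest, which is why the spec can be authored by hand or generated by a pipeline interchangeably.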
Job spec root node¶
The root node is also a command node, as described below, but has two extra allowed keys:
- import:
The artifacts needed in the environment for the run. After the job has run they have no effect (i.e., they do not affect garbage collection or run-time dependencies of a build, for instance). The list is ordered and earlier entries are imported before later ones. Each entry has the keys:
  - ref: A name used to inject information about this dependency into the environment. Above, $ZLIB_DIR will be the absolute path to the zlib artifact, and $ZLIB_ID will be the full artifact ID. This can be set to None in order to not set any environment variables for the artifact.
  - id: The artifact ID. If the value is prefixed with "virtual:", the ID is a virtual ID, used so that the real one does not contribute to the hash. See the section on virtual imports below.
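The ref-to-environment-variable mapping described above can be sketched as follows (hypothetical helper; the artifact path is made up for illustration):

```python
def import_env(imports, artifact_paths):
    # For each import with a non-None ref, expose <REF>_DIR (absolute path
    # to the artifact) and <REF>_ID (the full artifact ID).
    env = {}
    for imp in imports:
        ref, art_id = imp.get("ref"), imp["id"]
        if ref is not None:
            env[ref + "_DIR"] = artifact_paths[art_id]
            env[ref + "_ID"] = art_id
    return env

imports = [{"ref": "ZLIB", "id": "zlib/2d4kh7hw4uvml67q7npltyaau5xmn4pc"}]
paths = {"zlib/2d4kh7hw4uvml67q7npltyaau5xmn4pc": "/hdist/opt/zlib"}  # made-up path
env = import_env(imports, paths)
# env["ZLIB_DIR"] is the artifact path; env["ZLIB_ID"] is the full ID
```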
When executing, the environment is set up as follows:
- The environment is cleared (os.environ has no effect).
- The initial environment provided by the caller (e.g., BuildStore provides $ARTIFACT and $BUILD) is loaded.
- The import section is processed.
- Commands are executed (and may modify the environment).
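The ordering above can be sketched in a few lines of Python (illustrative; the dicts and the `initial_job_env` helper are assumptions, not HashDist API):

```python
def initial_job_env(caller_env, imports_env):
    # 1. Start from a cleared environment -- os.environ is never consulted.
    env = {}
    # 2. Load the initial environment provided by the caller
    #    (e.g., BuildStore sets $ARTIFACT and $BUILD).
    env.update(caller_env)
    # 3. Process the import section.
    env.update(imports_env)
    # 4. Commands then run against (and may modify) this environment.
    return env

env = initial_job_env({"ARTIFACT": "/tmp/art"}, {"ZLIB_DIR": "/hdist/opt/zlib"})
```

Later stages win on key collisions, so an import can override a caller-provided variable but never leak in from os.environ.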
Command node¶
Command nodes essentially form a small scripting language, but one that lacks any form of control flow. The purpose is to control the environment and then quickly dispatch to a script in a real programming language. Also, the overall flow of commands that sets up the build environment is typically generated by a pipeline from a package definition, and generating a text script in a pipeline is no fun.
See example above for basic script structure. Rules:
- Every item in the job is either a cmd, a commands, or a hit node, i.e., those keys are mutually exclusive and define the node type.
- commands: Push a new environment and current directory onto the stack, execute the sub-commands, and pop the stack.
- cmd: The list is passed straight to subprocess.Popen() as is (after variable substitution), i.e., no quoting and no globbing.
- hit: Executes the hit tool in-process. It otherwise acts like cmd, e.g., to_var works.
- chdir: Change the current directory, relative to the current one (same as modifying the PWD environment variable).
- set, prepend/append_path, prepend/append_flag: Change environment variables, inserting the value specified by the value key, using variable substitution as explained below. set simply overwrites the variable, while the others modify path/flag-style variables, using os.pathsep for prepend/append_path and a space for prepend/append_flag. NOTE: One can use nohash_value instead of value to keep the value from entering the hash of the build specification.
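The set/prepend_path/append_flag semantics can be sketched as follows (illustrative only, variable substitution omitted; `apply_env_command` is a hypothetical name, not HashDist's implementation):

```python
import os

def apply_env_command(node, env):
    # set overwrites; prepend_path joins with os.pathsep;
    # append_flag joins with a single space.
    value = node["value"]
    if "set" in node:
        env[node["set"]] = value
    elif "prepend_path" in node:
        var = node["prepend_path"]
        old = env.get(var)
        env[var] = value if not old else value + os.pathsep + old
    elif "append_flag" in node:
        var = node["append_flag"]
        old = env.get(var)
        env[var] = value if not old else old + " " + value

env = {"FOOPATH": "/usr/bin"}
apply_env_command({"prepend_path": "FOOPATH", "value": "$ARTIFACT/bin"}, env)
# env["FOOPATH"] is now "$ARTIFACT/bin" + os.pathsep + "/usr/bin"
```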
- inputs: Specifies files that are dumped to temporary files and made available as $in0, $in1, and so on. Each file has the form {typestr: value}, where typestr is one of:
  - text: value should be a list of strings which are joined by newlines
  - string: value is dumped verbatim to the file
  - json: value is any JSON document, which is serialized to the file

stdout and stderr will be logged, except if to_var or append_to_file is present, in which case stdout is captured to an environment variable or redirected in append mode to a file, respectively. (In the former case, the resulting string undergoes strip() and is then available to the following commands within the same scope.)
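The dumping of input files to $in0, $in1, … can be sketched like this (a hypothetical standalone helper mirroring the behavior described above, not the real dump_inputs method):

```python
import json
import os
import tempfile

def dump_input_files(inputs, temp_dir):
    # Write each input to a temporary file and return environment
    # variables $in0, $in1, ... naming them.
    env = {}
    for i, node in enumerate(inputs):
        (typestr, value), = node.items()
        if typestr == "text":
            data = "\n".join(value)       # list of lines joined by newlines
        elif typestr == "string":
            data = value                  # dumped verbatim
        elif typestr == "json":
            data = json.dumps(value)      # serialized JSON document
        else:
            raise ValueError("unknown input type: %s" % typestr)
        path = os.path.join(temp_dir, "in%d" % i)
        with open(path, "w") as f:
            f.write(data)
        env["in%d" % i] = path
    return env

temp_dir = tempfile.mkdtemp()
env = dump_input_files([{"text": ["make", "make install"]},
                        {"json": {"flags": ["-O2"]}}], temp_dir)
# $in0 now names a two-line script, $in1 a JSON file
```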
Variable substitution is performed in the following places: cmd, the value of set etc., the chdir argument, and stdout_to_file. The syntax is $CFLAGS and ${CFLAGS}. \$ is an escape for $ and \\ is an escape for \; other escapes are not currently supported, and a lone \ will carry through unmodified.
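These substitution rules can be sketched with a small regex-based expander (an illustration of the rules above, not HashDist's actual substitute function):

```python
import re

_TOKEN = re.compile(r"\\\$|\\\\|\$\{(\w+)\}|\$(\w+)")

def expand(text, env):
    # \$ -> $, \\ -> \; $VAR and ${VAR} expand from env; a missing
    # variable raises KeyError; a lone \ is left untouched because it
    # matches no token.
    def repl(m):
        if m.group(0) == r"\$":
            return "$"
        if m.group(0) == r"\\":
            return "\\"
        name = m.group(1) or m.group(2)
        return env[name]  # KeyError if the referenced variable is missing
    return _TOKEN.sub(repl, text)

assert expand("$CFLAGS -I${ZLIB_DIR}/include",
              {"CFLAGS": "-O2", "ZLIB_DIR": "/z"}) == "-O2 -I/z/include"
assert expand(r"cost: \$5", {}) == "cost: $5"
```

Putting the escape alternatives first in the pattern ensures \$ is consumed as an escape before $... can be read as a variable reference.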
For the hit tool, in addition to what is listed in hit --help, the following special command is available for interacting with the job runner:
- hit logpipe HEADING LEVEL: Creates a new Unix FIFO and prints its name to standard output (it will be removed once the job terminates). The job runner will poll the pipe and print anything written to it, nicely formatted, to the log with the given heading and log level (the latter is one of DEBUG, INFO, WARNING, ERROR).
Note
hit is not automatically available in the environment in general (in launched scripts etc.); for that, see hashdist.core.hit_recipe. hit logpipe is currently not supported outside of the job spec at all (this could be supported through RPC with the job runner, but the gain seems very slight).
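The runner side of the logpipe mechanism can be sketched as follows: create a FIFO, read lines as the job writes them, and tag each with the heading and level (POSIX-only sketch of the idea; the real runner polls many pipes concurrently and formats through its Logger):

```python
import os
import tempfile
import threading

def poll_logpipe(path, heading, level, sink):
    # Read lines from the FIFO to EOF and forward each to the log,
    # tagged with heading and level.
    with open(path) as fifo:
        for line in fifo:
            sink.append("%s %s: %s" % (level, heading, line.rstrip("\n")))

tmp = tempfile.mkdtemp()
fifo_path = os.path.join(tmp, "logpipe")
os.mkfifo(fifo_path)  # POSIX only

log = []
reader = threading.Thread(target=poll_logpipe,
                          args=(fifo_path, "my-step", "INFO", log))
reader.start()
with open(fifo_path, "w") as f:   # the job-side script writes here
    f.write("configuring...\n")
reader.join()
# log == ["INFO my-step: configuring..."]
```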
Virtual imports¶
Sometimes it is not desirable for some imports to become part of the hash. For instance, if the cp tool is used in the job, one is normally ready to trust that the result would not have been different if a newer version of the cp tool had been used instead.
Virtual imports, such as virtual:unix in the example above, are used so that the hash depends on a user-defined string rather than the artifact contents. If a bug in cp is indeed discovered, one can change the user-defined string (e.g., virtual:unix/r2) in order to change the hash of the job spec.
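The resolution step can be sketched as a simple lookup: virtual IDs contribute their user-defined string to the hash, while the `virtuals` mapping supplies the real artifact at run time (the artifact IDs below are made up for illustration):

```python
def resolve_import(import_id, virtuals):
    # Virtual IDs hash as the user-chosen string; the real artifact is
    # looked up at run time via the virtuals mapping.
    if import_id.startswith("virtual:"):
        return virtuals[import_id]
    return import_id

virtuals = {"virtual:unix": "unix-tools/0000000000000000000000000000000"}  # made-up ID
real = resolve_import("virtual:unix", virtuals)
unchanged = resolve_import("zlib/2d4kh7hw4uvml67q7npltyaau5xmn4pc", virtuals)
```

Bumping virtual:unix to virtual:unix/r2 changes the hash of any spec importing it without touching the mapping's values.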
Note
One should think about virtual dependencies merely as a tool that gives the user control (and responsibility) over when the hash should change. They are not the primary mechanism for providing software from the host; though software from the host will sometimes be specified as virtual dependencies.
Reference¶
-
class hashdist.core.run_job.CommandTreeExecution(logger, temp_dir=None, debug=False, debug_shell='/bin/bash')¶
Class for maintaining state (in particular logging pipes) while executing a script. Note that the environment is passed around as parameters instead. Executing run() multiple times amounts to executing different variable scopes (but with the same logging pipes set up).
Parameters: logger : Logger
rpc_dir : str
A temporary directory on a local filesystem. Currently used for creating pipes with the “hit logpipe” command.
Methods
-
close()¶
Removes log FIFOs; should always be called when one is done.
-
dump_inputs(inputs, node_pos)¶
Handles the ‘inputs’ attribute of a node by dumping to temporary files.
Returns: A dict with environment variables that can be used to update env, containing $in0, ...
-
logged_check_call(args, env, stdout_to)¶
Similar to subprocess.check_call, but multiplexes input from stderr, stdout, and any number of log FIFO pipes available to the called process into a single Logger instance. Optionally captures stdout instead of logging it.
-
run_hit(args, env, stdout_to=None)¶
Runs hit in the same process, but does not emit INFO messages from the sub-command unless the log level is DEBUG.
-
run_node(node, env, node_pos)¶
Executes a script node and its children.
Parameters: node : dict
A command node
env : dict
The environment (will be modified). The PWD variable tracks the working directory and should always be set on input.
node_pos : tuple
Tuple of the “path” to this command node; e.g., (0, 1) for the second command in the first group.
-
hashdist.core.run_job.canonicalize_job_spec(job_spec)¶
Returns a copy of job_spec with default values filled in. Also performs a tiny bit of validation.
-
hashdist.core.run_job.handle_imports(logger, build_store, artifact_dir, virtuals, job_spec)¶
Sets up environment variables for a job. This includes $MYIMPORT_DIR, $MYIMPORT_ID, $ARTIFACT, $HDIST_IMPORT, $HDIST_IMPORT_PATHS.
Returns: env : dict
Environment containing HDIST_IMPORT{,_PATHS} and variables for each import.
script : list
Instructions to execute; imports first and the job_spec commands afterwards.
-
hashdist.core.run_job.run_job(logger, build_store, job_spec, override_env, artifact_dir, virtuals, cwd, config, temp_dir=None, debug=False)¶
Runs a job in a controlled environment, according to the rules documented above.
Parameters: logger : Logger
build_store : BuildStore
BuildStore in which to find referenced artifacts.
job_spec : document
See above.
override_env : dict
Extra environment variables not present in job_spec; these will be added last and overwrite existing ones.
artifact_dir : str
The value $ARTIFACT should take after running the imports.
virtuals : dict
Maps virtual artifact IDs to real artifact IDs.
cwd : str
The starting working directory of the job. Currently this cannot be changed (though a cd command may be implemented in the future if necessary).
config : dict
Configuration from hashdist.core.config. This will be serialized and put into the HDIST_CONFIG environment variable for use by hit.
temp_dir : str (optional)
A temporary directory for use by the job runner. Files will be left in the directory after execution.
debug : bool
Whether to run in debug mode.
Returns: out_env : dict
The environment after the last command that was run (regardless of scoping/nesting). If the job spec is empty (no commands), this will be an empty dict.
-
hashdist.core.run_job.substitute(x, env)¶
Substitutes environment variables into a string following the rules documented above. Raises KeyError if a referenced variable is not present in env ($$ always raises KeyError).