Profile specification layer

Nobody wants to use the core tools directly, copying and pasting artifact IDs (unless they are debugging or developing packages). This layer is one example of an API that can be used to drive hit fetch, hit build and hit makeprofile. Skipping this layer is encouraged if that makes more sense for your application.

Included: The ability to programmatically define a desired software profile/”stack”, and to automatically download and build the packages with minimal hassle. Patterns for using the lower-level hit command (e.g., standardizing on symlink-based artifact profiles).

Excluded: Any use of package metadata or a central package repository to automatically resolve dependencies. (Some limited use of metadata to get software versions and so on may still be included.)

This layer can be used in two modes:

  • As an API to help implement the final UI
  • Directly, by power users who don’t mind manually specifying everything in great detail

The API will be demonstrated with an example from the latter use case.

Package class

At the basic level, we provide utilities that know how to build packages and inject dependencies. Under the hood this happens by generating the necessary JSON files (including the build setup, which is the hard part) and calling hit build and hit makeprofile.
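The JSON-generation step can be sketched roughly as follows. The schema, helper names, and hashing scheme here are illustrative assumptions, not the actual build.json format:

```python
import hashlib
import json

def make_build_spec(name, sources, build_deps):
    # Assemble a build.json-style document (hypothetical schema sketch)
    return {
        "name": name,
        "sources": sources,
        "build_deps": sorted(build_deps),  # canonical, order-independent
    }

def spec_hash(spec):
    # Hash a canonical JSON serialization; a tool like hit build could
    # key its artifact cache on an ID of this kind
    doc = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(doc.encode("utf-8")).hexdigest()

spec = make_build_spec("numpy", ["git://github.com/numpy/numpy.git"],
                       build_deps=["python", "gcc", "blas"])
# The real flow would write the spec out to build.json and then shell
# out to the command-line tool, e.g. via subprocess.
```

The point of canonicalizing (sorted keys, sorted dependency lists) is that two logically identical specs always hash to the same ID, which is what makes cache hits possible.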

Note

This has some overlap with Buildout. We should investigate using the Buildout API for the various package builders.

Warning

A lot in the block below is oversimplified in terms of what is required to build each package. Consider it a sketch.

Assume one creates the following “profile script”, in which pretty much everything is done manually:

import hashdist
from hashdist import package as pkg

from_host = pkg.UseFromHostPackage(['gcc', 'python', 'bash'])

ATLAS = pkg.ConfigureMakeInstallPackage('http://downloads.sourceforge.net/project/math-atlas/Stable/3.10.0/atlas3.10.0.tar.bz2',
                                        build_deps=dict(gcc=from_host, bash=from_host))
numpy = pkg.DistutilsPackage('git://github.com/numpy/numpy.git',
                             build_deps=dict(python=from_host, gcc=from_host, blas=ATLAS),
                             run_deps=dict(python=from_host, blas=ATLAS),
                             ATLAS=ATLAS.path('lib'),
                             CFLAGS=['-O0', '-g'])
profile = hashdist.Profile([numpy])
hashdist.command_line(profile)

Everything here is lazy (one only instantiates descriptors of packages); each package object is immutable and simply stores information about what it is, describing the dependency DAG. E.g., ATLAS.path('lib') doesn’t actually resolve any path; it just returns a symbolic object which, during the build, will be able to resolve the path.
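The symbolic-path idea can be illustrated with a minimal sketch; the class and method names here are hypothetical, not the actual hashdist API:

```python
class PathRef:
    # Symbolic reference to a path inside a not-yet-built artifact.
    # Constructing one touches no filesystem state; resolution is
    # deferred until artifacts exist at build time.
    def __init__(self, package, relpath):
        self.package = package
        self.relpath = relpath

    def resolve(self, artifact_roots):
        # artifact_roots: mapping of package name -> built artifact
        # directory, known only once the build has actually run
        return artifact_roots[self.package] + "/" + self.relpath

ref = PathRef("ATLAS", "lib")   # purely symbolic; no path lookup here
path = ref.resolve({"ATLAS": "/hit/ar/atlas-3.10.0"})
```

Keeping the objects immutable and symbolic is what lets the same profile script describe the DAG without forcing any ordering on when builds happen.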

Running the script produces a command-line interface with several options; a typical run would be:

python theprofilescript.py update ~/mystack

This:

  1. Walks the dependency DAG and, for each component, generates a build.json and calls hit build, often hitting the cache
  2. Builds a profile and uses ln -sf to atomically update ~/mystack (which is a symlink).

Package repositories

Given the above, it makes sense to then build APIs that are essentially package-object factories, and that are aware of various package sources. As before, everything should be lazy/descriptive. Sketch:

import hashdist

# the package environment has a list of sources to consider for packages,
# which will be searched in the order provided
env = hashdist.PackageEnvironment(sources=[
    hashdist.SystemPackageProvider('system'),
    hashdist.SpkgPackageProvider('qsnake', '/home/dagss/qsnake/spkgs'),
    hashdist.PyPIPackageProvider('pypi', 'http://pypi.python.org')
])

# The environment also stores default arguments. env is immutable, so we
# modify by making a copy
env = env.copy_with(CFLAGS=['-O0', '-g'])

# Set up some compilers; insist that they are found via the 'system' source
# (do not build them). Note: 'from' is a Python keyword, so the argument
# is named 'source'.
intel = env.pkgs.intel_compilers(source='system')
gcc = env.pkgs.gnu_compilers(source='system')

# env.pkgs.__getattr__ "instantiates" software. The result is simply a symbolic
# node in a build dependency graph; nothing is resolved until an actual build
# is invoked
blas = env.pkgs.reference_blas(compiler=intel)
# or: blas = env.pkgs.ATLAS(version='3.8.4', compiler=intel)
# or: blas = env.pkgs.ATLAS(version='3.8.4', source='system',
#                           libpath='/sysadmins/stupid/path/for/ATLAS')

python = env.pkgs.python()
petsc = env.pkgs.petsc(source='qsnake', blas=blas, compiler=intel)
petsc4py = env.pkgs.petsc4py(source='qsnake', petsc=petsc, compiler=gcc, python=python)
numpy = env.pkgs.numpy(python=python, blas=blas, compiler=intel, CFLAGS='-O2')
jinja2 = env.pkgs.jinja2(python=python)

profile = hashdist.Profile([python, petsc, numpy, jinja2])
hashdist.command_line(profile)
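The immutable copy_with pattern used above can be sketched as follows; this is a simplified stand-in for illustration, not the real PackageEnvironment:

```python
class PackageEnvironment:
    # Immutable environment sketch: a tuple of package sources plus a
    # dict of default build arguments. Modification always goes through
    # copy_with, which returns a new object and leaves the original
    # untouched.
    def __init__(self, sources=(), **defaults):
        self._sources = tuple(sources)
        self._defaults = dict(defaults)

    def copy_with(self, **overrides):
        # Merge overrides on top of the existing defaults in a copy
        merged = dict(self._defaults)
        merged.update(overrides)
        return PackageEnvironment(sources=self._sources, **merged)

    @property
    def defaults(self):
        return dict(self._defaults)

env = PackageEnvironment(sources=["system", "qsnake", "pypi"])
env2 = env.copy_with(CFLAGS=["-O0", "-g"])
# env is unchanged; env2 carries the new default arguments
```

Immutability matters here for the same reason as with package objects: several profile scripts (or several branches within one script) can safely share an environment without one of them silently changing the build flags of another.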