Re: Workflow management with GNU Guix
From: Ricardo Wurmus
Subject: Re: Workflow management with GNU Guix
Date: Mon, 16 May 2016 14:22:02 +0200
User-agent: mu4e 0.9.13; emacs 24.5.1
(Resending this as it could not be delivered.)
Ricardo Wurmus <address@hidden> writes:
> Hi Roel,
>
>> With GNU Guix we are able to install programs to our machines with an amazing
>> level of control over the dependency graph of the programs. We can now know
>> what code will run when we invoke a program. We can now know what the impact
>> of an upgrade will be. And we can now safely roll-back to previous states.
>>
>> What seems to be a common practice in research involving data analysis, is
>> running multiple programs in a chain to transform data from raw to specific.
>> This is often referred to as a "pipeline" or a "workflow". Because data sets
>> can be quite large in comparison to the computing power of our laptops, the
>> data analysis is performed on computing clusters instead of single machines.
>>
>> The usage of a pipeline/workflow is somewhat different from the package
>> construction, because we want to run the sequence of commands on different
>> data sets (as opposed to running it on the same source code). Plus, I would
>> like to integrate it with existing computing clusters that have a job
>> scheduling system in place.
>>
>> The reason I think this should be possible with Guix is that it has
>> everything in place to do software deployment and run-time isolation
>> (containers). From there it is a small step to executing programs in an
>> automated way.
>>
>> So, I would like to propose a new Guix subcommand and an extension to
>> the package management language to add workflow management features.
>
> I probably don’t understand your idea well enough, but from what I
> understand it doesn’t really have much to do with packages (other than
> using them) and store manipulation per se (produced artifacts are not
> added to the store). Exactly what features of Guix do you want to build
> on?
>
> My perspective on pipelines is that they should be developed like any
> other software package, treating individual tools as you would treat
> libraries. This means that a pipeline would have a configuration step
> in which it checks for the paths of all tools it needs internally, and
> then uses the full paths rather than assume all tools to be in a
> directory listed in the PATH variable.
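
To make the configuration step concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not part of any existing pipeline: the names `resolve_tools` and `MissingToolError` are hypothetical, and the lookup relies on the standard library's `shutil.which`.

```python
import os
import shutil


class MissingToolError(RuntimeError):
    """Raised when one or more required tools cannot be located."""


def resolve_tools(names, search_path=None):
    """Map each tool name to an absolute path, or fail listing what is missing.

    search_path is an os.pathsep-separated string of directories;
    None means "use the PATH environment variable".
    """
    resolved, missing = {}, []
    for name in names:
        found = shutil.which(name, path=search_path)
        if found is None:
            missing.append(name)
        else:
            resolved[name] = os.path.abspath(found)
    if missing:
        # Fail once, at configuration time, with the complete list.
        raise MissingToolError("tools not found: " + ", ".join(missing))
    return resolved
```

The pipeline would call this once at configuration time, record the resulting mapping, and invoke every tool through its recorded path afterwards, so a later change to PATH cannot silently swap in a different binary.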
>
> Distributing jobs to clusters would be the responsibility of the
> pipeline, e.g. by using DRMAA, which supports several resource
> management backends and has bindings for a wide range of programming
> languages.
>
>> Would this be a feature you are interested in adding to GNU Guix?
>
> Even if it wasn’t part of Guix itself, you could develop it separately
> and still add it as a Guix command, much like it is currently done for
> “guix web” (which I think should eventually be part of Guix).
>
>> I'm currently working on a proof-of-concept implementation that has three
>> record types/levels of abstraction:
>>
>> <workflow>: Describes which <process>es should be run, and concerns itself
>>             with the order of execution.
>>
>> <process>:  Describes what packages are needed to run the programs involved,
>>             and its relationship to other processes. Processes take input
>>             and generate output much like the package construction process.
>>
>> <script>:   Short and simple imperative instructions to perform a task.
>>             They are part of a <process>. Currently, my implementation
>>             generates a shell script that can be either Guile, Sh, Perl or
>>             Python.
>
> From that list it seems as if the only link to Guix is ensuring the
> environment contains required programs. This can be done right now with
> the help of manifests and profiles.
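
As a rough illustration of the manifest-and-profile route, a workflow tool could generate both the manifest and the command that instantiates it. This is a sketch under assumptions: the helper names `render_manifest` and `profile_command` are hypothetical, the package names in the usage note are placeholders, and the only Guix features relied on are the `specifications->manifest` procedure and the `--manifest` and `--profile` options of `guix package`.

```python
def render_manifest(package_specs):
    """Return the text of a Guix manifest listing the given package specs."""
    for spec in package_specs:
        # Keep the sketch simple: refuse anything that would need escaping
        # inside a Scheme string literal.
        if '"' in spec or "\\" in spec:
            raise ValueError(f"unsupported character in package spec: {spec!r}")
    quoted = " ".join(f'"{spec}"' for spec in package_specs)
    return f"(specifications->manifest\n  '({quoted}))\n"


def profile_command(manifest_path, profile_path):
    """Return the argv that instantiates the manifest into a dedicated profile."""
    return [
        "guix",
        "package",
        f"--manifest={manifest_path}",
        f"--profile={profile_path}",
    ]
```

For example, `render_manifest(["samtools", "bwa"])` gives the manifest text to write to a file, and `profile_command("manifest.scm", "/tmp/pipeline-profile")` gives the command to run on it. Note that installing from a manifest replaces the contents of the target profile, which is why the sketch points at a dedicated profile rather than the user's default one.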
>
> I wonder if maybe we could add Guix as a package management backend to
> existing workflow specification systems (instead of the curiously
> popular and IMO barely adequate Conda, for example).
>
>> The subcommand I envision is:
>> guix workflow
>>
>> With primarily:
>> guix workflow --run=<name-of-workflow-definition>
>>
>> If you are interested in adding any form of workflow management to GNU Guix,
>> I can elaborate on my proof-of-concept implementation, so we can work from
>> there.
>> (or throw everything out of the window and start from scratch ;-))
>
> Could you show us an example workflow?
>
> ~~ Ricardo