b2luigi
b2luigi: bringing batch 2 luigi!
b2luigi is a helper package for luigi for scheduling large luigi workflows on a batch system.
It is as simple as:
```python
import b2luigi


class MyTask(b2luigi.Task):
    # The file this task produces
    def output(self):
        return b2luigi.LocalTarget("output_file.txt")

    def run(self):
        with self.output().open("w") as f:
            f.write("This is a test\n")


if __name__ == "__main__":
    # batch=True submits the task to the configured batch system
    b2luigi.process(MyTask(), batch=True)
```
Jump right into it with our Quick Start.
If you have never worked with luigi before, you may want to have a look at the luigi documentation.
But you can also learn most of the nice features from this documentation!
Attention
The API of b2luigi is still under construction.
Please remember this when using the package in production!
Why not use the existing batch tasks?
luigi already contains a large set of tasks for scheduling and monitoring batch jobs [1].
But for thousands of tasks in very large projects with different task-defining libraries, you run into some problems:
- You want to run many (many many!) batch jobs in parallel: In other luigi batch implementations, every running batch job also needs a running task that monitors it. On most systems, the number of processes per user is limited, so you cannot run more batch jobs than this limit allows. But what do you do if you have thousands of tasks to do?
- You already have a large set of luigi tasks in your project: In other implementations, you either have to override a work function (and are not allowed to touch the run function), or they can only run an external command, which you need to define. The first approach does not play well when mixing non-batch and batch task libraries, and the second causes problems when you need to pass complex arguments to the external command (via the command line).
- You do not know which batch system you will run on: Currently, the batch tasks are mostly defined for a specific batch system. But what if you want to switch from AWS to Azure? Or from LSF to SGE?
Enter b2luigi, which tries to solve all of this (while being heavily inspired by the previous implementations):
- You can run as many tasks in parallel as your batch system can handle! Only a single process runs on your submission machine.
- No need to rewrite your tasks! Just call them with b2luigi.process(.., batch=True) or with python file.py --batch and you are ready to go!
- Switching the batch system is just a single change in a config file or one line in Python. In the future, there will even be automatic discovery of the batch system to use.
Is this the only thing I can do with b2luigi?
Since b2luigi is meant to help you with large luigi projects, we have also included some helper functionality for luigi tasks and task handling.
A b2luigi task is a super-hero version of a luigi task, with simpler handling of output and input files.
We also give you working examples and best practices, learned over time, for better data management and for accomplishing your goals.
Why are you still talking? Let's use it!
Have a look at the Quick Start.
You can also start by reading the API Documentation or the code on GitHub.
If you find any bugs or want to improve the documentation, please send me a pull request.
This project is in beta. Please be extra cautious when using it in production. You can help me by working on one of the TODO items described in Development and TODOs.
The name
b2luigi stands for multiple things at the same time:
- It brings batch to (2) luigi.
- It helps you with the bread-and-butter work in luigi (e.g. proper data management).
- It was developed for the Belle II experiment.
The team
- Current developer and maintainer
The Belle II Collaboration (belle2)
- Original author
Nils Braun (nils-braun)
- Former developer and maintainer
Michael Eliachevitch (meliache)
- Features, fixing, help and testing
Felix Metzner (FelixMetzner)
Patrick Ecker (eckerpatrick)
Jochen Gemmler
Maximilian Welsch (welschma)
Kilian Lieret (klieret)
Sviatoslav Bilokin (bilokin)
Phil Grace (philiptgrace)
Anselm Baur (anselmbaur)
Moritz Bauer (sognetic)
Matthias Schnepf (mschnepf)
Artur Gottmann (ArturAkh)
Caspar Schmitt (schmitca)
Marcel Hohmann (MarcelHoh)
Giacomo De Pietro (GiacomoXT)
Alex Heidelbach (AlexanderHeidelbach)
Tristan Fillinger (0ctagon)
Jonas Eppelt (JonasEppelt)
- Stolen ideas