Note
Go to the end to download the full example code.
gbasf2
Analysis Task#
Hint
This example demonstrates how to submit a basf2
analysis task via gbasf2
.
Here, we do not rely on the self simulated and reconstructed data but use a MC dataset that is available on the grid as input.
In addition, this exercise explains every setting that can be set for a Basf2PathTask
that is submitted via gbasf2
.
To submit jobs via gbasf2
, the user will need to have a valid grid certificate.
In principle, b2luigi
will execute the gbasf2
setup with
which will also initialize the proxy. However, it is recommended to have an active proxy before submitting the job since otherwise the proxy password will be required for each submitting project.
gbasf2 Project Submission
The gbasf2
batch process takes the basf2
path returned by the
create_path
method of the task, saves it into a pickle file to the
disk and creates a wrapper steering file that executes the saved path.
Any basf2
variable aliases added in the basf2.Path()
or create_path
method are also stored in the pickle file. It then sends both the pickle
file and the steering file wrapper to the grid via the Belle II-specific
DIRAC-wrapper gbasf2
.
Project Status Monitoring
After the project submission, the gbasf
batch process regularly checks
the status of all the jobs belonging to a gbasf2
project returns a
success if all jobs had been successful, while a single failed job
results in a failed project. You can close a running b2luigi
process and
then start your script again and if a task with the same project name is
running, this b2luigi
gbasf2
wrapper will recognize that and instead of
resubmitting a new project, continue monitoring the running project.
Automatic Download of Datasets and Logs
If all jobs had been successful, it automatically downloads the output dataset and the log files from the job sandboxes and automatically checks if the download was successful before moving the data to the final location. On failure, it only downloads the logs.
Automatic Rescheduling of Failed Jobs
Whenever a job fails, gbasf2
reschedules it as long as the number of
retries is below the value of the setting gbasf2_max_retries
. It keeps
track of the number of retries in a local file in the log_file_dir
, so
that it does not change if you close b2luigi
and start it again. Of
course it does not persist if you remove that file or move to a
different machine.
Required settings for gbasf2
gbasf2_input_dataset
: String with the logical path of a dataset on the grid to use as an input to the task. You can provide multiple inputs by having multiple paths contained in this string, separated by commas without spaces. An alternative is to just instantiate multiple tasks with different input datasets, if you want to know in retrospect which input dataset had been used for the production of a specific output.gbasf2_input_dslist
: Alternatively togbasf2_input_dataset
, you can use this setting to provide a text file containing the logical grid path names, one per line.gbasf2_project_name_prefix
: A string with which yourgbasf2
project names will start. To ensure the project associate with each unique task (i.e. for each of luigi parameters) is unique, the uniqueb2luigi.Task.task_id
is hashed and appended to the prefix to create the actualgbasf2
project name. Should be below 22 characters so that the project name with the hash can remain under 32 characters.
Warning
The chosen dataset is only an example and might not produce exact results.
import basf2 as b2
import modularAnalysis as ma
import variables.utils as vu
import vertex as vx
import b2luigi
from b2luigi.basf2_helper import Basf2PathTask
class AnalysisTask(Basf2PathTask):
treeFit = b2luigi.BoolParameter(default=True)
event_type = b2luigi.Parameter(default="mumu")
result_dir = "results/AnalysisGbasf2"
batch_system = "gbasf2"
@property
def gbasf2_input_dslist(self):
return "charged.txt"
@property
def gbasf2_project_name_prefix(self):
# Keep projectname shorter than grid requirement
projectname = "Gbasf2Task"
return projectname[-21:]
@property
def gbasf2_release(self):
return "light-2409-toyger"
@property
def gbasf2_download_logs(self):
return False
# Don't use `add_to_output` here, but define the output explicitly.
def output(self):
return {"ntuple_gbasf2.root": b2luigi.LocalTarget("ntuple_gbasf2.root")}
def create_path(self):
main = b2.Path()
particle = "mu" if str(self.event_type) == "mumu" else "e"
ma.inputMdstList(
environmentType="default",
# Empty input file list, will be filled by the grid
filelist=[],
path=main,
)
# The rest of the steeringfile is identical to the local submission
# with the difference of the correct outputfile name.
# Fill final state particles
ma.fillParticleList(decayString=f"{particle}+:fsp", cut="[abs(dr) < 2.0] and [abs(dz) < 4.0]", path=main)
ma.fillParticleList(
decayString="K+:fsp", cut="[abs(dr) < 2.0] and [abs(dz) < 4.0] and [kaonID > 0.2]", path=main
)
# reconstruct J/psi -> l+ --
ma.reconstructDecay(
decayString=f"J/psi:ll -> {particle}+:fsp {particle}-:fsp", cut="[2.92 < M < 3.22]", path=main
)
# reconstruct B+ -> J/psi K+
ma.reconstructDecay(
decayString="B+:jpsik -> J/psi:ll K+:fsp", cut="[Mbc > 5.24] and [abs(deltaE) < 0.2]", path=main
)
# Run truth matching
ma.matchMCTruth(list_name="B+:jpsik", path=main)
# Perform tree fit if requested
if self.treeFit:
vx.treeFit(list_name="B+:jpsik", conf_level=0.00, massConstraint=["J/psi"], path=main)
# Define variables to save
variables = ["Mbc", "deltaE", "chiProb", "isSignal"]
variables += vu.create_aliases_for_selected(
list_of_variables=["InvM", "M"], decay_string="B+:jpsik -> ^J/psi:ll K+:fsp", prefix="jpsi"
)
# Write variables into ntuple
ma.variablesToNtuple(
decayString="B+:jpsik",
variables=variables,
filename="ntuple_gbasf2.root",
treename="Bp",
path=main,
storeEventType=False,
)
return main
if __name__ == "__main__":
b2luigi.process(AnalysisTask(), batch=True)
Optional but useful settings for gbasf2
gbasf2_release
: Defaults to the release of your currently set upbasf2
release. Set this if you want the jobs to use another release on the grid.gbasf2_max_retries
: Default to0
. Maximum number of times that each job in the project can be automatically rescheduled until the project is declared as failed.gbasf2_proxy_group
: Default to"belle"
. If provided, thegbasf2
wrapper will work with the customgbasf2
group, specified in this parameter. No need to specify this parameter in case of usual physics analysis at Belle II. If specified, one has to providegbasf2_project_lpn_path parameter
.gbasf2_project_lpn_path
: Path to the LPN folder for a specified gbasf2 group. The parameter has no effect unless the gbasf2_proxy_group is used with non-default value.
It is further possible to append arbitrary command line arguments to the
gbasf2 submission command with the gbasf2_additional_params
setting. If
you want to blacklist a grid site, you can e.g. add
b2luigi.set_setting(
"gbasf2_additional_params", "--banned_site LCG.KEK.jp"
)
More settings can be found in the Gbasf2Process
section.
Hint
Changing the batch to gbasf2
means you also have to adapt
how you handle the output of your gbasf2
task in tasks depending on it,
because the output will not be a single root file anymore, but a
collection of root files, one for each file in the input data set, in a
directory with the base name of the root files
Hint
For gbasf2
submission, provide the input dataset as an empty
list since this will be automatically filled by
gbasf2_input_dataset
.