Remote Targets#

To ease working with files stored on the grid or on local dCache instances, b2luigi provides a Target implementation for so-called remote targets. At the moment, two remote file systems are supported: XRootD and WebDAV. Community contributions for other systems and protocols are very welcome!

Hint

For both protocols to work, you need to set X509_USER_PROXY as an environment variable.

X509_USER_PROXY points to the location of the user proxy, usually of the form /tmp/x509up_uXXXXX.
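
As a minimal sketch of the hint above, the variable can also be set from Python before any remote file system is created. The helper names default_proxy_path and ensure_proxy_env are hypothetical and not part of b2luigi, and the sketch assumes that the XXXXX suffix is the numeric user id, which is the usual convention but should be checked on your system.

```python
import os


def default_proxy_path(uid=None):
    """Return the conventional proxy location /tmp/x509up_u<uid>.

    Assumption: the suffix is the numeric user id of the current user.
    """
    if uid is None:
        uid = os.getuid()  # available on POSIX systems only
    return f"/tmp/x509up_u{uid}"


def ensure_proxy_env(environ=os.environ):
    """Set X509_USER_PROXY only if it is not set already; return its value."""
    environ.setdefault("X509_USER_PROXY", default_proxy_path())
    return environ["X509_USER_PROXY"]
```

Calling ensure_proxy_env() at the top of a steering script leaves an existing value untouched, so a proxy stored in a non-default location keeps working.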

XRootD#

In the background, the implementation for the XRootDSystem relies on the Python bindings of the XRootD client and requires a working XRootD client installation.

Hint

For Belle II users, the XRootD client is already installed in basf2 environments.

WebDAV#

In the background, the implementation for the WebDAVSystem relies on the webdavclient3 Python package, which is included in the dependencies of b2luigi.

Hint

To access the remote storage, you need a valid VOMS proxy and the location of the certificates that enable communication with the remote server. For Belle II users, the certificates can be found under /cvmfs/belle.kek.jp/grid/setup/Belle-KEK/6.1.1/Linux-x86_64 in the folder /etc/grid-security/certificates (or under the corresponding directory of another gbasf2 version).

Hint

For the WebDAVSystem to work, you need to set X509_USER_PROXY and X509_CERT_DIR, either as b2luigi settings or as environment variables.

X509_CERT_DIR points to the location of the certificates.
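
A quick pre-flight check can catch a missing variable before a task fails on the remote connection. The following is only a sketch: missing_webdav_vars is a hypothetical helper, and it inspects the environment only, so values provided as b2luigi settings are not seen by it.

```python
import os

# The two variables named in the hint above.
REQUIRED_WEBDAV_VARS = ("X509_USER_PROXY", "X509_CERT_DIR")


def missing_webdav_vars(environ=None):
    """Return the names of required variables that are unset or empty."""
    if environ is None:
        environ = os.environ
    return [name for name in REQUIRED_WEBDAV_VARS if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_webdav_vars()
    if missing:
        print("Please set: " + ", ".join(missing))
```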

General Usage#

To use the targets, you have to pass the RemoteTarget class and its keyword arguments to the add_to_output function, or set the default_task_target_class and target_class_kwargs settings as demonstrated below. This requires some additional setup compared to the standard LocalTarget.

First, you need to create a FileSystem object, which is used to connect to the remote server. For that, you need to provide the server address in the form root://<server_address>:<port> for XRootD or https://<server_address>:<port> for WebDAV (the davs scheme is currently not supported).
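
The two address forms can be assembled with a small helper, sketched here under the assumption that the port is optional (as in the examples further down, which omit it). The name server_url and the protocol keys "xrootd" and "webdav" are hypothetical and exist only for this illustration.

```python
# Map a protocol name to the URL scheme described above.
SCHEMES = {"xrootd": "root", "webdav": "https"}


def server_url(protocol, host, port=None):
    """Build root://<host>[:<port>] or https://<host>[:<port>]."""
    try:
        scheme = SCHEMES[protocol]
    except KeyError:
        # Anything else, including "davs", is rejected.
        raise ValueError(f"unsupported protocol: {protocol!r}") from None
    netloc = host if port is None else f"{host}:{port}"
    return f"{scheme}://{netloc}"
```

The resulting string is what would then be handed to the file system object, for example XRootDSystem(server_url("xrootd", "eospublic.cern.ch")).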

Optionally, you can also set a scratch_dir, which is used to store temporary files when using the temporary_path context (see the luigi documentation).

When entering this context, a temporary path is created in the scratch_dir. On leaving the context, the file is copied to its final location on the remote storage.
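
The write-then-copy behaviour described above can be illustrated with a purely local stand-in. This is not b2luigi's implementation: temporary_path_sketch is a hypothetical name, and the final copy goes to a local path where the real target would upload to the remote storage.

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def temporary_path_sketch(final_path, scratch_dir=None):
    """Yield a path inside a scratch directory; copy it to final_path on clean exit."""
    # scratch_dir=None lets tempfile pick the system default location.
    tmp_dir = tempfile.mkdtemp(dir=scratch_dir)
    tmp_path = os.path.join(tmp_dir, os.path.basename(final_path))
    try:
        yield tmp_path
        # Only reached if the with-block finished without an exception.
        shutil.copy(tmp_path, final_path)
    finally:
        # The scratch copy is removed in either case.
        shutil.rmtree(tmp_dir, ignore_errors=True)


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    final = os.path.join(out_dir, "Hello_world.txt")
    with temporary_path_sketch(final) as temp_path:
        with open(temp_path, "w") as f:
            f.write("Hello World")
    print(os.path.exists(final))
```

Because the copy happens only after the block completes, a task that fails halfway leaves no partial file at the final location.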

A full task using RemoteTargets could look like this:

import b2luigi
from b2luigi import RemoteTarget, XRootDSystem


class MyTask(b2luigi.Task):
    def run(self):
        file_name = "Hello_world.txt"
        target = self._get_output_target(file_name)
        # Write to a temporary local path; the file is copied to the
        # remote storage when the context is left.
        with target.temporary_path() as temp_path:
            with open(temp_path, "w") as f:
                f.write("Hello World")

    def output(self):
        # File system object used to connect to the remote server
        fs = XRootDSystem("root://eospublic.cern.ch")
        yield self.add_to_output("Hello_world.txt", RemoteTarget, file_system=fs)

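Alternatively, the target class and its keyword arguments can be provided through the default_task_target_class and target_class_kwargs settings, here defined as properties of the task: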

import b2luigi

class MyTask(b2luigi.Task):
    @property
    def result_dir(self):
        return "/path/on/server"

    @property
    def xrootdsystem(self):
        return b2luigi.XRootDSystem("root://eospublic.cern.ch")

    @property
    def default_task_target_class(self):
        return b2luigi.RemoteTarget

    @property
    def target_class_kwargs(self):
        return {
            "file_system": self.xrootdsystem,
        }

    def output(self):
        yield self.add_to_output("Hello_world.txt")

    @b2luigi.on_temporary_files
    def run(self):
        with open(self.get_output_file_name("Hello_world.txt"), "w") as f:
            f.write("Hello World")

It is also possible to mix different target classes in the same task. This can be useful, for example, if you want to create some control plots locally and store the larger data files on the remote storage.

import b2luigi

class MyTask(b2luigi.Task):

    def output(self):

        # adding a local file target
        yield self.add_to_output("local_file.txt")

        # adding a remote file target
        fs = b2luigi.XRootDSystem("root://eospublic.cern.ch")
        yield self.add_to_output(
            "remote_file.txt",
            b2luigi.RemoteTarget, file_system=fs
        )

    def run(self):
        # accessing the targets works as usual

        local_file_path = self.get_output_file_name("local_file.txt")

        remote_file_path = self.get_output_file_name("remote_file.txt")