Tip: Running on CERN’s HTCondor

CERN's HTCondor service provides infrastructure for running complex computations as distributed parallel jobs. This page describes the procedure for running NA64sw as a batch job on HTCondor.

Generally, steering parallel jobs on a batch system with distributed nodes is a complicated task in itself because of the variety of scenarios one may imagine for multi-staged processing. Such a task is usually handled by a workload-management system (WMS) like Pegasus, PanDA, etc.

However, so far NA64's needs do not imply frequent use of multi-staged scenarios. This page provides just a simple recipe for running a few pipeline processes, which is enough to perform a NA64sw pipeline analysis on a few chunks or a few runs (a typical common need).

Set-up

It is possible to run jobs without any WMS using just a CERN IT account and a public build of NA64sw. Let's consider an alignment task: it typically requires quite a few tracks to be collected to provide a representative picture of certain detectors that are not well illuminated, thus requiring at least a few chunks to be processed. Running the pipeline application locally would be tedious.

The following workflow is assumed roughly:

  1. We start with a certain placements.txt, pipeline settings in a run.yaml file, and a list of files we would like to process as input. We would like to tag the submitted jobs to keep everything related to this task in one place.

  2. We create a job submission with a submission script that tells HTCondor what to run and how.

  3. Jobs run and generate output asynchronously. At some point they finish and we can harvest the output from the output dir.

The alignment task is just an example; it should help you grasp the idea.

Workspace dir

To start, let's consider a directory that we will further refer to as the workspace dir (not to be confused with the AFS workspace dir!). A workspace dir for job submission serves as a place to store submission files, job logs and other service information. It must be created on one of CERN's filesystem shares, like /afs, because all the started remote jobs must be able to access this share. One can safely use the home or AFS workspace dir for that purpose:

$ cd ~
$ mkdir alignment_workspace
$ cd alignment_workspace

The job-submission script will reside in this brand new directory. We will also put the job logs here, as well as the documents that we are going to change across the submitted tasks.
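For orientation, here is a rough sketch of what the workspace dir will eventually contain once all the files described below are in place (the exact layout is up to you):

alignment_workspace/
  submit.sh                  # submission script (created below)
  run-alignment-single.sh    # job script executed on the HTCondor node
  run.yaml                   # pipeline run-config
  calibrations.yaml          # calibrations config copied from the build
  placements.txt             # geometry file being aligned
  in.txt                     # list of input chunks, one absolute path per line
  submitted/<tag>/           # per-task ClassAd file and job logs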

Warning

Currently the CERN infrastructure does not support submission from /eos shares, so we are restricted to /afs only.

In the alignment task we are going to change the placements.txt file that contains the geometrical information of the setup. This is done by overriding the calibration information with a custom placements.txt file and a custom calibrations config (the -c calibrations.yaml option of the app). Copy the files to the workspace dir:

$ cp $NA64SW_PREFIX/share/na64sw/calibrations.yaml .
$ cp $NA64SW_PREFIX/var/src/na64sw/presets/calibrations/override/2022/placements.txt.bck placements.txt

Note

We're using the $NA64SW_PREFIX environment variable here, assuming that you have sourced the .../this-env.sh script from one of the public builds.
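If you have not done it yet, source the script from one of the public builds; for example, using the build referenced later in the submission script (adjust the path to the build you actually use):

$ source /afs/cern.ch/work/r/rdusaev/public/na64/sw/LCG_101/x86_64-centos7-gcc11-opt/this-env.sh
$ echo $NA64SW_PREFIX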

We will customize the calibrations.yaml a bit later.

Output dir

Choose some directory for output data. Typically we have to store the processed.root file; for the alignment task we also need to save the alignment.dat file produced by the CollectAlignmentData handler. It is better to keep this data on /eos as this share is large enough, optimized for medium- and large-sized files, and easy to access (via CERNBox, for instance). We will further refer to this dir as the output dir.
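For example, one could create a base directory under the personal /eos area; the path below mirrors the OUTDIR value used in the submission script later on, so substitute your own user area:

$ mkdir -p /eos/user/r/rdusaev/autosync/na64/alignment.out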

Submission script

We will use bash here, but this is not mandatory, of course. Consider the following script as a useful snippet that has to be customized to match your workspace and output dirs. It is assumed that the submission script is run from the workspace dir: create a file called submit.sh in your workspace dir and copy/paste this content:

#!/bin/bash

# Batch submission script for NA64sw for alignment procedure.
#
# Will run the `na64sw-pipe' app on HTCondor for every file listed in `in.txt'
# using the given runtime config. Expects two command line arguments: a tag
# that shall uniquely identify this batch procedure and a path to the
# placements.txt file to use.

TAG=$1
PLACEMENTS=$2

# Variables to customize
# - path to NA64sw build (this-env.sh script must be available in this dir)
NA64SWPRFX=/afs/cern.ch/work/r/rdusaev/public/na64/sw/LCG_101/x86_64-centos7-gcc11-opt
# - runtime config to use
RCFG=$(readlink -f run.yaml)
# - list of input files; must contain one full path per line
FILESLIST=$(readlink -f ./in.txt)
# - where to put results (processed.root + alignment.dat)
OUTDIR=/eos/user/r/rdusaev/autosync/na64/alignment.out/$TAG
# - where to put the logs and submission file
SUBMITTED_DIR=./submitted/$TAG

mkdir -p $OUTDIR
mkdir -p $SUBMITTED_DIR

cp $PLACEMENTS $OUTDIR
cp calibrations.yaml $OUTDIR
cp $RCFG $OUTDIR/run.yaml

NJOBS=$(cat in.txt | wc -l)

cat <<EOF > $SUBMITTED_DIR/job.sub
universe = vanilla
should_transfer_files = NO
transfer_output_files =
max_transfer_output_mb = 2048
+JobFlavour = microcentury
output = $SUBMITTED_DIR/\$(Process).out.txt
error = $SUBMITTED_DIR/\$(Process).err.txt
executable = `readlink -f run-alignment-single.sh`
log = $SUBMITTED_DIR/htcondor-log.txt
environment = "HTCONDOR_JOBINDEX=\$(Process)"
arguments = $NA64SWPRFX $FILESLIST $OUTDIR
queue $NJOBS
EOF

condor_submit -batch-name na64al-$TAG $SUBMITTED_DIR/job.sub

The script expects two arguments to be provided on the command line: a tag that uniquely identifies the task to run and a path to the placements.txt file to use. The tag is assumed to be a meaningful human-readable string without spaces; it is then used to create directories specific to the task. For instance, if I align MM03 in the 2022-mu run, I will do it iteratively, and it is convenient to have the output ordered as 022mu-MM03-01, 022mu-MM03-02, 022mu-MM03-03, etc. When I switch to, say, ST01, I would like to have directories tagged as 022mu-ST01-01, 022mu-ST01-02, etc.
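Note that both submit.sh and the job script described below live in the workspace dir and are executed directly (by you and by HTCondor, respectively), so make sure they have the executable bit set:

$ chmod +x submit.sh run-alignment-single.sh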

As an input it uses the in.txt file containing absolute paths to experiment’s data – one path per line. Such a file can be produced with, e.g.:

$ ls /eos/experiment/na64/data/cdr/cdr01*-006030.dat > in.txt

Choose the run number you would like to operate on, or concatenate a few runs within the file using >>, as shown below.
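For instance, to process two runs in one submission (the second run number below is purely illustrative):

$ ls /eos/experiment/na64/data/cdr/cdr01*-006030.dat >  in.txt
$ ls /eos/experiment/na64/data/cdr/cdr01*-006031.dat >> in.txt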

The script will create sub-directories named after the tag in your output dir and workspace dir. The point of splitting things up into two directories is that you are generally not interested in whatever is produced in the workspace dir: this is nasty service information that one would like to delete as soon as possible, while the output dir is for our glorious analysis, which we would like to keep forever and bequeath to posterity.
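For a submission tagged, say, 2022mu-MM03-01, the two tagged locations would look roughly like this (the paths follow the variables set in submit.sh above; adjust to your own areas):

$ ls submitted/2022mu-MM03-01/                                         # service files and logs (disposable)
$ ls /eos/user/r/rdusaev/autosync/na64/alignment.out/2022mu-MM03-01/   # analysis output (to keep)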

Job script and job environment

The last part of the script (the heredoc) generates a file with a peculiar syntax specific to the HTCondor system, called a ClassAd. It contains the job submission instructions that define things like:

  • What executable or script is to be run in parallel (executable = ...)

  • What arguments are passed to this executable or script (arguments = ...)

There are more parameters in the ClassAd related to the job configuration; some of them may be much more important than they appear at first glance (like +JobFlavour).

We set executable = `readlink -f run-alignment-single.sh`. This is another ("job") script that will actually run on the HTCondor node. Its purpose is to set up the environment, take one line from in.txt, forward execution to the pipeline executable and move the output files to the output dir.

The point is that when HTCondor runs a job, the process operates in a rather restrictive environment: the node provides a limited amount of CPU, RAM and disk space, and even the network transfer and bandwidth are limited. On the other hand, na64sw-pipe is a general-purpose application configured primarily from the command line, so we want to run it within some shell environment.

Example content of run-alignment-single.sh:

#!/bin/bash

# This protects against running the script locally (the variable is set
# via the HTCondor environment defined in the submission file)
if [ -z "${HTCONDOR_JOBINDEX+x}" ] ; then
    echo "HTCONDOR_JOBINDEX variable is not set."
    exit 2
fi

# 1st arg must be the NA64sw prefix, the second is the files list, the third
# is the destination dir. These are usually provided by the submission file.
NA64SWPRFX=$1
FILESLIST=$2
OUTDIR=$3

# Setup the env
source $NA64SWPRFX/this-env.sh
echo "Using environment from $NA64SW_PREFIX"

cp $OUTDIR/placements.txt .
cp $OUTDIR/calibrations.yaml .
cp $OUTDIR/run.yaml .

# Get the filename to operate with
NLINE=$((HTCONDOR_JOBINDEX+1))
SUBJFILE=$(sed "${NLINE}q;d" $FILESLIST)
echo "Job #${NLINE}, file ${SUBJFILE}"

# Run pipeline app
$NA64SWPRFX/bin/na64sw-pipe -r run.yaml -c calibrations.yaml \
    -m genfit-handlers -EGenFit2EvDisplay.enable=no \
    -EROOTSetupGeometry.magnetsCfg=$NA64SWPRFX/var/src/na64sw/extensions/handlers-genfit/presets/magnets-2022.yaml \
    -N 15000 \
    --event-buffer-size-kb 4096 \
    $SUBJFILE

# Delete the copies of the input files so as not to clutter the submission
# dir (otherwise, HTCondor will copy 'em back for every job).
rm -f placements.txt calibrations.yaml run.yaml

# If there is an `alignment.dat' file available, gzip and copy it
if [ -f alignment.dat ] ; then
    gzip alignment.dat
    cp alignment.dat.gz $OUTDIR/alignment-$((HTCONDOR_JOBINDEX+1)).dat.gz
fi

# Copy `processed.root' output to dest dir
cp processed.root $OUTDIR/processed-$((HTCONDOR_JOBINDEX+1)).root
rm -f processed.root alignment.dat alignment.dat.gz

echo Done.

So the script does what we mentioned previously and also performs some housekeeping for the demonstration: the alignment.dat output file gets compressed with gzip. Note that the script also copies placements.txt, calibrations.yaml and run.yaml to the cwd (.), which is the job-execution dir, local to the node, where the process will actually keep all its files.

One can also customize the number of events being read and other invocation parameters within this script.
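For instance, to process 50000 events per chunk instead of 15000 (an arbitrary value, shown only for illustration), change the corresponding line of the na64sw-pipe invocation in run-alignment-single.sh:

    -N 50000 \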

Conclusion

So, the detailed workflow is:

  • Having the following input:
      - A pipeline run-config (run.yaml kept locally in workspace dir)
      - A geometry file (placements.txt)
      - A list of chunks to process (in.txt)
      - A tag for the set of jobs being submitted

  • One calls submit.sh from the workspace dir, providing the tag and the path to the placements.txt file

  • The submit.sh script creates the tagged dirs (a service one in the workspace dir and one for the expected output in the output dir), generates the ClassAd file and submits it to HTCondor for parallel processing:

    $ ./submit.sh 2022mu-MM03-01 some/where/my-placements.txt
    
  • Jobs then start asynchronously and do whatever is written in run-alignment-single.sh (copy files, run the pipeline, copy the output, clean up, etc.). You can inspect what is running with the condor_q command on an lxplus node (see the example after this list).

  • Once all the jobs are done you will end up with a bunch of processed-<number>.root and alignment-<number>.dat.gz files in the tagged output dir.
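While the batch is running, a quick way to monitor it from an lxplus node is condor_q; the plain command shows a per-batch summary, while -nobatch lists the individual jobs:

$ condor_q
$ condor_q -nobatch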

Typically it takes 10-20 minutes to perform all the processing, providing you with an amount of data incomparable with what local running gives. To merge the processed.root files into a single one with summed histograms, consider the hadd utility from ROOT. Reading the gzip'ped files is supported by the Python scripts of NA64sw.
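For example, to merge everything from one tagged output dir into a single file (the path follows the OUTDIR convention from submit.sh and the tag used above; adjust to your own area):

$ hadd merged.root /eos/user/r/rdusaev/autosync/na64/alignment.out/2022mu-MM03-01/processed-*.root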