.. _running on cerns htcondor:

Tip: Running on CERN's HTCondor
===============================

`CERN HTCondor`_ provides infrastructure to facilitate complex computations by
means of distributed parallel processing. This page describes the procedure for
running NA64sw as a batch job on HTCondor.

.. _`CERN HTCondor`: https://batchdocs.web.cern.ch/

Generally, steering parallel jobs on a batch system with distributed nodes is a
complicated task in itself because of the variety of possible scenarios one may
imagine for multi-staged processing tasks. Such a task is usually handled by a
workload-management system (WMS) like Pegasus_, Panda_, etc.

.. _Pegasus: https://pegasus.isi.edu/
.. _Panda: https://panda-wms.readthedocs.io/en/latest/

However, so far NA64's needs do not imply frequent use of multi-staged
scenarios. On this page we provide just a simple recipe for running a few
pipeline processes, which is enough to perform an NA64sw pipeline analysis on a
few chunks or a few runs (a typical need).

Set-up
------

It is possible to run jobs without any WMS, using just a CERN IT account and a
public build of NA64sw.

Let's consider an alignment task: it typically requires quite a few collected
tracks to provide a representative picture, since certain detectors are not
well illuminated, so at least a few chunks have to be processed. Running the
pipeline application locally would be tedious.

Roughly, the following workflow is assumed:

1. We start with a certain ``placements.txt``, pipeline settings in a
   ``run.yaml`` file and a list of files we would like to process as an input.
   We would also like to tag the submitted jobs to keep everything related to
   this task in one place.
2. We create a job submission with a *submission script* that tells HTCondor
   what to run and how.
3. Jobs run and generate output asynchronously. At some point they finish and
   we can harvest the output from the *output dir*.

The alignment task is just an example; it should help you grasp the idea.

Workspace dir
~~~~~~~~~~~~~

To start, let's consider a directory that we will further refer to as the
*workspace dir* (not to be confused with the AFS Workspace dir!). A *workspace
dir* for job submission will serve as a place to store submission files, job
logs and other service information. It must be created on one of CERN's shared
filesystems, like ``/afs``, because all the started remote jobs must be able to
access this share. One can safely use the home or *AFS workspace* dir for that
purpose:

.. code-block:: shell

    $ cd ~
    $ mkdir alignment_workspace
    $ cd alignment_workspace

The job-submission script will reside in this brand-new directory. We will also
put here the job logs and the documents that we are going to change across the
submitted tasks.

.. warning::

    Currently the CERN infrastructure *does not* support submission from
    ``/eos`` shares, so we are restricted to ``/afs`` only.

In the alignment task we are going to change the ``placements.txt`` file that
contains the geometrical information of the setup. This is done by overriding
the calibration information with a custom ``placements.txt`` file and a custom
calibrations config (the ``-c calibrations.yaml`` option of the app). Copy the
files to the *workspace dir*:

.. code-block:: shell

    $ cp $NA64SW_PREFIX/share/na64sw/calibrations.yaml .
    $ cp $NA64SW_PREFIX/var/src/na64sw/presets/calibrations/override/2022/placements.txt.bck placements.txt

.. note::

    We are using the ``$NA64SW_PREFIX`` environment variable here, assuming
    that you have ``source``'d the ``.../this-env.sh`` script from one of the
    public builds.
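For instance, using the public build referenced later in the submission script
(substitute the path of whatever build you actually use), setting up the
environment on lxplus may look like:

.. code-block:: shell

    $ source /afs/cern.ch/work/r/rdusaev/public/na64/sw/LCG_101/x86_64-centos7-gcc11-opt/this-env.sh
    $ echo $NA64SW_PREFIX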
We will customize the ``calibrations.yaml`` a bit later.

Output dir
~~~~~~~~~~

Choose some directory for the output data. Typically we have to store the
``processed.root`` file; for the alignment task we will also need to save the
``alignment.dat`` file produced by the ``CollectAlignmentData`` handler. It is
better to keep this data on ``/eos``, as this share is large enough, optimized
for medium- and large-sized files and pretty easy to access (via CERNBox, for
instance). We will further refer to this dir as the *output dir*.

Submission script
-----------------

We will use the Bourne shell here, but this is not mandatory, of course.
Consider the following script as a useful snippet that has to be customized to
match your *workspace* and *output* dirs. It is assumed that the submission
script is run from the *workspace dir* -- create a file called ``submit.sh`` in
your *workspace dir* and copy/paste this content:

.. code-block:: bash

    #!/bin/bash
    # Batch submission script for the NA64sw alignment procedure.
    #
    # Will run the `na64sw-pipe' app on HTCondor on every file provided in
    # `in.txt' using the given runtime config. Expects two command line
    # arguments: a tag that shall uniquely identify this batch procedure and a
    # path to the placements.txt file to use.
    TAG=$1
    PLACEMENTS=$2
    # Variables to customize
    # - path to NA64sw build (this-env.sh script must be available in this dir)
    NA64SWPRFX=/afs/cern.ch/work/r/rdusaev/public/na64/sw/LCG_101/x86_64-centos7-gcc11-opt
    # - runtime config to use
    RCFG=$(readlink -f run.yaml)
    # - list of input files; must contain one full path per line
    FILESLIST=$(readlink -f ./in.txt)
    # - where to put results (processed.root + alignment.dat)
    OUTDIR=/eos/user/r/rdusaev/autosync/na64/alignment.out/$TAG
    # - where to put the logs and submission file
    SUBMITTED_DIR=./submitted/$TAG

    mkdir -p $OUTDIR
    mkdir -p $SUBMITTED_DIR

    # The job script expects these files under fixed names in the output dir
    cp $PLACEMENTS $OUTDIR/placements.txt
    cp calibrations.yaml $OUTDIR
    cp $RCFG $OUTDIR/run.yaml

    NJOBS=$(cat in.txt | wc -l)

    cat <<-EOF > $SUBMITTED_DIR/job.sub
    universe = vanilla
    should_transfer_files = NO
    transfer_output_files =
    max_transfer_output_mb = 2048
    +JobFlavour = "microcentury"
    output = $SUBMITTED_DIR/\$(Process).out.txt
    error = $SUBMITTED_DIR/\$(Process).err.txt
    executable = `readlink -f run-alignment-single.sh`
    log = $SUBMITTED_DIR/htcondor-log.txt
    environment = "HTCONDOR_JOBINDEX=\$(Process)"
    arguments = $NA64SWPRFX $FILESLIST $OUTDIR
    queue $NJOBS
    EOF

    condor_submit -batch-name na64al-$TAG $SUBMITTED_DIR/job.sub

The script expects two arguments to be provided on the command line: a *tag*
that will uniquely identify the task to run and a path to the
``placements.txt`` file in use. The *tag* is assumed to be a meaningful,
human-readable string without spaces that is then used to create directories
specific to the task. For instance, if I align MM03 in the 2022-mu run, I will
do it iteratively, and it would be convenient to have the output ordered as
``022mu-MM03-01``, ``022mu-MM03-02``, ``022mu-MM03-03``, etc. When I switch to,
say, ``ST01``, I would like to have directories tagged as ``022mu-ST01-01``,
``022mu-ST01-02``, etc.

As input the script uses the ``in.txt`` file containing absolute paths to the
experiment's data -- one path per line. Such a file can be produced with,
e.g.:

.. code-block:: shell

    $ ls /eos/experiment/na64/data/cdr/cdr01*-006030.dat > in.txt

Choose the run number you would like to process, or concatenate a few runs into
one file with ``>>``, as shown below.
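For example, to process the chunks of two runs in a single submission one may
append a second listing (the second run number here is purely illustrative):

.. code-block:: shell

    $ ls /eos/experiment/na64/data/cdr/cdr01*-006030.dat >  in.txt
    $ ls /eos/experiment/na64/data/cdr/cdr01*-006031.dat >> in.txt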
The script creates sub-directories named after the *tag* in your *output dir*
and *workspace dir*. The point of splitting the whole thing into two
directories is that you are generally not interested in whatever is produced in
the *workspace dir* -- this is service information that one would like to
delete as soon as possible, while the *output dir* is for our glorious
analysis, which we would like to keep forever and bequeath to posterity.

Job script and job environment
------------------------------

The last part of the script (a heredoc_) generates a file with a peculiar
syntax specific to the HTCondor system, called a ClassAd_. It is a set of
job-submission instructions that defines things like:

* what executable or script is to be run in parallel (``executable = ...``);
* what the arguments of this executable or script are (``arguments = ...``).

There are more parameters in the ClassAd related to the job configuration; some
of them may be much more important than appears at first glance (like
``+JobFlavour``).

.. _heredoc: https://tldp.org/LDP/abs/html/here-docs.html
.. _ClassAd: https://htcondor.readthedocs.io/en/latest/classads/classad-mechanism.html

We set ``executable`` to the absolute path of ``run-alignment-single.sh``
(obtained with ``readlink -f``). This is another ("*job*") script, the one that
will actually run on the HTCondor node. Its purpose is to set up the
environment, take one line from ``in.txt``, forward execution to the pipeline
executable and move the output files to the *output dir*.

The point is that when HTCondor runs a job, the process operates in a quite
restrictive environment: the node provides a limited amount of CPU, RAM and
disk space, and even network transfers and bandwidth are limited. On the other
hand, ``na64sw-pipe`` is a general-purpose application configured primarily
from the command line, so we want it to run within some shell environment.

Example content of ``run-alignment-single.sh``:

.. code-block:: bash

    #!/bin/bash

    # Protects against running the script outside of an HTCondor job
    if [ -z "${HTCONDOR_JOBINDEX+x}" ] ; then
        echo "HTCONDOR_JOBINDEX variable is not set."
        exit 2
    fi

    # 1st arg must be the NA64sw prefix, the second is the files list, the
    # third is the destination dir. These are usually provided by the
    # submission file.
    NA64SWPRFX=$1
    FILESLIST=$2
    OUTDIR=$3

    # Set up the environment
    source $NA64SWPRFX/this-env.sh
    echo "Using environment from $NA64SW_PREFIX"

    cp $OUTDIR/placements.txt .
    cp $OUTDIR/calibrations.yaml .
    cp $OUTDIR/run.yaml .

    # Get the filename to operate with
    NLINE=$((HTCONDOR_JOBINDEX+1))
    SUBJFILE=$(sed "${NLINE}q;d" $FILESLIST)
    echo "Job #${NLINE}, file ${SUBJFILE}"

    # Run the pipeline app
    $NA64SWPRFX/bin/na64sw-pipe -r run.yaml -c calibrations.yaml \
        -m genfit-handlers -EGenFit2EvDisplay.enable=no \
        -EROOTSetupGeometry.magnetsCfg=$NA64SWPRFX/var/src/na64sw/extensions/handlers-genfit/presets/magnets-2022.yaml \
        -N 15000 \
        --event-buffer-size-kb 4096 \
        $SUBJFILE

    # Delete the copies of the input files to not clutter the submission dir
    # (otherwise, HTCondor may copy them back for every job).
    rm -f placements.txt calibrations.yaml run.yaml

    # If there is an `alignment.dat' file available, compress and copy it
    if [ -f alignment.dat ] ; then
        gzip alignment.dat
        cp alignment.dat.gz $OUTDIR/alignment-$((HTCONDOR_JOBINDEX+1)).dat.gz
    fi

    # Copy the `processed.root' output to the destination dir
    cp processed.root $OUTDIR/processed-$((HTCONDOR_JOBINDEX+1)).root
    rm -f processed.root alignment.dat.gz

    echo Done.

So the script does what we have mentioned previously and also performs some
housekeeping for the demonstration: the ``alignment.dat`` output files are
compressed with ``gzip``.
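Since the script refuses to run when ``HTCONDOR_JOBINDEX`` is not set, one can
also test it interactively before submitting by setting the variable by hand. A
minimal sketch, assuming the *workspace dir* already holds ``placements.txt``,
``calibrations.yaml`` and ``run.yaml``, and using a hypothetical ``test`` tag
together with the example paths from the submission script above:

.. code-block:: shell

    $ mkdir -p /eos/user/r/rdusaev/autosync/na64/alignment.out/test
    $ cp placements.txt calibrations.yaml run.yaml /eos/user/r/rdusaev/autosync/na64/alignment.out/test/
    $ HTCONDOR_JOBINDEX=0 ./run-alignment-single.sh \
          /afs/cern.ch/work/r/rdusaev/public/na64/sw/LCG_101/x86_64-centos7-gcc11-opt \
          $(readlink -f in.txt) \
          /eos/user/r/rdusaev/autosync/na64/alignment.out/test

Job index ``0`` corresponds to the first line of ``in.txt``; the output is
written to the given (here, test) *output dir* just as for a batch job.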
Note that the job script also copies ``placements.txt``, ``calibrations.yaml``
and ``run.yaml`` to its cwd (``.``) -- this is the job-execution dir, local to
the node, where the process actually keeps all its files. One can also
customize the number of events being read and other invocation parameters
within this script.

Conclusion
----------

So, the detailed workflow is:

* Having the following input:

  - A pipeline run-config (``run.yaml``, kept locally in the *workspace dir*)
  - A geometry file (``placements.txt``)
  - A list of chunks to process (``in.txt``)
  - A *tag* for the set of jobs being submitted

* One calls ``submit.sh`` from the *workspace dir*, providing the *tag* and the
  path to the ``placements.txt`` file
* ``submit.sh`` creates the tagged dirs (a service one in the *workspace dir*
  and one for the expected output in the *output dir*), generates the ClassAd
  file and submits it to HTCondor for parallel processing:

  .. code-block:: shell

      $ ./submit.sh 2022mu-MM03-01 some/where/my-placements.txt

* Jobs will then start asynchronously and do whatever is written in
  ``run-alignment-single.sh`` (copy files, run the pipeline, copy the output,
  clean up, etc.). You can inspect what is running with the ``condor_q``
  command on an lxplus node.
* Once all the jobs are done, you will end up with a bunch of
  ``processed-<N>.root`` and ``alignment-<N>.dat.gz`` files in the tagged
  *output dir*.

Typically it takes 10-20 minutes to perform all the processing, providing you
with an amount of data incomparable with what local running could give.

To merge the ``processed-<N>.root`` files into a single one with summed
histograms, consider the ``hadd`` utility from ROOT. Reading ``gzip``'ped
files is supported by the Python scripts of NA64sw.
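For instance, merging all the chunks of one tagged submission could look like
this (the path follows the ``OUTDIR`` convention from the submission script
above; adjust it to your own *output dir*):

.. code-block:: shell

    $ hadd merged.root \
          /eos/user/r/rdusaev/autosync/na64/alignment.out/2022mu-MM03-01/processed-*.root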