.. _calibration data guide:

Calibration data
================

This page contains examples of adding a custom *calibration* (i.e. run/datetime
dependent) data type.

.. important::

    By the term "calibration data" (in NA64SW) we mean *any kind of data that
    depends on the event number*, including not only the typical kinds of
    calibration information, but also such data as detector naming, geometry
    or positions.

Most users will be interested only in the first part of this page, which
provides some general API-based examples. The "General use" section is for
advanced modification of the NA64SW API itself.

Preface
-------

NA64SW introduces a versatile yet complex subsystem for calibration data
queries. Although there are multiple API layers under the hood to support
various scenarios, the user's API tends to be reasonably simple.

The user's API is class-based and relies on a *subscription* mechanism. A class
that needs the currently valid calibration data shall inherit a subscriber,
a *handle* (``calib::Handle<>``) template parameterised with a certain data
type. Then, whenever the calibration data of that type gets updated,
``calib::Handle<T>::handle_update( const T & )`` is called with the new data.

A data processing handler class is a typical example of a subscriber. It
usually maintains a copy of the calibration data of interest (or some values
inferred from this data).
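The subscription idea can be illustrated with a self-contained toy. This is
only a sketch of the pattern, not the NA64SW implementation: ``ToyDispatcher``,
``ToyHandle``, ``Gain`` and ``GainUser`` are hypothetical names invented for
this example, and the real ``calib::Handle<>`` has a different constructor and
more functionality.

```cpp
#include <cassert>
#include <vector>

template<typename T> class ToyHandle;

// Hypothetical publisher: keeps the subscribers for one data type
template<typename T>
class ToyDispatcher {
    std::vector<ToyHandle<T> *> _subscribers;
public:
    void subscribe(ToyHandle<T> & h) { _subscribers.push_back(&h); }
    // Forwards new data to every subscriber
    void publish(const T & data) {
        for (auto * s : _subscribers) s->handle_update(data);
    }
};

// Hypothetical subscriber base: registers itself on construction
template<typename T>
class ToyHandle {
public:
    explicit ToyHandle(ToyDispatcher<T> & d) { d.subscribe(*this); }
    virtual ~ToyHandle() = default;
    virtual void handle_update(const T & newData) = 0;
};

// Example calibration datum (assumption of this sketch)
struct Gain { double value; };

// A "handler" that caches the value inferred from the calibration data
class GainUser : public ToyHandle<Gain> {
    double _gain = 1.0;
public:
    explicit GainUser(ToyDispatcher<Gain> & d) : ToyHandle<Gain>(d) {}
    void handle_update(const Gain & g) override {
        assert(g.value > 0.0);  // toy sanity check
        _gain = g.value;
    }
    double apply(double amplitude) const { return amplitude * _gain; }
};
```

The subscriber never asks for data itself; it only reacts to
``handle_update()`` and uses its cached copy afterwards.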


Design Considerations
~~~~~~~~~~~~~~~~~~~~~

* We expect that the full set of calibration data needed for every kind
  of processing is not kept in RAM entirely. From this assumption one
  immediately derives the need for some kind of runtime "state" or "cache"
  that keeps the currently loaded information (i.e. the part of the calibration
  data actually "loaded" from the "existing" superset).
* This "state" has to be synchronized with the event number under
  consideration. The typical trigger of a state change is the data source
  object, which notifies the "state" of the newly acquired event's ID.
  For the MC case these are the hooks called when a new event arrives (i.e.
  a ``G4UserEventAction`` subclass, for the Geant4 API).
* The "state" maintains subscriptions for entities interested in particular
  types of calibration information via the "observer-notifier" (or "pub/sub")
  pattern. I.e. changes in the state (induced by a changing event ID) are
  propagated through the subscription mechanism to the objects interested in
  certain pieces of information.

Definitions
~~~~~~~~~~~

* *Calibration data type* is the type of data that must be defined for certain
  validity period.
* *Validity period* is the range between two events or between two
  astronomical dates and times.
* *Loader class* defines how the calibration data must be loaded
  for certain event within the *validity period* (for certain events range).
* *Calibration data index class* decides whether the data must be loaded. For
  instance, it may consider the event number and match it with certain
  *validity period*.
* *Calibration data handle* is a simple wrapper class that keeps an instance of
  data that must be updated.
* *Calibration data dispatcher* is the instance of special ``calib::Dispatcher``
  class that, upon certain conditions, based on event ID and collection of
  *indexes* may cause update of calibration data of certain types. Practically,
  this is done by "notifying" *calibration data handles*.
* *Calibration data manager* (``calib::Manager``) instance object provides
  unified access to current calibration state. It subclasses the dispatcher
  and maintains collections of loaders and indexes to define which handle must
  be updated and when.

The resulting system built on these classes is still quite generic and foresees
some sophisticated use cases, like keeping the *run indexes* in a database,
overriding them with a configuration file, and using a mixed loading scheme
where the data may be retrieved from an ASCII CSV file, a local database, a
ROOT file or fetched with a RESTful API.

Scope of definitions
--------------------

One has to make a distinction between a calibration data type introduced
for one endpoint (*local*) and for multiple endpoints (*global* or *shared*).
This difference mostly affects how and where the data type is declared.

*Local* calibration data types are an important use case for prototyping, when
one does not want to burden the global namespace and registries with
definitions that make no sense without a certain handler. This is a good choice
for highly specialized handlers that require extra data. This way one can keep
all the data type definitions within the same implementation (``.cc``) file
where the handler itself is defined. This is somewhat similar to C/C++
``static`` definitions.

If calibration functionality is shared across multiple handlers within a
certain extension, it can be defined in the reentrant header.

Finally, widely used runtime-dependent information can be defined at the level
of the ``na64calib`` library (requires a project rebuild).

Introducing new data type
-------------------------

One has to express a piece of calibration data as C/C++ data. It is usually
something like a POD structure. For instance, let's consider a typical
*placement* datum that describes spatial position, orientation, size etc. for
a tracking detector.

.. code-block:: cpp

    struct Placement {
        std::string name;  ///< name of the plane
        float center[3]  ///< global coordinates of the plane, cm
            , size[3]  ///< size of the plane, cm
            , rot[3]  ///< Euler angles of the plane defining its orientation
            ;
        int nWires;  ///< number of wires
        float resolution;  ///< detector resolution
    };

The simplest way to make a handler subscribe to this kind of information is
to subclass ``calib::Handle<>``. Since updates of this type are usually
provided as a set of such objects, it is convenient to subscribe to the set
itself instead of an individual entry (let it be an ``std::list``, for
instance).

Subscribing to updates
----------------------

For instance, a handler using the resolution information may have a cache
designed to keep the relevant info:

.. code-block:: cpp

    typedef std::list<Placement> Placements;

    class MyHandler : public AbstractHandler
                    , public calib::Handle<Placements> {
    protected:
        std::map<std::string, float> _resolutionDict;
    protected:
        void handle_update( const Placements & updates ) {
            for( const auto & update : updates ) {
                _resolutionDict[update.name] = update.resolution;
            }
        }
    public:
        ProcRes process_event(Event & event) {
            // ... process event relying on `_resolutionDict` here
            return kOk;
        }

        MyHandler( calib::Manager & mgr )
                : calib::Handle<Placements>("default", mgr)
                {}
    };

The additional string provided to the ``calib::Handle<>`` constructor is used
together with RTTI type information to uniquely identify the *subclass* of the
calibration information. It was introduced to distinguish between calibration
entries of the same C/C++ type. For instance, one may like to have a spatial
vector (say ``TVector3``) as a calibration info instance, but one still has to
address it with some additional semantics. Most of the time this additional
string classifier can be left as ``"default"``.

NA64SW is designed in such a way that a calibration data update, whenever it is
available, is provided *before* the event is processed (i.e.
``handle_update()`` is called prior to ``process_event()`` once a new
event has arrived).

Note on data types aliases
--------------------------

Calibration data types are indexed by means of C++ RTTI (``std::type_info``
hash) plus a string that is sometimes called the *calibration data type
subclass*. These two things are expected to uniquely identify a particular
calibration data type. However, the API often needs to refer to this
ID by some human-readable string. For this purpose an aliasing technique
is introduced: a global dictionary performs alias-to-ID and vice versa
conversions.
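A two-way dictionary of this kind can be sketched with standard containers.
The snippet below is a simplified illustration and makes several assumptions:
``ToyTypeID``, ``ToyAliases`` and ``Vec3`` are hypothetical names, and
``std::type_index`` is used in place of whatever hashing the real
``CIDataAliases`` singleton relies on.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <utility>

// Hypothetical type ID: the C++ type plus the "subclass" string
using ToyTypeID = std::pair<std::type_index, std::string>;

class ToyAliases {
    std::map<std::string, ToyTypeID> _byAlias;  // alias -> ID
    std::map<ToyTypeID, std::string> _byID;     // ID -> first alias registered
public:
    template<typename T>
    void add_alias_of(const std::string & alias,
                      const std::string & subclass = "default") {
        assert(!alias.empty());
        ToyTypeID id{std::type_index(typeid(T)), subclass};
        if (!_byAlias.emplace(alias, id).second)
            throw std::runtime_error("alias already defined: " + alias);
        _byID.emplace(id, alias);  // keeps the first alias of an ID
    }
    const ToyTypeID & id_of(const std::string & alias) const {
        return _byAlias.at(alias);
    }
    const std::string & alias_of(const ToyTypeID & id) const {
        return _byID.at(id);
    }
};

// Same C++ type may back several calibration entries
struct Vec3 { double x, y, z; };
```

Registering ``Vec3`` twice with different subclass strings yields two distinct
IDs, which is exactly the case the subclass string exists for.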

Note on Dependencies
--------------------

Calibration data types often have dependencies. The most common dependency is
*detector naming*, needed to convert stringified detector names to integer IDs.
These dependencies may be either specified in ``calibrations.yaml`` in the
``dependencies:`` section (for global types) or provided in C++ code with the
following directive using the ``CIDataAliases`` singleton (more suitable for
module-local types):

.. code-block:: cpp

    na64dp::calib::CIDataAliases::self().add_dependency("TargetType", "DependencyType");

This makes the type aliased as ``"DependencyType"`` load before the type
aliased as ``"TargetType"``.

C++ dependency assignment is better used with handler-local calibration data
types, while YAML is better suited for non-local types.
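What "loaded before" implies can be shown with a small dependency-ordering
sketch. It is an illustration only: ``ToyDependencies`` and ``load_order()``
are hypothetical, and the way NA64SW actually resolves the order is not
described on this page. The sketch uses a plain depth-first traversal.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

class ToyDependencies {
    std::map<std::string, std::set<std::string>> _deps;  // target -> deps

    // Depth-first traversal: dependencies are appended before the target
    void visit(const std::string & name,
               std::set<std::string> & done,
               std::set<std::string> & inProgress,
               std::vector<std::string> & order) const {
        if (done.count(name)) return;
        if (!inProgress.insert(name).second)
            throw std::runtime_error("circular dependency at " + name);
        auto it = _deps.find(name);
        if (it != _deps.end())
            for (const auto & dep : it->second)
                visit(dep, done, inProgress, order);
        inProgress.erase(name);
        done.insert(name);
        order.push_back(name);
    }
public:
    void add_dependency(const std::string & target,
                        const std::string & dependency) {
        assert(target != dependency);  // a type cannot depend on itself
        _deps[target].insert(dependency);
    }
    // Aliases in the order they would have to be loaded to obtain `target`
    std::vector<std::string> load_order(const std::string & target) const {
        std::set<std::string> done, inProgress;
        std::vector<std::string> order;
        visit(target, done, inProgress, order);
        return order;
    }
};
```

With ``placements`` depending on ``naming``, the computed order puts
``naming`` first.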

Choosing update source
----------------------

How the update is provided is a matter of choice. There are a few
architectural layers:

1. The foremost layer is the *manager* instance (the ``calib::Manager`` that is
   provided to ``REGISTER_HANDLER()``). This class aggregates all the
   subscriptions and update dispatching routines.
2. A layer of standalone *indexes* and *loaders*. These objects are collected
   by the *manager* and used to define when, what and how data has to be
   fetched.
3. An *SDC* module providing a subsystem for local file discovery. The core
   functions of this module were written for the original ``p348reco`` data
   reconstruction routines and have been naturally interfaced to NA64SW.

It is up to you, as the code designer, to decide which particular mechanism to
exploit. Yet, dealing with the manager directly (i.e. copying the instance,
subclassing, etc.) is almost never good practice since it involves
infrastructural changes. Introducing a new *index* or *loader* requires far
fewer changes in the general API, and the API foresees various mechanisms of
providing data validity and data delivery (e.g. a remote database, a RESTful
API, etc.). We consider some examples below.

Since the most common usage scenario for calibration data is local to a certain
handler, we've tried to ease the integration of custom data types within a
user's extension.

Calibrations config as an update source
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For small pieces of data that are unlikely to change for years, one can
consider putting such information right into the ``calibrations.yaml`` file.
This is the main configuration file used to initialize the whole
infrastructure.

Within this file a ``standaloneDocuments`` YAML object is provided, listing
documents or individual parameters that are unlikely to change for years,
together with their validity intervals.

Direct parameters
+++++++++++++++++

Tiny pieces of runtime-dependent (yet semi-constant) information, like the name
of the master counter in a setup, can be listed right in this file. To use
such a document as it is provided, one may set ``payload`` to the YAML object
needed and specify ``loader`` to be ``yaml-doc``.

.. todo::

    Describe C/C++ routine here.

See the ``masterTimeSource`` definitions for instance.

Static files
++++++++++++

For low-volatility information that is unlikely to be updated or modified, but
which is too big to put entirely into ``calibrations.yaml``, one may consider
using a reference to static files.

.. todo::

    Elaborate, describe C/C++ routine here.

See the ``MuMegaLayout`` for instance.

SDC data as an update source
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The SDC module is useful for highly volatile, relatively bulky information that
can be expressed in CSV files -- a classic example of calibration data.

Its indexes and metadata are provided together with the CSV data blocks. The
particular type is specified within the file itself as well.

Although SDC has a customizable grammar, it is still possible to describe its
basics:

* Blank lines are ignored.
* Comments start with a single char, an octothorpe (``#``) by default. The rest
  of the line after this char is ignored.
* A line containing a special *metadata delimiter* (an equals sign, ``=``, by
  default) is considered a metadata line. The string before this marker is
  the metadata entry *key*, the string after it (up to the newline) is the
  *value*.
* SDC *metadata* is a simple key/value (str-to-str) storage.
* Two metadata *keys* have a special meaning:

  - the one designated to specify the validity period is called the
    *metadata key tag* (``runs`` by default)
  - the one that provides the data type is called the *metadata type tag*
    (``type`` by default)

* The data itself is provided in a columnar format, similar to space-separated
  CSV. Any non-blank, non-metadata lines are considered data lines.
* Data lines form a *data block*, for which all previously defined metadata is
  considered valid.

For instance:

.. code-block:: cfg

    # This is a comment
    runs=123-315
    type=MyType
    foo     123     3.4e12
    bar     712     nan

    myMetadata=blah #blah
    runs=316-512
    foo     321     1e-12
    bar     314     512

This defines two data blocks, both of the same type but for different periods.
Only the second block has the metadata entry ``myMetadata`` defined (and
set to ``"blah"``).
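To make the grammar rules concrete, here is a toy reader for the default
grammar. It is a simplified sketch and not the SDC code: ``ToyBlock`` and
``parse_toy_sdc()`` are hypothetical names, the delimiters are hard-coded, and
any line containing ``=`` is treated as metadata, so data tokens with an equals
sign would be misread.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

struct ToyBlock {
    std::map<std::string, std::string> metadata;  // metadata valid for block
    std::vector<std::vector<std::string>> rows;   // tokenized data lines
};

// Strips leading and trailing whitespace
static std::string trim(const std::string & s) {
    const char * ws = " \t\r\n";
    const auto b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    return s.substr(b, s.find_last_not_of(ws) - b + 1);
}

std::vector<ToyBlock> parse_toy_sdc(std::istream & in) {
    std::vector<ToyBlock> blocks;
    std::map<std::string, std::string> current;  // accumulated metadata
    bool inData = false;
    std::string line;
    while (std::getline(in, line)) {
        line = trim(line.substr(0, line.find('#')));  // drop the comment
        if (line.empty()) continue;                   // blank line
        const auto eq = line.find('=');
        if (eq != std::string::npos) {                // metadata line
            current[trim(line.substr(0, eq))] = trim(line.substr(eq + 1));
            inData = false;  // next data line starts a new block
            continue;
        }
        if (!inData) {  // first data line after metadata opens a block
            blocks.push_back(ToyBlock{current, {}});
            inData = true;
        }
        assert(!blocks.empty());
        std::istringstream tokens(line);
        std::vector<std::string> row;
        for (std::string tok; tokens >> tok; ) row.push_back(tok);
        blocks.back().rows.push_back(row);
    }
    return blocks;
}
```

Each block receives a snapshot of the metadata accumulated so far, which is why
``type`` carries over to the second block of the example while ``myMetadata``
is absent from the first one.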

Although SDC files are discovered automatically, some C/C++ interfacing is
still needed to tell NA64SW how a CSV block has to be converted into a
particular C/C++ data type.

Extending SDC with a data type
++++++++++++++++++++++++++++++

The SDC subsystem traverses the directory structure looking for files matching
certain criteria (wildcard or size limits). Every file matching these criteria
is expected to contain SDC content -- metadata and CSV blocks. These files
are *pre-parsed* once, meaning that their *type tag* and *validity tag*
metadata entries are read and the file entry is indexed by SDC classes.
Pre-parsing is a generic procedure and does not need much customization, except
for the grammar.

Then, once an update is initiated, the corresponding file is actually *parsed*,
meaning that its metadata is read and accumulated in a metadata cache and
each CSV block is passed to C++ *traits* that turn tokens into a C/C++
structure. These traits customize the parsing process.

I.e. the C++ traits defining how line tokens are converted into a C/C++
struct are the most meaningful extension point for introducing a new
calibration type. The expected traits implementation relies heavily on
advanced C++ template techniques to facilitate extensibility of the SDC system.

Example traits to load the ``Placement`` entry (the snippet is a bit lengthy,
but useful):

.. code-block:: cpp

    // These traits must be defined in the "sdc" namespace
    namespace sdc {
    // Traits are a C++ template specialization of the CalibDataTraits<>
    // template struct -- so this syntax is mandatory
    template<>
    struct CalibDataTraits<Placement> {
        // Type name alias, used to refer to this type in configuration files,
        // mandatory constexpr
        static constexpr auto typeName = "placement";
        /// A collection type of the parsed entries (usually STL container, but
        /// one can define any template parameterised with a single type)
        template<typename T=na64dp::calib::Placement>
            using Collection=std::list<T>;
        /// An action performed to put newly parsed data into collection
        template<typename T=na64dp::calib::Placement>
        static inline void collect( Collection<T> & col
                                  , const T & e
                                  , const aux::MetaInfo &
                                  , size_t
                                  ) { col.push_back(e); }
        /// Converts a single CSV line into a data entry
        static na64dp::calib::Placement
            parse_line( const std::string & line
                      , size_t lineNo
                      , const aux::MetaInfo & m
                      , const std::string & filename
                      ) {
            // create object and set everything to zero (value-initialization)
            na64dp::calib::Placement obj{};
            // Create and validate columns order, relying on "columns" metadata
            // entry
            auto csv = m.get<aux::ColumnsOrder>("columns", lineNo)
                    .interpret(aux::tokenize(line));
            // Get columns
            obj.name = csv("name");
            obj.center[0] = csv("x", std::nan("0"));
            obj.center[1] = csv("y", std::nan("0"));
            obj.center[2] = csv("z", std::nan("0"));
            // ... etc -- turn tokens into data structure
            return obj;
        }
    };
    }

Once an update is triggered, SDC expects the ``parse_line()`` static method to
interpret a CSV string as a single entry of the
``CalibDataTraits<T>::Collection<T>`` container. Source information and the
up-to-date metadata state are provided to this method to assist the user's data
conversion procedure.

Then SDC uses the ``collect()`` static method to put the parsed object into a
collection instance.

The collection instance is then dispatched to the subscribers.

Registering alias for SDC data
++++++++++++++++++++++++++++++

To register a new SDC data type one has to:

1. Declare its *alias*
2. Impose a conversion function on the SDC wrapper
3. Define calibration data dependencies, if any (see "Note on Dependencies"
   above)

A useful snippet to do all these things:

.. code-block:: cpp

    // define new alias
    na64dp::calib::CIDataAliases::self()
                .add_alias_of<Placements>("placements", "default");
    // get SDC loader from manager, downcast it to proper type (SDCWrapper)
    // and impose type conversion using previously defined traits
    mgr.get_loader<na64dp::calib::SDCWrapper>("sdc")
        .enable_sdc_type_conversion<p348reco::RunNo_t, Placement>();
    // Require "naming" to be loaded before "placements"
    na64dp::calib::CIDataAliases::self().add_dependency("placements", "naming");

This block must be executed before any subscription to the target type is made,
i.e. putting it into a handler constructor won't work. For local extensions,
consider the ``REGISTER_HANDLER`` body; for global framework-wide changes,
modify the ``na64dp/calib/sdc-extensions.hh`` header.

General use
-----------

In the generic case, when neither SDC nor the standard means suit the user's
calibration data, the following usage is foreseen:

.. todo::

    Revise/restructure notes below.

Custom indexes and loaders
~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Define the calibration data type. It may be a simple type, a POD struct or
   a complex class. A single instance of this type must contain only the data
   needed for a certain *validity period*.
2. Use an existing loader or implement a new one. When adding a new loader
   one has to subclass the vanilla ``iCalibDataLoader`` (and add the appropriate
   entry to the run index). In the more common case one may rely on the generic
   loader and subclass ``GenericLoader::AbstractCalibDataIndex``. For simple
   data trivially constructible from a YAML description, consider using
   ``SimpleYAMLDataIndex<T>``, which defines stateless, single-function
   conversions. For more complex or bulkier data, consider subclassing
   ``iCSVFilesDataIndex<T, TupleTypesT ...>`` -- users only have to define a
   tuple-to-data-type conversion method in their subclass.
3. If you have subclassed ``iCalibDataLoader``, instantiate your loader and
   add it to the application's manager with the ``Manager::add_loader()``
   method. If, on the contrary, you have subclassed
   ``GenericLoader::AbstractCalibDataIndex``, instantiate the data index and add
   it to the ``GenericLoader`` instance, mentioning the new data field (see,
   e.g., pipeline function). Defining a runtime data loader constructor for
   YAML is somewhat tricky (a small intermediate structure is used) -- refer to
   the ``calib::CalibInfoCtr`` docs.

Usage for external applications
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The central object that a user's app must maintain is an instance of the
calibration data manager: a ``na64dp::calib::Manager`` object. It can be
instantiated with no additional preparations:

.. code-block:: cpp

        na64dp::calib::Manager mgr;

Having this object, the user's classes may then subscribe to calibration
updates. E.g., given a calibration data type, say ``MyCalibData``, and a
class ``MyCalibDataUser``, one may use the subscription by implementing
the ``Observable<MyCalibData>::iObserver`` interface and calling the
``subscribe()`` method of the ``Manager`` instance:

.. code-block:: cpp

        class MyCalibDataUser : public na64dp::util::Observable<MyCalibData>::iObserver {
            // ...
            MyCalibDataUser( na64dp::calib::Manager & mgr ) {
                mgr.subscribe<MyCalibData>(*this, "default");
            }
            // ...
        protected:
            virtual void handle_update( const MyCalibData & ) override;
            // ...
        };

Here ``subscribe()`` takes a string argument (``"default"`` in the example)
that, together with C++ type info, identifies the type of calibration entry.

.. note::

    ``calib::Handle<>`` subclasses ``util::Observable<T>::iObserver`` to
    provide a *cached* copy of data and some useful shortcuts for data
    retrieval.

Note that the calibration data user class has only one way to receive the
updated data instance -- the protected method ``handle_update()``. The
reference it receives is guaranteed to be valid until the next
``handle_update()`` call, so the subscriber may safely cache it.

To make the ``Manager`` instance do something meaningful, one has to bind a
*run index* instance.

Range-overriding run index
~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``RangeOverrideRunIndex`` class implements a straightforward way of
mapping the event identifier onto subsets of calibration sources. In this
class we assume that a certain calibration data object is valid *starting from*
some event ID *until* the next event ID that has another calibration data
object. This way data signed with event ID ``5`` overrides
(or *substitutes*) the entity signed with event ID ``2``. It is the simplest yet
useful mapping scheme to define range matching.
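The "valid from this ID until the next one" rule maps naturally onto an ordered
map lookup. The following is a minimal sketch of that rule only;
``ToyRangeOverrideIndex`` is a hypothetical class, event IDs are simplified to
plain integers, and sources are represented by strings, none of which is
claimed to match ``RangeOverrideRunIndex`` itself.

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <string>

// Hypothetical toy index: "valid from" event ID -> calibration source
class ToyRangeOverrideIndex {
    std::map<unsigned long, std::string> _sources;
public:
    void add_entry(unsigned long validFrom, const std::string & source) {
        assert(!source.empty());
        _sources[validFrom] = source;
    }
    // Returns the source valid for `eventID`, or nullptr if the event
    // precedes every registered entry
    const std::string * lookup(unsigned long eventID) const {
        // first entry that starts strictly after eventID
        auto it = _sources.upper_bound(eventID);
        if (it == _sources.begin()) return nullptr;
        return &std::prev(it)->second;  // last entry starting at or before it
    }
};
```

With entries at IDs 2 and 5, events 2 to 4 resolve to the first source and
every event from 5 onwards resolves to the second.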

Generic Loader
~~~~~~~~~~~~~~

Most practical use cases do not require the level of generalization that the
vanilla ``iCalibDataLoader`` offers (loading data of any type for any event).

The ``GenericLoader`` class implements a calibration data loader with dynamic
composition. Once loading data of a certain type is requested (by the
``load_data()`` method), it forwards execution to one of its *data index*
instances (subclassing ``GenericLoader::AbstractCalibDataIndex``) that performs
the actual loading. A "data index" instance is specific to a certain data type
while ``GenericLoader`` is not (so a ``GenericLoader`` instance is just a
collection of type-specific loaders), and that is the reason for this level of
composition.

New instances of data indexes are added by ``GenericLoader::add_data_index()``
at startup.
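The composition can be pictured as a type-keyed collection that forwards each
request to the matching entry. The sketch below illustrates only this
forwarding idea under simplifying assumptions: ``ToyDataIndex``,
``ToyGenericLoader``, ``Pedestals`` and ``Gains`` are hypothetical, the method
signatures are invented for the example, and "loading" just returns a string.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <utility>

// Hypothetical type-specific part: knows how to load one kind of data
struct ToyDataIndex {
    virtual ~ToyDataIndex() = default;
    virtual std::string load(unsigned long eventID) const = 0;
};

// Hypothetical type-agnostic part: a collection of type-specific indexes
class ToyGenericLoader {
    std::map<std::type_index, std::unique_ptr<ToyDataIndex>> _indexes;
public:
    template<typename T>
    void add_data_index(std::unique_ptr<ToyDataIndex> idx) {
        assert(idx);  // a null index makes no sense
        _indexes[std::type_index(typeid(T))] = std::move(idx);
    }
    // Forwards the request to the index registered for type T
    template<typename T>
    std::string load_data(unsigned long eventID) const {
        auto it = _indexes.find(std::type_index(typeid(T)));
        if (it == _indexes.end())
            throw std::runtime_error("no data index for requested type");
        return it->second->load(eventID);
    }
};

// Example data types and one index (assumptions of this sketch)
struct Pedestals {};
struct Gains {};
struct ToyPedestalsIndex : ToyDataIndex {
    std::string load(unsigned long eventID) const override {
        return "pedestals@" + std::to_string(eventID);
    }
};
```

The loader itself stays free of type-specific logic; supporting a new data type
means registering one more index rather than changing the loader.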

YAML Calibration Info
+++++++++++++++++++++

``iYAMLDataIndex<T>``, a template subclass of
``GenericLoader::AbstractCalibDataIndex``, represents a data index where the
information needed to load calibration data for a particular run is stored as a
YAML node (in RAM). It might be either the entire input needed to construct a
new calibration data object (see ``SimpleYAMLDataIndex<T>``), or just some
description needed to load a relatively large piece of data
(``iCSVFilesDataIndex``).