Calibration data¶

This page contains examples of adding a custom calibration (i.e. run- or datetime-dependent) data type.

Important

By the term “calibration data” (in NA64SW) we mean any kind of data that depends on the event number. This includes not only the typical kinds of calibration information, but also such data as detector naming, geometry or positions.

Most users will be interested only in the first part of this page, which provides some general API-based examples. The “General use” section is for advanced modification of the NA64SW API itself.

Preface¶

NA64SW introduces a versatile yet complex subsystem for querying calibration data. Although there are multiple API layers under the hood to support various scenarios, the user-facing API tends to be reasonably simple.

The user-facing API is class-based and built around a subscription mechanism. A class that needs the currently valid instance of some calibration data shall inherit a subscriber: a handle template (calib::Handle<>) parameterised with the data type of interest. Then, whenever the calibration data of that type gets updated, calib::Handle<T>::handle_update( const T & ) is called with the new data.

A data processing handler class is a typical example of a subscriber. It usually maintains a copy of the calibration data of interest (or some values inferred from this data).

Design Considerations¶

We expect that the full set of calibration data needed for every kind of processing will not be kept in RAM in its entirety. From this assumption one immediately derives the need for some kind of runtime “state” or “cache” where the currently loaded information is kept (i.e. the portion of calibration data actually “loaded” from the “existing” superset).
This “state” has to be synchronized with the event number under consideration. The typical trigger of a state change is the object representing the data source, which notifies the “state” of the newly acquired event’s ID. For the MC case, these are the hooks called when a new event arrives (e.g. a G4UserEventAction subclass, for the Geant4 API).
The “state” maintains subscriptions for entities interested in particular types of calibration information via the “observer-notifier” (or “pub/sub”) pattern. That is, changes in the state (induced by a changing event ID) are propagated through the subscription mechanism to the objects interested in certain pieces of information.
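
The considerations above can be illustrated with a minimal, self-contained sketch. It is not the NA64SW implementation: CalibState, Period and their methods are hypothetical names made up for this illustration, and a plain double stands in for a calibration datum. The sketch only shows the idea of a “state” that is synchronized with the event ID and notifies its subscribers when the loaded validity period changes.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical validity period: an inclusive range of event IDs and the
// datum valid within it (a plain double stands in for real data).
struct Period { unsigned long first, last; double value; };

// Hypothetical "state": remembers which period is currently loaded and
// notifies subscribers only when that changes.
class CalibState {
public:
    using Callback = std::function<void(double)>;

    void add_period(const Period & p) { _periods.push_back(p); }
    void subscribe(Callback cb) { _subscribers.push_back(std::move(cb)); }

    // To be called by the data source for every newly acquired event.
    void set_event(unsigned long eventID) {
        for (std::size_t i = 0; i < _periods.size(); ++i) {
            const Period & p = _periods[i];
            if (eventID < p.first || eventID > p.last) continue;
            if (_loaded && _current == i) return;  // cached data still valid
            _current = i;
            _loaded = true;
            for (auto & cb : _subscribers) cb(p.value);  // propagate update
            return;
        }
    }
private:
    std::vector<Period> _periods;
    std::vector<Callback> _subscribers;
    std::size_t _current = 0;
    bool _loaded = false;
};
```

In this sketch, calling set_event() for several events that belong to the same period results in a single notification; subscribers are called again only when an event from another period arrives.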

Definitions¶

A calibration data type is a type of data that must be defined for a certain validity period.
A validity period is the range between two events or between two astronomical dates and times.
A loader class defines how the calibration data must be loaded for a certain event within the validity period (i.e. for a certain range of events).
A calibration data index class decides whether the data must be loaded. For instance, it may take the event number and match it against a certain validity period.
A calibration data handle is a simple wrapper class that keeps an instance of the data that must be kept up to date.
A calibration data dispatcher is an instance of the special calib::Dispatcher class that, under certain conditions and based on the event ID and a collection of indexes, may trigger an update of calibration data of certain types. In practice, this is done by “notifying” the calibration data handles.
A calibration data manager (a calib::Manager instance) provides unified access to the current calibration state. It subclasses the dispatcher and maintains collections of loaders and indexes to define which handle must be updated and when.

The resulting system built on these classes is still quite generic and foresees some sophisticated use cases, such as keeping the run indexes in a database, overriding them with a configuration file, or using a mixed loading scheme where the data may be retrieved from an ASCII CSV file, a local database, a ROOT file, or fetched via a RESTful API.

Scope of definitions¶

One has to distinguish between a calibration data type introduced for a single endpoint (local) and one introduced for multiple endpoints (global or shared). This difference mostly affects how and where the data type is declared.

Local calibration data types are an important use case for prototyping, when one does not want to burden the global namespace and registries with definitions that make no sense without a certain handler. This is a good choice for highly specialized handlers that require extra data. This way one can keep all the data type definitions within the same implementation (.cc) file where the handler itself is defined, somewhat similar to C/C++ static definitions.

If calibration functionality is shared across multiple handlers within a certain extension, it can be defined in the reentrant header.

Finally, widely used runtime-dependent information can be defined at the level of the na64calib library (this requires a project rebuild).

Introducing new data type¶

One has to express a piece of calibration data as a C/C++ data type. It is usually something like a POD structure. For instance, let’s consider a typical placement datum that describes the spatial position, orientation, size, etc. of a tracking detector.

struct Placement {
    std::string name;  ///< name of the plane
    float center[3]  ///< global coordinates of the plane, cm
        , size[3]  ///< size of the plane, cm
        , rot[3]  ///< Euler angles of the plane defining its orientation
        ;
    int nWires;  ///< number of wires
    float resolution;  ///< detector resolution
};

The simplest way to make a handler subscribe to this kind of information is to subclass calib::Handle<>. Since updates of this type are usually provided as a set of such objects, it is convenient to subscribe to the set itself rather than to an individual entry (let it be an std::list, for instance).

Subscribing to updates¶

For instance, a handler using the resolution information may have a cache designed to keep topical info:

typedef std::list<Placement> Placements;

class MyHandler : public AbstractHandler
                , public calib::Handle<Placements> {
protected:
    std::map<std::string, float> _resolutionDict;
protected:
    void handle_update( const Placements & updates ) {
        for( const auto & update : updates ) {
            _resolutionDict[update.name] = update.resolution;
        }
    }
public:
    ProcRes process_event(Event & event) {
        // ... process event relying on `_resolutionDict` here
        return kOk;
    }

    MyHandler( calib::Manager & mgr )
            : calib::Handle<Placements>("default", mgr)
            {}
};

The additional string provided to the calib::Handle<> constructor is used together with the RTTI type information to uniquely identify the subclass of the calibration information. It was introduced to distinguish between different uses of the same C/C++ type. For instance, one may like to have a spatial vector (say, TVector3) as a calibration info instance, but one still has to attach some additional semantics to it. Most of the time this additional string classifier can be left as "default".

NA64SW is designed so that a calibration data update, whenever one is available, is delivered before the event is processed (i.e. handle_update() is called prior to process_event() once a new event has arrived).

Note on data types aliases¶

Calibration data types are indexed by means of C++ RTTI (the std::type_info hash) plus a string that is sometimes called the calibration data type subclass. These two things are expected to uniquely identify a particular calibration data type. However, the API often needs to refer to this ID by some human-readable string. For this purpose an aliasing technique is introduced: a global dictionary performs alias-to-ID and ID-to-alias conversions.
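
A minimal sketch of such a dictionary is given below to make the idea concrete. It is an illustration under assumptions, not the actual CIDataAliases class: the names CalibTypeID and AliasDict are hypothetical, and the ID is modelled as a pair of std::type_index and a subclass string rather than a raw hash.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <utility>

// Hypothetical ID: RTTI-based type index plus the "subclass" string.
using CalibTypeID = std::pair<std::type_index, std::string>;

// Hypothetical two-way alias dictionary (illustration only).
class AliasDict {
public:
    // Binds a human-readable alias to the (type, subclass) pair.
    template<typename T>
    void add_alias_of(const std::string & alias,
                      const std::string & subclass = "default") {
        CalibTypeID id{std::type_index(typeid(T)), subclass};
        if (!_byAlias.emplace(alias, id).second)
            throw std::runtime_error("alias already defined: " + alias);
        // If the ID already has an alias, the first one is kept for the
        // reverse lookup.
        _byID.emplace(id, alias);
    }
    // Alias-to-ID conversion; throws std::out_of_range for unknown aliases.
    const CalibTypeID & id_of(const std::string & alias) const {
        return _byAlias.at(alias);
    }
    // ID-to-alias conversion; throws std::out_of_range for unknown IDs.
    const std::string & alias_of(const CalibTypeID & id) const {
        return _byID.at(id);
    }
private:
    std::map<std::string, CalibTypeID> _byAlias;
    std::map<CalibTypeID, std::string> _byID;
};
```

Note how two aliases may refer to the same C++ type and still denote different calibration data types, as long as their subclass strings differ.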

Note on Dependencies¶

Calibration data types often have dependencies. The most common dependency is detector naming, needed to convert stringified detector names to integer IDs. These dependencies may be either specified in calibrations.yaml in the dependencies: section (for global types) or provided in C++ code with the following directive using the CIDataAliases singleton (more suitable for module-local types):

na64dp::calib::CIDataAliases::self().add_dependency("TargetType", "DependencyType");

This makes the type aliased as "DependencyType" load before the type aliased as "TargetType".

Dependency assignment in C++ is better suited to handler-local calibration data types, while YAML is better suited to non-local types.
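
How a set of such dependency declarations may be turned into a loading order can be sketched as follows. This is an illustration only: DependencyGraph and load_order() are hypothetical names, and a depth-first topological sort is assumed here; the source does not state how NA64SW resolves the order internally.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical dependency registry keyed by type aliases.
class DependencyGraph {
public:
    // Declares that "target" requires "dependency" to be loaded first.
    void add_dependency(const std::string & target,
                        const std::string & dependency) {
        _deps[target].insert(dependency);
        _deps[dependency];  // make sure the dependency is a known node
    }
    // Returns aliases ordered so that dependencies precede their dependants.
    std::vector<std::string> load_order() const {
        std::vector<std::string> order;
        std::map<std::string, int> state;  // 0: new, 1: visiting, 2: done
        for (const auto & p : _deps) visit(p.first, state, order);
        return order;
    }
private:
    void visit(const std::string & name,
               std::map<std::string, int> & state,
               std::vector<std::string> & order) const {
        if (state[name] == 2) return;
        if (state[name] == 1)
            throw std::runtime_error("circular dependency at " + name);
        state[name] = 1;
        for (const auto & dep : _deps.at(name)) visit(dep, state, order);
        state[name] = 2;
        order.push_back(name);  // all dependencies are already in the list
    }
    std::map<std::string, std::set<std::string>> _deps;
};
```

With "placements" depending on "naming", the resulting order lists "naming" first. A circular declaration is reported as an error in this sketch instead of being silently ignored.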

Choosing update source¶

How this update has to be provided is a matter of choice. There are a few architectural layers:

The topmost layer is the manager instance (the calib::Manager that is provided to REGISTER_HANDLER()). This class aggregates all the subscriptions and update dispatching routines.
A layer of standalone indexes and loaders. These objects are collected by the manager and used to define when, what and how data has to be fetched.
An SDC module providing a subsystem for local file discovery. The core functions of this module were written for the original p348reco data reconstruction routines and have been naturally interfaced to NA64SW.

It is up to you, as the code designer, to decide which particular mechanism to exploit. Yet dealing with the manager directly (i.e. copying the instance, subclassing, etc.) is almost never good practice, since it involves infrastructural changes. Introducing a new index or loader requires far fewer changes to the general API and accommodates various mechanisms of providing data validity and data delivery (e.g. a remote database, a RESTful API, etc.). We consider some examples below.

Since the most common usage scenario for calibration data is local to a certain handler, we’ve tried to ease the integration of custom data types within a user’s extension.

Calibrations config as an update source¶

For small pieces of data that are unlikely to change for years, one can consider putting such information right into the calibrations.yaml file. This is the main configuration file used to initialize the whole infrastructure.

Within this file a standaloneDocuments YAML object is provided, listing documents or individual parameters that will hardly change for years, together with their validity intervals.

Direct parameters¶

Tiny pieces of runtime-dependent (yet semi-constant) information, like the name of the master counter in a setup, can be listed right in this file. To use such a document as it is provided, one may set the payload to the required YAML object and specify the loader as yaml-doc.

Todo

Describe C/C++ routine here.

See the masterTimeSource definitions for instance.

Static files¶

For low-volatility information that is unlikely to be updated or modified, but is too big to be put entirely into calibrations.yaml, one may consider using a reference to static files.

Todo

Elaborate, describe C/C++ routine here.

See the MuMegaLayout for instance.

SDC data as an update source¶

The SDC module is useful for highly volatile, relatively bulky information that can be expressed in CSV files, a classic example of calibration data.

Its indexes and metadata are provided together with the CSV data blocks. The particular type is specified within the file itself as well.

Although SDC has a customizable grammar, it is still possible to describe its basics:

Blank lines are ignored.
Comments start with a single character, an octothorpe (#) by default. The rest of the line after this character is ignored.
A line containing the special metadata delimiter (an equals sign, =, by default) is considered a metadata line. The string before the delimiter is the metadata entry key; the string after it (up to the newline) is the value.
SDC metadata is a simple key/value (string-to-string) storage.
Two metadata keys have a special meaning:
- the one that specifies the validity period is called the metadata key tag (runs by default)
- the one that provides the data type is called the metadata type tag (type by default)
The data itself is provided in a columnar format, similar to space-separated CSV. Any non-blank, non-metadata lines are considered data lines.
Data lines form a data block, for which all previously defined metadata is considered valid.

For instance:

# This is a comment
runs=123-315
type=MyType
foo     123     3.4e12
bar     712     nan

myMetadata=blah #blah
runs=316-512
foo     321     1e-12
bar     314     512

This defines two data blocks, both of the same type but for different periods. Only the second block has the metadata entry myMetadata defined (and set to "blah").

Although SDC files are discovered automatically, some C/C++ interfacing is still needed to tell NA64SW how a CSV block has to be converted into a particular C/C++ data type.
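
To make the grammar rules more tangible, here is a simplified, self-contained parser sketch for the default grammar. It is not the SDC library code: DataBlock, trim() and parse_sdc() are hypothetical names, and the sketch assumes that a metadata line ends the current data block, so that the next data line opens a new block with the accumulated metadata.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical representation of a parsed block (not an SDC type).
struct DataBlock {
    std::map<std::string, std::string> metadata;  // valid for this block
    std::vector<std::vector<std::string>> rows;   // tokenized data lines
};

// Removes leading and trailing whitespace.
static std::string trim(const std::string & s) {
    const char * ws = " \t\r\n";
    const auto b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    return s.substr(b, s.find_last_not_of(ws) - b + 1);
}

// Parses text written in the default grammar ('#' comments, '=' metadata).
std::vector<DataBlock> parse_sdc(const std::string & text) {
    std::vector<DataBlock> blocks;
    std::map<std::string, std::string> md;  // accumulated metadata
    bool inBlock = false;  // true while data lines of one block are read
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        line = trim(line.substr(0, line.find('#')));  // strip the comment
        if (line.empty()) continue;                   // ignore blank lines
        const auto eq = line.find('=');
        if (eq != std::string::npos) {                // metadata line
            md[trim(line.substr(0, eq))] = trim(line.substr(eq + 1));
            inBlock = false;  // assumption: metadata closes the block
            continue;
        }
        if (!inBlock) {       // first data line opens a new block
            blocks.push_back(DataBlock{md, {}});
            inBlock = true;
        }
        std::istringstream tokens(line);
        std::vector<std::string> row;
        for (std::string tok; tokens >> tok; ) row.push_back(tok);
        blocks.back().rows.push_back(row);
    }
    return blocks;
}
```

Applied to the example above, this sketch yields two blocks: the first one carries the runs and type entries only, while the second one inherits type, overrides runs and additionally carries myMetadata.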

Extending SDC with a data type¶

The SDC subsystem traverses the directory structure looking for files matching certain criteria (wildcard or size limits). Every file matching these criteria is expected to contain SDC content: metadata and CSV blocks. These files are pre-parsed once, meaning that their type tag and validity tag metadata entries are read and the file entry is indexed by SDC classes. Pre-parsing is a generic procedure and does not need much customization, except for the grammar.

Then, once an update is initiated, the corresponding file is actually parsed: its metadata is read and accumulated in a metadata cache, and each CSV block is passed to C++ traits that turn tokens into a C/C++ structure. These traits are what customizes the parsing process.

That is, the C++ traits defining how line tokens should be converted into a C/C++ struct are the most meaningful extension point for introducing a new calibration type. The expected traits implementation relies heavily on advanced C++ template techniques to facilitate the extensibility of the SDC system.

Example traits to load the Placement entry (snippet is a bit lengthy, but useful):

// These traits must be defined in the "sdc" namespace
namespace sdc {
// Traits are a C++ template specialization of the CalibDataTraits<> template
// struct -- so this syntax is mandatory
template<>
struct CalibDataTraits<Placement> {
    // Type name alias, used to refer to this type in configuration files,
    // mandatory constexpr
    static constexpr auto typeName = "placement";
    /// A collection type of the parsed entries (usually STL container, but
    /// one can define any template parameterised with a single type)
    template<typename T=na64dp::calib::Placement>
        using Collection=std::list<T>;
    /// An action performed to put newly parsed data into collection
    template<typename T=na64dp::calib::Placement>
    static inline void collect( Collection<T> & col
                              , const T & e
                              , const aux::MetaInfo &
                              , size_t
                              ) { col.push_back(e); }
    static na64dp::calib::Placement
        parse_line( const std::string & line
                  , size_t lineNo
                  , const aux::MetaInfo & m
                  , const std::string & filename
                  ) {
        // create the object with all members value-initialized (zeroed)
        na64dp::calib::Placement obj{};
        // Create and validate columns order, relying on "columns" metadata
        // entry
        auto csv = m.get<aux::ColumnsOrder>("columns", lineNo)
                .interpret(aux::tokenize(line));
        // Get columns
        obj.name = csv("name");
        obj.center[0] = csv("x", std::nan("0"));
        obj.center[1] = csv("y", std::nan("0"));
        obj.center[2] = csv("z", std::nan("0"));
        // ... etc -- turn tokens into data structure
        return obj;
    }
};
}

Once an update is triggered, SDC expects the parse_line() static method to interpret a CSV string as a single entry of the CalibDataTraits<T>::Collection<T> container. Source information and the up-to-date metadata state are provided to this method to assist the user’s data conversion procedure.

Then SDC uses the collect() static method to put the parsed object into a collection instance.

The collection instance is then dispatched to the subscribers.
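
The interplay between the generic parsing code and the user-provided traits can be illustrated with a stripped-down, self-contained sketch. All names here (WireGain, LineTraits, load_block()) are hypothetical and the signatures are deliberately simplified compared with the CalibDataTraits<> example above: no metadata, line number or file name is passed.

```cpp
#include <cassert>
#include <list>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical calibration entry used only for this illustration.
struct WireGain { std::string name; double gain; };

// Hypothetical traits template; it must be specialized per data type.
template<typename T> struct LineTraits;

template<>
struct LineTraits<WireGain> {
    using Collection = std::list<WireGain>;
    // Converts one data line into a single entry.
    static WireGain parse_line(const std::string & line) {
        std::istringstream in(line);
        WireGain e{};
        if (!(in >> e.name >> e.gain))
            throw std::runtime_error("bad line: " + line);
        return e;
    }
    // Puts the newly parsed entry into the collection.
    static void collect(Collection & col, const WireGain & e) {
        col.push_back(e);
    }
};

// Generic code: it knows about T only through LineTraits<T>.
template<typename T>
typename LineTraits<T>::Collection
load_block(const std::vector<std::string> & lines) {
    typename LineTraits<T>::Collection col;
    for (const auto & l : lines)
        LineTraits<T>::collect(col, LineTraits<T>::parse_line(l));
    return col;
}
```

The point of this design is that supporting a new type only requires another specialization of the traits; the generic loading routine stays untouched.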

Registering alias for SDC data¶

To register a new SDC data type one has to:

Declare its alias
Enable the type conversion function in the SDC wrapper
Define calibration data dependencies, if any (see next paragraph)

Useful snippet to do all these things:

// define new alias
na64dp::calib::CIDataAliases::self()
            .add_alias_of<Placements>("placements", "default");
// get SDC loader from manager, downcast it to proper type (SDCWrapper)
// and impose type conversion using previously defined traits
mgr.get_loader<na64dp::calib::SDCWrapper>("sdc")
    .enable_sdc_type_conversion<p348reco::RunNo_t, Placement>();
// Require "naming" to be loaded before "placements"
na64dp::calib::CIDataAliases::self().add_dependency("placements", "naming");

This block must be executed before any subscription to the target type is made, i.e. putting it into a handler constructor won’t work. For local extensions, consider the REGISTER_HANDLER body; for global framework-wide changes, modify the na64dp/calib/sdc-extensions.hh header.

General use¶

In the generic case, when neither SDC nor the standard means suit the user’s calibration data, the following usage is foreseen:

Todo

Revise/restructure notes below.

Custom indexes and loaders¶

Define the calibration data type. It may be a simple type, a POD struct or a complex class. A single instance of this type must contain only the data needed for a certain validity period.
Use an existing loader or implement a new one. When adding a new loader, one has to subclass the vanilla iCalibDataLoader (and add the appropriate entry to the run index). In the more common case one may rely on the generic loader and subclass GenericLoader::AbstractCalibDataIndex. For simple data trivially constructible from a YAML description, consider using SimpleYAMLDataIndex<T>, which defines stateless, single-function conversions. For more complex or somewhat bulkier data, consider subclassing iCSVFilesDataIndex<T, TupleTypesT ...>: users only have to define a tuple-to-data-type conversion method in their subclass.
If you have subclassed iCalibDataLoader, instantiate your loader and add it to the application’s manager with the Manager::add_loader() method. If, on the contrary, you have subclassed GenericLoader::AbstractCalibDataIndex, instantiate the data index and add it to the GenericLoader instance, mentioning the new data field (see, e.g., the pipeline function). Defining a runtime data loader constructor for YAML is somewhat tricky (a small intermediate structure is used); refer to the calib::CalibInfoCtr docs.

Usage for external applications¶

The central object that the user’s app must maintain is an instance of the calibration data manager: a na64dp::calib::Manager object. It may be instantiated with no additional preparations:

na64dp::calib::Manager mgr;

Having this object, the user’s classes may then subscribe to calibration updates. E.g., given a calibration data type, say MyCalibData, and a class MyCalibDataUser, one may set up the subscription by implementing the Observable<MyCalibData>::iObserver interface and calling the subscribe() method of the Manager instance:

class MyCalibDataUser : public na64dp::util::Observable<MyCalibData>::iObserver {
    // ...
    MyCalibDataUser( na64dp::calib::Manager & mgr ) {
        mgr.subscribe<MyCalibData>(*this, "default");
    }
    // ...
protected:
    virtual void handle_update( const MyCalibData & ) override;
    // ...
};

Here subscribe() takes a string argument ("default" in example) that, together with C++ type info, identifies the type of calibration entry.

Note

calib::Handle<> subclasses util::Observable<T>::iObserver to provide a cached copy of data and some useful shortcuts for data retrieval.

Note that the calibration data user class has only one way to receive the updated data instance: the protected method handle_update(). The reference passed to it is guaranteed to be valid until the next handle_update() call, so the subscriber may safely cache it.

To make the Manager instance do something meaningful, one has to bind a run index instance to it.

Range-overriding run index¶

The RangeOverrideRunIndex class implements a straightforward way of mapping the event identifier onto subsets of calibration sources. In this class we assume that a certain calibration data object is valid starting from some event ID until the next event ID that has another calibration data object. This way, data signed with event ID 5 will override (or substitute) the entity signed with event ID 2. It is the simplest yet useful mapping scheme for defining range matching.
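
The range-overriding lookup itself is easy to express with an ordered map. The sketch below is not the RangeOverrideRunIndex API: RangeOverrideMap, add() and find() are hypothetical names chosen for the illustration, which relies only on std::map::upper_bound().

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <string>
#include <utility>

// Hypothetical sketch of a range-overriding lookup (illustration only).
template<typename EventID, typename Payload>
class RangeOverrideMap {
public:
    // Declares that `payload` becomes valid starting from `firstValid`
    // and stays valid until the next declared entry.
    void add(EventID firstValid, Payload payload) {
        _entries[firstValid] = std::move(payload);
    }
    // Returns the payload valid for `id`, or nullptr if `id` precedes
    // every declared entry.
    const Payload * find(EventID id) const {
        auto it = _entries.upper_bound(id);  // first entry starting after id
        if (it == _entries.begin()) return nullptr;
        return &std::prev(it)->second;       // last entry starting at or before id
    }
private:
    std::map<EventID, Payload> _entries;
};
```

With entries added for event IDs 2 and 5, a query for event 4 resolves to the entry signed with 2, while queries for event 5 and beyond resolve to the entry signed with 5.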

Generic Loader¶

Most practical use cases do not require the level of generalization that the vanilla iCalibDataLoader offers (loading data of any type for any event).

The GenericLoader class implements a calibration data loader with dynamic composition. Once loading of data of a certain type is requested (by the load_data() method), it forwards execution to one of its data index instances (subclassing GenericLoader::AbstractCalibDataIndex), which performs the actual loading. A “data index” instance is specific to a certain data type while GenericLoader is not (so a GenericLoader instance is just a collection of type-specific loaders), and that is the reason for this level of composition.

New instances of data indexes are added by GenericLoader::add_data_index() at the startup.
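
The composition described above may be sketched as a loader that keeps type-specific index objects in a map keyed by the data type. The code below is a simplified illustration and not the GenericLoader implementation: the class names, the load() signature and the two example indexes are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <utility>

// Hypothetical type-erased base of all data indexes.
struct AbstractIndex { virtual ~AbstractIndex() = default; };

// Hypothetical typed data index: knows how to load data of type T.
template<typename T>
struct TypedIndex : AbstractIndex {
    virtual T load(unsigned long eventID) = 0;
};

// Hypothetical loader: a collection of type-specific indexes.
class Loader {
public:
    template<typename T>
    void add_data_index(std::unique_ptr<TypedIndex<T>> idx) {
        _indexes[std::type_index(typeid(T))] = std::move(idx);
    }
    // Forwards the request to the index registered for type T.
    template<typename T>
    T load_data(unsigned long eventID) {
        auto it = _indexes.find(std::type_index(typeid(T)));
        if (it == _indexes.end())
            throw std::runtime_error("no data index for requested type");
        // The map key guarantees the dynamic type, so the downcast is safe.
        return static_cast<TypedIndex<T> &>(*it->second).load(eventID);
    }
private:
    std::map<std::type_index, std::unique_ptr<AbstractIndex>> _indexes;
};

// Two made-up indexes, for demonstration purposes only.
struct ThresholdIndex : TypedIndex<double> {
    double load(unsigned long eventID) override {
        return eventID < 100 ? 0.5 : 0.75;
    }
};
struct LabelIndex : TypedIndex<std::string> {
    std::string load(unsigned long eventID) override {
        return "run-" + std::to_string(eventID);
    }
};
```

The loader itself stays agnostic of the data types: adding support for a new type means registering one more index object at startup.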

YAML Calibration Info¶

The iYAMLDataIndex<T>, a template subclass of GenericLoader::AbstractCalibDataIndex, represents a data index where the information needed to load calibration data for a particular run is stored as a YAML node (in RAM). It might be either the entire input needed to construct a new calibration data object (see SimpleYAMLDataIndex<T>), or just some description needed to load a relatively large piece of data (iCSVFilesDataIndex).