.. _calibration data guide:

Calibration data
================

This page contains examples of adding a custom *calibration* (i.e.
run/datetime dependent) data type.

.. important::

    By the term "calibration data" (in NA64SW) we mean *any kind of data that
    depends on the event number*, including not only the typical kinds of
    calibration information, but also such data as detector naming, geometry
    or positions.

Most users will be interested only in the first part of this page, which
provides general API-based examples. The "General use" section is for
advanced modification of the NA64SW API itself.

Preface
-------

NA64SW introduces a versatile yet complex subsystem for querying calibration
data. Although there are multiple API layers under the hood to support
various scenarios, the user-level API is reasonably simple.

The user-level API is class based and relies on a *subscription* mechanism.
A class that needs the currently valid calibration data inherits a
subscriber: a *handle* (``calib::Handle<>``) template parameterised with the
data type of interest. Then, whenever the calibration data of that type gets
updated, ``calib::Handle::handle_update( const T & )`` is called with the new
data.

A data processing handler class is a typical example of a subscriber. It
usually maintains a copy of the calibration data of interest (or some values
derived from this data).

Design Considerations
~~~~~~~~~~~~~~~~~~~~~

* We assume that the full set of calibration data needed for every kind of
  processing does not have to be kept in RAM in its entirety. From this
  assumption one immediately derives the need for some kind of runtime
  "state" or "cache" where the currently loaded information is kept (i.e.
  the portion of calibration data actually "loaded" from the "existing"
  superset).
* This "state" has to be synchronized with the event number under
  consideration.
  The typical trigger of a state change is the data source object, which
  notifies the "state" of each newly acquired event's ID. For the MC case
  these are the hooks called when a new event arrives (i.e. a
  ``G4UserEventAction`` subclass, for the Geant4 API).
* The "state" maintains subscriptions for entities interested in particular
  types of calibration information via the "observer-notifier" (or
  "pub/sub") pattern. I.e. changes in the state (induced by a change of the
  event ID) are propagated through the subscription mechanism to the objects
  interested in certain pieces of information.

Definitions
~~~~~~~~~~~

* *Calibration data type* is the type of data that must be defined for a
  certain validity period.
* *Validity period* is defined as the range between two events or between
  two astronomical dates and times.
* *Loader class* defines how the calibration data must be loaded for a
  certain event within the *validity period* (for a certain range of events).
* *Calibration data index class* decides whether the data must be loaded.
  For instance, it may consider the event number and match it against a
  certain *validity period*.
* *Calibration data handle* is a simple wrapper class that keeps an instance
  of the data that must be updated.
* *Calibration data dispatcher* is an instance of the special
  ``calib::Dispatcher`` class that, under certain conditions and based on the
  event ID and a collection of *indexes*, may cause an update of calibration
  data of certain types. In practice this is done by "notifying" the
  *calibration data handles*.
* *Calibration data manager* (a ``calib::Manager`` instance) provides unified
  access to the current calibration state. It subclasses the dispatcher and
  maintains collections of loaders and indexes to define which handle must be
  updated and when.

The resulting system built on these classes is still quite generic and
foresees some sophisticated use cases, like keeping the *run indexes* in a
database, overriding them with a configuration file, and using a mixed
loading scheme where the data may be retrieved from an ASCII CSV file, a
local database, a ROOT file or fetched via a RESTful API.

Scope of definitions
--------------------

One has to make a distinction between a calibration data type introduced for
one endpoint (*local*) and one introduced for multiple endpoints (*global*
or *shared*). This difference mostly affects how and where the data type is
declared.

*Local* calibration data types are an important use case for prototyping,
when one does not want to burden the global namespace and registries with
definitions that make no sense without a certain handler. This is a good
choice for highly specialized handlers that require extra data. This way one
can keep all the data type definitions within the same implementation
(``.cc``) file where the handler itself is defined, somewhat similar to
C/C++ ``static`` definitions.

If calibration functionality is shared across multiple handlers within a
certain extension, it can be defined in a reentrant header.

Finally, widely used runtime-dependent information can be defined at the
level of the ``na64calib`` library (this requires a project rebuild).

Introducing new data type
-------------------------

One has to express a piece of calibration data as a C/C++ data type. It is
usually something like a POD structure. For instance, let's consider a
typical *placement* datum that describes the spatial position, orientation,
size etc. of a tracking detector.

.. code-block:: cpp

    struct Placement {
        std::string name;  ///< name of the plane
        float center[3]  ///< global coordinates of the plane, cm
            , size[3]  ///< size of the plane, cm
            , rot[3]  ///< Euler angles of the plane defining its orientation
            ;
        int nWires;  ///< number of wires
        float resolution;  ///< detector resolution
    };

The simplest way to make a handler subscribe to this kind of information is
to subclass ``calib::Handle<>``. Taking into account that an update of this
type of information is usually provided as a set of such objects, it is
convenient to subscribe to the set itself instead of an individual entry
(let it be an ``std::list`` for instance).

Subscribing to updates
----------------------

For instance, a handler using the resolution information may have a cache
designed to keep the relevant info:

.. code-block:: cpp

    typedef std::list<Placement> Placements;

    class MyHandler : public AbstractHandler
                    , public calib::Handle<Placements> {
    protected:
        /// Cache: detector name -> resolution
        std::map<std::string, float> _resolutionDict;
    protected:
        /// Called by the calibration subsystem when placements change
        void handle_update( const Placements & updates ) {
            for( const auto & update : updates ) {
                _resolutionDict[update.name] = update.resolution;
            }
        }
    public:
        ProcRes process_event(Event & event) {
            // ... process event relying on `_resolutionDict` here
            return kOk;
        }
        MyHandler( calib::Manager & mgr )
            : calib::Handle<Placements>("default", mgr) {}
    };

The additional string provided to the ``calib::Handle<>`` constructor is
used together with the RTTI type information to uniquely identify the
*subclass* of the calibration information. It was introduced to distinguish
between different uses of the same C/C++ type. For instance, one may like to
have a spatial vector (say ``TVector3``) as a calibration info instance, but
one still has to address it with some additional semantics. Most of the time
this additional string classifier can be left as ``"default"``.

NA64SW is designed in such a way that a calibration data update, whenever it
is available, will be provided *before* the event is processed (i.e.
``handle_update()`` will be called prior to ``process_event()`` once a new
event has arrived).

Note on data types aliases
--------------------------

Calibration data types are indexed by means of C++ RTTI (the
``std::type_info`` hash) plus a string that is sometimes called the
*calibration data type subclass*. These two things are expected to uniquely
identify a particular calibration data type. However, the API often needs to
refer to this ID by some human-readable string. For this purpose an aliasing
technique is introduced: a global dictionary performs alias-to-ID and vice
versa conversions.

Note on Dependencies
--------------------

Calibration data types often have dependencies. The most obvious common
dependency is *detector naming*, needed to convert stringified detector
names to integer IDs.

These dependencies may be either specified in ``calibrations.yaml`` in the
``dependencies:`` section (for global types) or provided in C++ code with
the following directive using the ``CIDataAliases`` singleton (more suitable
for module-local types):

.. code-block:: cpp

    na64dp::calib::CIDataAliases::self().add_dependency("TargetType", "DependencyType");

This will cause the type behind the ``"DependencyType"`` alias to be loaded
before the type behind the ``"TargetType"`` alias. C++ dependency assignment
is better suited for handler-local calibration data types, while YAML is
better suited for non-local types.

Choosing update source
----------------------

How the update is to be provided is a matter of choice. There are a few
architectural layers:

1. The topmost layer is the *manager* instance (the ``calib::Manager`` that
   is provided to ``REGISTER_HANDLER()``). This class aggregates all the
   subscriptions and update dispatching routines.
2. A layer of standalone *indexes* and *loaders*. These objects are
   collected by the *manager* and used to define what has to be fetched,
   when, and how.
3. An *SDC* module providing a subsystem for local file discovery.
   This module's core functions were written for the original ``p348reco``
   data reconstruction routines and have been naturally interfaced to
   NA64SW.

It is up to you, as the code designer, to decide which particular mechanism
to exploit. Yet, dealing with the manager directly (i.e. copying the
instance, subclassing, etc.) is almost never good practice since it involves
infrastructural changes. Introducing a new *index* or *loader* requires far
fewer changes in the general API and is the foreseen way to support various
mechanisms of providing data validity and data delivery (e.g. a remote
database, RESTful API, etc). We consider some examples below.

Since the most common usage scenario for calibration data is local to a
certain handler, we've tried to ease the integration of custom data types
within a user's extension.

Calibrations config as an update source
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For small pieces of data that are unlikely to change for years, one can
consider putting the information right into the ``calibrations.yaml`` file.
This is the main configuration file used to initialize the whole
infrastructure. Within this file a ``standaloneDocuments`` YAML object is
provided, listing documents or individual parameters that will barely change
for years, together with their validity intervals.

Direct parameters
+++++++++++++++++

Tiny pieces of runtime-dependent (yet semi-constant) information, like the
name of the master counter in a setup, can be listed right in this file. To
use such a document as it is provided, one may set ``payload`` to the
required YAML object and specify ``loader`` as ``yaml-doc``.

.. todo::

    Describe C/C++ routine here. See the ``masterTimeSource`` definitions for
    instance.

Static files
++++++++++++

For low-volatility information that is unlikely to be updated or modified,
but which is too big to be put into ``calibrations.yaml`` as a whole, one
may consider using a reference to static files.

.. todo::

    Elaborate, describe C/C++ routine here.
    See the ``MuMegaLayout`` for instance.

SDC data as an update source
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The SDC module is useful for highly volatile, relatively bulky information
that can be expressed in CSV files -- a classic example of calibration data.
Its indexes and metadata are provided together with the CSV data blocks. The
particular type is specified within the file itself as well.

Although SDC has a customizable grammar, it is still possible to describe
its basics:

* Blank lines are ignored.
* Comments start with a single char, an octothorp (``#``) by default. The
  rest of the line after this char is ignored.
* A line containing a special *metadata delimiter* (the equal sign, ``=``,
  by default) is considered a metadata line. The string before this marker
  is the metadata entry *key*, the string after it (up to the newline) is
  the *value*.
* SDC *metadata* is a simple key/value (str-to-str) storage.
* Two metadata *keys* have a special meaning:

  - the one designating the validity period is called the *metadata key
    tag* (``runs`` by default)
  - the other, which provides the data type, is called the *metadata type
    tag* (``type`` by default)

* The data itself is provided in a columnar format, similar to
  space-separated CSV. Any non-blank, non-metadata line is considered a data
  line.
* Consecutive data lines form a *data block*, for which all previously
  defined metadata is considered valid.

For instance:

.. code-block:: cfg

    # This is a comment
    runs=123-315
    type=MyType
    foo     123     3.4e12
    bar     712     nan
    myMetadata=blah  #blah
    runs=316-512
    foo     321     1e-12
    bar     314     512

This defines two data blocks, both of the same type but for different
periods. Only the second block has the metadata entry ``myMetadata`` defined
(and set to ``"blah"``).

Although SDC files are discovered automatically, some C/C++ interfacing is
still needed to tell NA64SW how a CSV block has to be converted into a
particular C/C++ data type.

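
To make the grammar rules above more concrete, here is a minimal standalone
sketch that splits SDC-like text into data blocks. It is *not* the SDC
implementation and does not use its API: ``Block``, ``trim()`` and
``parse_sdc_like()`` are hypothetical names, the comment char and metadata
delimiter are hard-coded to their defaults, and the sketch assumes that each
block simply stores a snapshot of the metadata known when its first data
line is met.

.. code-block:: cpp

    #include <map>
    #include <sstream>
    #include <string>
    #include <vector>

    // Hypothetical container for one data block: a metadata snapshot plus
    // tokenized data lines (not an SDC type)
    struct Block {
        std::map<std::string, std::string> metadata;
        std::vector<std::vector<std::string>> rows;
    };

    // Removes leading and trailing whitespace
    static std::string trim(const std::string & s) {
        const char * ws = " \t\r\n";
        const auto b = s.find_first_not_of(ws);
        if (b == std::string::npos) return "";
        const auto e = s.find_last_not_of(ws);
        return s.substr(b, e - b + 1);
    }

    // Splits text into data blocks following the default grammar rules
    std::vector<Block> parse_sdc_like(const std::string & text) {
        std::vector<Block> blocks;
        std::map<std::string, std::string> metadata;  // current metadata state
        bool inBlock = false;
        std::istringstream is(text);
        std::string line;
        while (std::getline(is, line)) {
            // drop the comment part, then surrounding whitespace
            line = trim(line.substr(0, line.find('#')));
            if (line.empty()) continue;  // blank lines are ignored
            const auto eq = line.find('=');
            if (eq != std::string::npos) {  // metadata line: key=value
                metadata[trim(line.substr(0, eq))] = trim(line.substr(eq + 1));
                inBlock = false;  // the next data line opens a new block
                continue;
            }
            if (!inBlock) {  // first data line after metadata
                blocks.push_back(Block{metadata, {}});
                inBlock = true;
            }
            std::istringstream ls(line);
            std::vector<std::string> tokens;
            for (std::string tok; ls >> tok; ) tokens.push_back(tok);
            blocks.back().rows.push_back(tokens);
        }
        return blocks;
    }

Fed with the example file above, this sketch yields two blocks of two rows
each, and only the second block's metadata snapshot contains ``myMetadata``.
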
Extending SDC with a data type
++++++++++++++++++++++++++++++

The SDC subsystem traverses a directory structure looking for files matching
certain criteria (wildcard or size limits). Every file matching these
criteria is expected to contain SDC content: metadata and CSV blocks. These
files are *pre-parsed* once, meaning that their *type tag* and *validity
tag* metadata entries are read and the file entry is indexed by the SDC
classes. Pre-parsing is a generic procedure and does not need much
customization, except for the grammar.

Then, once an update is initiated, the corresponding file is actually
*parsed*, meaning that its metadata is read and accumulated in a metadata
cache and the CSV block is provided to C++ *traits* in order to turn tokens
into a C/C++ structure.

The parsing process is customized with these traits. I.e. the C++ traits
defining how line tokens should be converted into a C/C++ struct are the
most meaningful extension point for introducing a new calibration type.

The expected traits implementation is heavily based on advanced C++ template
techniques to facilitate extensibility of the SDC system. Example traits to
load the ``Placement`` entry (the snippet is a bit lengthy, but useful):

.. code-block:: cpp

    // These traits must be defined in the "sdc" namespace
    namespace sdc {
    // Traits are a C++ template specialization of the CalibDataTraits<>
    // template struct, so this syntax is mandatory
    template<>
    struct CalibDataTraits<na64dp::calib::Placement> {
        // Type name alias, used to refer to this type in configuration files,
        // mandatory
        static constexpr auto typeName = "placement";
        /// A collection type of the parsed entries (usually STL container, but
        /// one can define any template parameterised with a single type)
        template<typename T=na64dp::calib::Placement>
        using Collection=std::list<T>;
        /// An action performed to put newly parsed data into collection
        template<typename T=na64dp::calib::Placement>
        static inline void collect( Collection<T> & col
                                  , const T & e
                                  , const aux::MetaInfo &
                                  , size_t
                                  ) { col.push_back(e); }
        /// Converts a single CSV line into a data entry
        static na64dp::calib::Placement
                parse_line( const std::string & line
                          , size_t lineNo
                          , const aux::MetaInfo & m
                          , const std::string & filename
                          ) {
            // create object and set everything to zero
            na64dp::calib::Placement obj{};
            // Create and validate columns order, relying on "columns" metadata
            // entry
            auto csv = m.get<aux::ColumnsOrder>("columns", lineNo)
                    .interpret(aux::tokenize(line));
            // Get columns
            obj.name = csv("name");
            obj.center[0] = csv("x", std::nan("0"));
            obj.center[1] = csv("y", std::nan("0"));
            obj.center[2] = csv("z", std::nan("0"));
            // ... etc -- turn tokens into data structure
            return obj;
        }
    };
    }  // namespace sdc

Once an update is triggered, SDC expects the ``parse_line()`` static method
to interpret a CSV string as a single entry of the
``CalibDataTraits<>::Collection`` container. Source information and the
up-to-date metadata state are provided to this method to assist the user's
data conversion procedure. Then SDC uses the ``collect()`` static method to
put the parsed object into a collection instance. The collection instance is
then dispatched to the subscribers.

Registering alias for SDC data
++++++++++++++++++++++++++++++

To register a new SDC data type one has to:

1. Declare its *alias*
2. Impose the conversion function on the SDC wrapper
3.
   Define calibration data dependencies, if any (see "Note on Dependencies"
   above)

A useful snippet to do all these things:

.. code-block:: cpp

    // define new alias
    na64dp::calib::CIDataAliases::self()
        .add_alias_of<Placements>("placements", "default");
    // get SDC loader from manager, downcast it to proper type (SDCWrapper)
    // and impose type conversion using previously defined traits
    mgr.get_loader<na64dp::calib::SDCWrapper>("sdc")
        .enable_sdc_type_conversion<na64dp::calib::Placement>();
    // Require "naming" to be loaded before "placements"
    na64dp::calib::CIDataAliases::self().add_dependency("placements", "naming");

This block must be executed before any subscription to the target type is
made, i.e. putting it into a handler constructor won't work. For local
extensions, consider the ``REGISTER_HANDLER`` body; for global
framework-wide changes, modify the ``na64dp/calib/sdc-extensions.hh``
header.

General use
-----------

In the generic case, when neither SDC nor the standard means suit the user's
calibration data, the following usage is foreseen:

.. todo::

    Revise/restructure notes below.

Custom indexes and loaders
~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Define the calibration data type. It may be a simple type, a POD struct
   or a complex class. A single instance of this type must contain only the
   data needed for a certain *validity period*.
2. Use an existing loader or implement a new one. When adding a new loader
   one has to subclass the vanilla ``iCalibDataLoader`` (and add the
   appropriate entry within the run index). In the more common case one may
   rely on the generic loader and subclass
   ``GenericLoader::AbstractCalibDataIndex``. For simple data trivially
   constructible from a YAML description, consider using
   ``SimpleYAMLDataIndex``, which defines stateless, single-function
   conversions. For more complex or somewhat bulkier data, consider
   subclassing ``iCSVFilesDataIndex``: users only have to define a
   tuple-to-data-type conversion method in their subclass.
3.
   If you have subclassed ``iCalibDataLoader``, instantiate your loader and
   add it to the application's manager with the ``Manager::add_loader()``
   method. If, on the contrary, you have subclassed
   ``GenericLoader::AbstractCalibDataIndex``, instantiate the data index and
   add it to the ``GenericLoader`` instance, mentioning the new data field
   (see, e.g., pipeline function).

Defining a runtime data loader constructor for YAML is somewhat tricky (a
small intermediate structure is used) -- refer to the docs of
``calib::CalibInfoCtr``.

Usage for external applications
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The central object that a user's app must maintain is an instance of the
calibration data manager: a ``na64dp::calib::Manager`` object. It can be
instantiated with no additional preparations:

.. code-block:: cpp

    na64dp::calib::Manager mgr;

Having this object, the user's classes may then subscribe to calibration
updates. E.g., having a calibration data type, say ``MyCalibData``, and a
class ``MyCalibDataUser``, one may use the subscription by implementing the
``Observable<MyCalibData>::iObserver`` interface and calling the
``subscribe()`` method of the ``Manager`` instance:

.. code-block:: cpp

    class MyCalibDataUser
            : public na64dp::util::Observable<MyCalibData>::iObserver {
        // ...
    public:
        MyCalibDataUser( na64dp::calib::Manager & mgr ) {
            // subscribe to updates of MyCalibData of subclass "default"
            mgr.subscribe<MyCalibData>(*this, "default");
        }
        // ...
    protected:
        virtual void handle_update( const MyCalibData & ) override;
        // ...
    };

Here ``subscribe()`` takes a string argument (``"default"`` in the example)
that, together with the C++ type info, identifies the type of calibration
entry.

.. note::

    ``calib::Handle<>`` subclasses ``util::Observable<>::iObserver`` to
    provide a *cached* copy of the data and some useful shortcuts for data
    retrieval.

Note that the calibration data user class has only one way to receive the
updated data instance: the protected method ``handle_update()``. The
reference passed to it is guaranteed to be valid until the next
``handle_update()`` call, so the subscriber may safely cache it.

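
The idea of addressing subscribers by C++ type info plus a string can be
illustrated with a small standalone sketch. It does not use NA64SW at all:
``Observer``, ``ToyDispatcher`` and ``GainUser`` are hypothetical names
introduced only for this illustration, and the real ``calib::Manager`` is
considerably more involved (indexes, loaders, dependencies).

.. code-block:: cpp

    #include <map>
    #include <string>
    #include <typeindex>
    #include <typeinfo>
    #include <utility>
    #include <vector>

    // Hypothetical observer interface, similar in spirit to iObserver
    template<typename T>
    struct Observer {
        virtual ~Observer() = default;
        virtual void handle_update(const T &) = 0;
    };

    // Hypothetical toy dispatcher (NOT calib::Manager): subscribers are
    // grouped by the pair (C++ type, string name)
    class ToyDispatcher {
        using Key = std::pair<std::type_index, std::string>;
        std::map<Key, std::vector<void *>> _subs;
    public:
        template<typename T>
        void subscribe(Observer<T> & obs, const std::string & name) {
            _subs[Key(typeid(T), name)].push_back(&obs);
        }
        // Notifies only the observers registered for exactly (T, name)
        template<typename T>
        void publish(const T & data, const std::string & name) {
            auto it = _subs.find(Key(typeid(T), name));
            if (it == _subs.end()) return;
            for (void * p : it->second) {
                // safe: pointers under this key were stored as Observer<T>*
                static_cast<Observer<T> *>(p)->handle_update(data);
            }
        }
    };

    // Example subscriber keeping the latest value it was notified about
    struct GainUser : Observer<double> {
        double gain = 0;
        void handle_update(const double & g) override { gain = g; }
    };

In this sketch two ``GainUser`` objects subscribed as ``"default"`` and
``"special"`` receive different updates although both use ``double``, which
is the reason for having the extra string next to the type info.
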
To make the ``Manager`` instance do something meaningful, one has to bind a
*run index* instance to it.

Range-overriding run index
~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``RangeOverrideRunIndex`` class implements a straightforward way of
mapping event identifiers onto subsets of calibration sources. In this class
we assume that a certain calibration data object is valid *starting from*
some event ID *until* the next event ID that has another calibration data
object. This way, data signed with event ID `5` will override (or
*substitute*) the entity signed with event ID `2`. It is the simplest yet
useful mapping scheme to define range matching.

Generic Loader
~~~~~~~~~~~~~~

Most practical use cases do not require the level of generalization that the
vanilla ``iCalibDataLoader`` offers (loading data of any type for any
event).

The ``GenericLoader`` class implements a calibration data loader with
dynamic composition. Once loading of data of a certain type is requested (by
the ``load_data()`` method), it forwards execution to one of its *data
index* instances (subclassing ``GenericLoader::AbstractCalibDataIndex``)
that performs the actual loading. A "data index" instance is specific to a
certain data type while ``GenericLoader`` is not (so a ``GenericLoader``
instance is just a collection of type-specific loaders), and that is the
reason for this level of composition.

New instances of data indexes are added by
``GenericLoader::add_data_index()`` at startup.

YAML Calibration Info
+++++++++++++++++++++

The ``iYAMLDataIndex``, a template subclass of
``GenericLoader::AbstractCalibDataIndex``, represents a data index where the
information needed to load calibration data for a particular run is stored
as a YAML node (in RAM). It may be either the entire input needed to
construct a new calibration data object (see ``SimpleYAMLDataIndex``), or
just a description needed to load a relatively large piece of data
(``iCSVFilesDataIndex``).
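
Returning to the range-overriding scheme from the "Range-overriding run
index" section: its lookup rule ("valid from this event ID until the next
known one") can be sketched with an ordered map. This is only an
illustration under simplifying assumptions, not the code of
``RangeOverrideRunIndex``: ``ToyRangeOverrideIndex`` is a hypothetical
class, event IDs are reduced to plain integers and calibration sources to
strings.

.. code-block:: cpp

    #include <iterator>
    #include <map>
    #include <string>

    // Hypothetical toy index: maps the first event ID of a validity range
    // to a label of the data source (file name, database key, etc.)
    class ToyRangeOverrideIndex {
        std::map<unsigned long, std::string> _starts;
    public:
        // Declares `source` valid starting from `fromEventID`
        void add(unsigned long fromEventID, const std::string & source) {
            _starts[fromEventID] = source;
        }
        // Returns the source valid for `eventID`, or an empty string if the
        // ID precedes every known entry
        std::string lookup(unsigned long eventID) const {
            // first entry that starts strictly after eventID
            auto it = _starts.upper_bound(eventID);
            if (it == _starts.begin()) return "";
            // the entry right before it is the latest one not after eventID
            return std::prev(it)->second;
        }
    };

With entries added for event IDs `2` and `5`, a lookup for event `4` returns
the entry of `2`, while lookups for `5` and any later event return the entry
of `5`, matching the override behaviour described for the run index.
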