Calibration data¶
This page contains examples of adding a custom calibration (i.e. run- or datetime-dependent) data type.
Important
By the term “calibration data” (in NA64SW) we mean any kind of data that depends on the event number, including not only the typical kinds of calibration information but also such data as detector naming, geometry or positions.
Most users will be interested only in the first part of this page, which provides some general API-based examples. The “General use” section is for advanced modification of the NA64SW API itself.
Preface¶
NA64SW introduces a versatile yet complex subsystem for calibration data queries. Although there are multiple API layers under the hood to support various scenarios, the user’s API tends to be reasonably simple.
The user’s API is class-based and relies on a subscription mechanism. A class that needs the current calibration data instance shall inherit a subscriber: a handle (calib::Handle<>) template parameterised with a certain data type. Then, whenever the calibration data of that type gets updated, calib::Handle<T>::handle_update( const T & ) is called with the new data.
A data processing handler class is a typical example of the subscriber. It usually maintains a copy of calibration data of interest (or some kind of values inferred based on this data).
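The subscription idea can be sketched in a few lines of standalone C++. The `toy::Dispatcher`, `toy::Handle<T>` and `toy::GainUser` names below are hypothetical stand-ins and not the NA64SW classes; they only mimic the behaviour described above, where a handle registers itself on construction and receives every update through `handle_update()`.

```cpp
#include <vector>

namespace toy {  // hypothetical stand-ins, not the NA64SW API

template<typename T> class Handle;

// Keeps the list of subscribers interested in data of type T
template<typename T>
class Dispatcher {
    std::vector<Handle<T>*> _subscribers;
public:
    void subscribe(Handle<T> & h) { _subscribers.push_back(&h); }
    // Forwards new data to every subscriber
    void publish(const T & data) {
        for(auto * h : _subscribers) h->handle_update(data);
    }
};

// Subscriber base: registers itself and receives updates
template<typename T>
class Handle {
public:
    explicit Handle(Dispatcher<T> & d) { d.subscribe(*this); }
    virtual ~Handle() = default;
    virtual void handle_update(const T &) = 0;
};

// A "handler" that caches the value it is interested in
struct GainUser : public Handle<double> {
    double gain = 1.0;
    int nUpdates = 0;
    explicit GainUser(Dispatcher<double> & d) : Handle<double>(d) {}
    void handle_update(const double & g) override { gain = g; ++nUpdates; }
};

}  // namespace toy
```

After `toy::Dispatcher<double> d; toy::GainUser u(d);`, a call to `d.publish(2.5)` leaves `u.gain` equal to the published value. The real framework adds validity tracking and type identification on top of this pattern.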
Design Considerations¶
We expect that the full set of calibration data needed for every kind of processing is not kept entirely in RAM. From this assumption one may immediately derive the necessity of some kind of runtime “state” or “cache” where the currently loaded information is kept (i.e. the part of the calibration data actually “loaded” from the “existing” superset).
This “state” has to be synchronized with the event number under consideration. A typical source provoking a state change is the data source object, which notifies the “state” of the newly acquired event’s ID. For the MC case these are the hooks called when a new event arrives (i.e. a G4UserEventAction subclass, for the Geant4 API).
The “state” maintains subscriptions for entities interested in particular types of calibration information via the “observer-notifier” (or “pub/sub”) pattern. That is, changes in the state (induced by a changing event ID) are propagated through the subscription mechanism to the objects interested in certain pieces of information.
Definitions¶
Calibration data type is the type of data that must be defined for a certain validity period.
Validity period is defined as the range between two events or two astronomical dates and times.
Loader class defines how the calibration data must be loaded for a certain event within the validity period (i.e. for a certain range of events).
Calibration data index class decides whether the data must be loaded. For instance, it may consider the event number and match it with a certain validity period.
Calibration data handle is a simple wrapper class that keeps an instance of data that must be updated.
Calibration data dispatcher is an instance of the special calib::Dispatcher class that, under certain conditions, based on the event ID and the collection of indexes, may cause an update of calibration data of certain types. In practice, this is done by “notifying” calibration data handles.
Calibration data manager (calib::Manager) instance provides unified access to the current calibration state. It subclasses the dispatcher and maintains collections of loaders and indexes to define which handle must be updated and when.
The resulting system built on these classes is still quite generic and foresees some sophisticated use cases, like keeping the run indexes in a database, overriding them with a configuration file, and using a mixed loading scheme where the data may be retrieved from an ASCII CSV file, a local database, a ROOT file, or fetched via a RESTful API.
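How the index, loader and manager roles fit together can be illustrated with a minimal standalone sketch. All names in the `toy` namespace are hypothetical and much simpler than the NA64SW classes: the index only matches an event ID against inclusive ranges, the loader produces an integer, and the manager notifies plain callbacks instead of typed handles.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

namespace toy {  // hypothetical stand-ins, not the NA64SW classes

using EventID = unsigned long;

// Validity period: inclusive range of event IDs plus a payload key
struct Period { EventID first, last; int key; };

// "Index": decides which payload (if any) is valid for an event
struct Index {
    std::vector<Period> periods;
    // Returns position of the matching period, or -1 if none matches
    int lookup(EventID id) const {
        for(std::size_t i = 0; i < periods.size(); ++i)
            if(periods[i].first <= id && id <= periods[i].last)
                return static_cast<int>(i);
        return -1;
    }
};

// "Loader": turns a payload key into data (here just an integer)
struct Loader {
    int nLoads = 0;
    int load(int key) { ++nLoads; return key * 10; }
};

// "Manager": synchronises the state with the event ID and notifies
// subscribers only when the validity period changes
class Manager {
    Index _index;
    Loader _loader;
    int _current = -1;  // position of the currently loaded period
    std::vector<std::function<void(int)>> _handles;
public:
    explicit Manager(Index idx) : _index(std::move(idx)) {}
    void subscribe(std::function<void(int)> cb) {
        _handles.push_back(std::move(cb));
    }
    void set_event(EventID id) {
        const int n = _index.lookup(id);
        if(n < 0 || n == _current) return;  // nothing valid, or same period
        _current = n;
        const int data = _loader.load(_index.periods[n].key);
        for(auto & cb : _handles) cb(data);
    }
    int n_loads() const { return _loader.nLoads; }
};

}  // namespace toy
```

In this sketch, events that fall within the same validity period trigger neither a load nor a notification, which is the reason for keeping a runtime state at all.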
Scope of definitions¶
One has to distinguish between a calibration data type introduced for one endpoint (local) and one introduced for multiple endpoints (global or shared). This difference mostly affects how and where the data type is declared.
Local calibration data types are an important use case for prototyping, when one does not want to burden the global namespace and registries with definitions that make no sense without a certain handler. This is a good choice for highly specialized handlers that require extra data. This way one can keep all the data type definitions within the same implementation (.cc) file where the handler itself is defined, somewhat similar to C/C++ static definitions.
If calibration functionality is shared across multiple handlers within a certain extension, it can be defined in the reentrant header.
Finally, widely used runtime-dependent information can be defined at the level of the na64calib library (requires a project rebuild).
Introducing new data type¶
One has to express a piece of calibration data as a C/C++ data type. It is usually something like a POD structure. For instance, let’s consider a typical placement datum that describes the spatial position, orientation, size etc. of a tracking detector.
struct Placement {
    std::string name;  ///< name of the plane
    float center[3]    ///< global coordinates of the plane, cm
        , size[3]      ///< size of the plane, cm
        , rot[3]       ///< Euler angles of the plane defining its orientation
        ;
    int nWires;        ///< number of wires
    float resolution;  ///< detector resolution
};
To make a handler subscribe to this kind of information, the simplest way is to subclass calib::Handle<>. Taking into account that an update of this type of information is usually provided as a set of such objects, it is convenient to subscribe to the set itself instead of an individual entry (let it be an std::list, for instance).
Subscribing to updates¶
For instance, a handler using the resolution information may have a cache designed to keep the relevant info:
typedef std::list<Placement> Placements;

class MyHandler : public AbstractHandler
                , public calib::Handle<Placements> {
protected:
    /// Cache of detector resolutions indexed by plane name
    std::map<std::string, float> _resolutionDict;

    /// Called with new data whenever the placements get updated
    void handle_update( const Placements & updates ) override {
        for( const auto & update : updates ) {
            _resolutionDict[update.name] = update.resolution;
        }
    }
public:
    ProcRes process_event(Event & event) {
        // ... process event relying on `_resolutionDict` here
        return kOk;
    }
    MyHandler( calib::Manager & mgr )
        : calib::Handle<Placements>("default", mgr)
        {}
};
The additional string provided to the calib::Handle<> constructor is used together with the RTTI type information to uniquely identify the subclass of the calibration information. It was introduced to distinguish between instances of the same C/C++ type used for different purposes. For instance, one may like to have a spatial vector (say, TVector3) as a calibration info instance, but one still has to address it with some additional semantics. Most of the time this additional string classifier can be left as "default".
NA64SW is designed so that a calibration data update, whenever it is available, is provided before the event is processed (i.e. handle_update() is called prior to process_event() once a new event has arrived).
Note on data types aliases¶
Calibration data types are indexed by means of C++ RTTI (std::type_info hash) plus a string that is sometimes called the calibration data type subclass. These two things are expected to uniquely identify a particular calibration data type. However, the API often needs to refer to this ID by some human-readable string. For this purpose an aliasing technique is introduced: a global dictionary performs alias-to-ID and ID-to-alias conversions.
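A two-way alias dictionary of this kind can be sketched with two standard maps. The `toy::Aliases` class below is a hypothetical simplification: it borrows the `add_alias_of` name used elsewhere on this page, but the lookup methods `id_of()` and `alias_of()` are invented for illustration and are not claimed to exist in CIDataAliases.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <utility>

namespace toy {  // hypothetical illustration, not the NA64SW API

// Type ID: C++ type plus the "subclass" string
using TypeID = std::pair<std::type_index, std::string>;

class Aliases {
    std::map<std::string, TypeID> _idByAlias;
    std::map<TypeID, std::string> _aliasByID;
public:
    // Binds a human-readable alias to (type T, subclass string)
    template<typename T>
    void add_alias_of(const std::string & alias, const std::string & subclass) {
        const TypeID id(std::type_index(typeid(T)), subclass);
        if(!_idByAlias.emplace(alias, id).second)
            throw std::runtime_error("alias already in use: " + alias);
        // If the ID already has an alias, the first one is kept
        _aliasByID.emplace(id, alias);
    }
    // Alias-to-ID conversion; throws std::out_of_range if unknown
    const TypeID & id_of(const std::string & alias) const {
        return _idByAlias.at(alias);
    }
    // ID-to-alias conversion; throws std::out_of_range if unknown
    const std::string & alias_of(const TypeID & id) const {
        return _aliasByID.at(id);
    }
};

}  // namespace toy
```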
Note on Dependencies¶
Calibration data types often have dependencies. The most common dependency is detector naming, needed to convert stringified detector names to integer IDs. These dependencies may be either specified in calibrations.yaml in the dependencies: section (for global types) or provided in C++ code with the following directive using the CIDataAliases singleton (more suitable for module-local types):
na64dp::calib::CIDataAliases::self().add_dependency("TargetType", "DependencyType");
This makes the type aliased as "DependencyType" load before the type aliased as "TargetType".
C++ dependency assignment is better used with handler-local calibration data types, while YAML is better suited for non-local types.
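Resolving such dependencies into a load order amounts to a topological sort. The sketch below shows one way to do it with a depth-first traversal; `toy::Dependencies` and its `load_order()` method are hypothetical, and the ordering algorithm actually used by NA64SW is not described on this page.

```cpp
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

namespace toy {  // hypothetical illustration, not the NA64SW API

class Dependencies {
    // target alias -> aliases that must be loaded before it
    std::map<std::string, std::set<std::string>> _deps;

    // Depth-first traversal: appends prerequisites first, then `name`
    void _visit( const std::string & name
               , std::set<std::string> & done
               , std::set<std::string> & inProgress
               , std::vector<std::string> & order ) const {
        if(done.count(name)) return;
        if(!inProgress.insert(name).second)
            throw std::runtime_error("circular dependency at " + name);
        const auto it = _deps.find(name);
        if(it != _deps.end())
            for(const auto & dep : it->second)
                _visit(dep, done, inProgress, order);
        inProgress.erase(name);
        done.insert(name);
        order.push_back(name);
    }
public:
    void add_dependency( const std::string & target
                       , const std::string & dependency ) {
        _deps[target].insert(dependency);
    }
    // Returns `target` preceded by everything it depends on
    std::vector<std::string> load_order(const std::string & target) const {
        std::set<std::string> done, inProgress;
        std::vector<std::string> order;
        _visit(target, done, inProgress, order);
        return order;
    }
};

}  // namespace toy
```

A circular declaration (A requires B, B requires A) is reported as an error instead of recursing forever.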
Choosing update source¶
How this update is provided is a matter of choice. There are a few architectural layers:
The foremost layer is the manager instance (the calib::Manager that is provided to REGISTER_HANDLER()). This class aggregates all the subscriptions and update dispatching routines.
A layer of standalone indexes and loaders. These objects are collected by the manager and used to define when and what has to be fetched, and how.
An SDC module providing a subsystem for local file discovery. The core functions of this module were written for the original p348reco data reconstruction routines and have been naturally interfaced to NA64SW.
It is up to you, as the code designer, to decide which particular mechanism to exploit. Yet, dealing with the manager directly (i.e. copying the instance, subclassing, etc.) is almost never good practice since it involves infrastructural changes. Introducing a new index or loader requires far fewer changes in the general API and accommodates various mechanisms of providing data validity and data delivery (e.g. a remote database, a RESTful API, etc.). We consider some examples below.
Since the most common usage scenario for calibration data is local to a certain handler, we’ve tried to ease the integration of custom data types within the user’s extension.
Calibrations config as an update source¶
For small pieces of data that hardly change for years, one can consider putting such information right into the calibrations.yaml file. This is the main configuration file that is used to initialize the whole infrastructure. Within this file a standaloneDocuments YAML object is provided, listing documents or individual parameters that will barely change for years, together with their validity intervals.
Direct parameters¶
Tiny pieces of runtime-dependent (yet semi-constant) information, like the name of the master counter in a setup, can be listed right in this file. To use this document as it is provided, one may set payload to the YAML object needed and specify loader to be yaml-doc.
Todo
Describe C/C++ routine here.
See the masterTimeSource
definitions for instance.
Static files¶
For low-volatility information that is hardly ever updated or modified, but which is too big to put entirely into calibrations.yaml, one may consider using a reference to static files.
Todo
Elaborate, describe C/C++ routine here.
See the MuMegaLayout
for instance.
SDC data as an update source¶
The SDC module is useful for highly volatile, relatively bulky information that can be expressed in CSV files, a classic example of calibration data.
Its indexes and metadata are provided together with the CSV data blocks. The particular type is specified within the file itself as well.
Although SDC has a customizable grammar, it is still possible to describe its basics:
Blank lines are ignored.
Comments start with a single char, an octothorp (#) by default. The rest of the line after this char is ignored.
A line containing a special metadata delimiter (an equal sign, =, by default) is considered a metadata line. The string before this marker is considered the metadata entry key, the string after it (and before the newline) the value.
SDC metadata is a simple key/value (str-to-str) storage.
Two metadata keys have a special meaning:
the one designated to specify the validity period is called the metadata key tag (runs by default);
the other, which shall provide the data type, is called the metadata type tag (type by default).
The data itself is provided in a columnar format, similar to space-separated CSV. Any non-blank, non-metadata lines are considered data lines.
Data lines form a data block, for which any previously defined metadata is considered valid.
For instance:
# This is a comment
runs=123-315
type=MyType
foo 123 3.4e12
bar 712 nan
myMetadata=blah #blah
runs=316-512
foo 321 1e-12
bar 314 512
This defines two data blocks, both of the same type but for different periods. Only the second block has the metadata entry myMetadata defined (and set to "blah").
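The default grammar is simple enough to be mimicked by a short standalone parser. The sketch below is not the SDC implementation; `toy::parse()` and `toy::Block` are hypothetical, the delimiters are hard-coded to `#` and `=`, and it assumes that a new data block starts at the first data line following a metadata line.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

namespace toy {  // simplified illustration, not the actual SDC parser

using Metadata = std::map<std::string, std::string>;

struct Block {
    Metadata meta;                   // metadata valid for this block
    std::vector<std::string> lines;  // raw data lines
};

// Removes leading and trailing whitespace
inline std::string trim(const std::string & s) {
    const char * ws = " \t\r\n";
    const auto b = s.find_first_not_of(ws);
    if(b == std::string::npos) return "";
    return s.substr(b, s.find_last_not_of(ws) - b + 1);
}

// Splits text into data blocks following the default grammar
inline std::vector<Block> parse(const std::string & text) {
    std::vector<Block> blocks;
    Metadata current;          // metadata accumulated so far
    bool lastWasData = false;  // previous meaningful line was a data line
    std::istringstream in(text);
    std::string line;
    while(std::getline(in, line)) {
        line = trim(line.substr(0, line.find('#')));  // strip comment
        if(line.empty()) continue;                    // blank line
        const auto eq = line.find('=');
        if(eq != std::string::npos) {                 // metadata line
            current[trim(line.substr(0, eq))] = trim(line.substr(eq + 1));
            lastWasData = false;
            continue;
        }
        // Data line: start a new block with a snapshot of the metadata
        // if the previous meaningful line was not a data line
        if(!lastWasData) blocks.push_back(Block{current, {}});
        blocks.back().lines.push_back(line);
        lastWasData = true;
    }
    return blocks;
}

}  // namespace toy
```

Applied to the example above, this sketch yields two blocks; the second one inherits `type` from the first and additionally carries `myMetadata`.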
Although SDC files are discovered automatically, some C/C++ interfacing is still needed to tell NA64SW how a CSV block has to be converted into a particular C/C++ data type.
Extending SDC with a data type¶
The SDC subsystem traverses a directory structure looking for files matching certain criteria (wildcard or size limits). Every file matching these criteria is expected to contain SDC content: metadata and CSV blocks. These files are pre-parsed once, meaning that their type tag and validity tag metadata entries are read and the file entry is indexed by SDC classes. Pre-parsing is a generic procedure and does not need much customization, except for the grammar.
Then, once an update is initiated, the corresponding file is actually parsed, meaning that its metadata is read and accumulated in a metadata cache and the CSV block is passed to C++ traits in order to turn tokens into a C/C++ structure. These traits are what customizes the parsing process.
That is, the C++ traits defining how line tokens should be converted into a C/C++ struct are the most meaningful extension point for introducing a new calibration type. The expected traits implementation relies heavily on advanced C++ template techniques to facilitate extensibility of the SDC system.
Example traits to load the Placement
entry (snippet is a bit lengthy, but
useful):
// These traits must be defined in the "sdc" namespace
namespace sdc {
// Traits are a C++ template specialization of the CalibDataTraits<> template
// struct -- so this syntax is mandatory
template<>
struct CalibDataTraits<Placement> {
    // Type name alias, used to refer to this type in configuration files,
    // mandatory constexpr
    static constexpr auto typeName = "placement";
    /// A collection type of the parsed entries (usually STL container, but
    /// one can define any template parameterised with a single type)
    template<typename T=na64dp::calib::Placement>
        using Collection=std::list<T>;
    /// An action performed to put newly parsed data into collection
    template<typename T=na64dp::calib::Placement>
    static inline void collect( Collection<T> & col
                              , const T & e
                              , const aux::MetaInfo &
                              , size_t
                              ) { col.push_back(e); }
    /// Converts a single CSV line into a data entry
    static na64dp::calib::Placement
            parse_line( const std::string & line
                      , size_t lineNo
                      , const aux::MetaInfo & m
                      , const std::string & filename
                      ) {
        // create object and set everything to zero
        na64dp::calib::Placement obj;
        // Create and validate columns order, relying on "columns" metadata
        // entry
        auto csv = m.get<aux::ColumnsOrder>("columns", lineNo)
                .interpret(aux::tokenize(line));
        // Get columns
        obj.name = csv("name");
        obj.center[0] = csv("x", std::nan("0"));
        obj.center[1] = csv("y", std::nan("0"));
        obj.center[2] = csv("z", std::nan("0"));
        // ... etc -- turn tokens into data structure
        return obj;
    }
};
}  // namespace sdc
Once an update is triggered, SDC expects the parse_line() static method to interpret a CSV string as a single entry of the CalibDataTraits<T>::Collection<T> container. Source information and the up-to-date metadata state are provided to this method to assist the user’s data conversion procedure.
Then SDC uses the collect() static method to put the parsed object into a collection instance.
The collection instance is then dispatched to the subscribers.
Registering alias for SDC data¶
To register a new SDC data type one has to:
Declare its alias
Impose a conversion function on the SDC wrapper
Define calibration data dependencies, if any (see “Note on Dependencies” above)
A useful snippet to do all these things:
// define new alias
na64dp::calib::CIDataAliases::self()
.add_alias_of<Placements>("placements", "default");
// get SDC loader from manager, downcast it to proper type (SDCWrapper)
// and impose type conversion using previously defined traits
mgr.get_loader<na64dp::calib::SDCWrapper>("sdc")
.enable_sdc_type_conversion<p348reco::RunNo_t, Placement>();
// Require "naming" to be loaded before "placements"
na64dp::calib::CIDataAliases::self().add_dependency("placements", "naming");
This block must be executed before any subscription to the target type is made, i.e. putting it into a handler constructor won’t work. For local extensions, consider the REGISTER_HANDLER body; for global framework-wide changes, modify the na64dp/calib/sdc-extensions.hh header.
General use¶
In the generic case, when neither SDC nor the standard means suit the user’s calibration data, the following usage is foreseen:
Todo
Revise/restructure notes below.
Custom indexes and loaders¶
Define the calibration data type. It may be a simple type, a POD struct or a complex class. A single instance of this type must contain only the data needed for a certain validity period.
Use an existing loader or implement a new one. To add a new loader one has to subclass the vanilla iCalibDataLoader (and add the appropriate entry to the run index). In the more common case one may rely on the generic loader and subclass GenericLoader::AbstractCalibDataIndex. For simple data trivially constructible from a YAML description, consider using SimpleYAMLDataIndex<T>, which defines stateless, single-function conversions. For more complex or somewhat bulkier data, consider subclassing iCSVFilesDataIndex<T, TupleTypesT ...>: users only have to define a tuple-to-data-type conversion method in their subclass.
If you have subclassed iCalibDataLoader, instantiate your loader and add it to the application’s manager with the Manager::add_loader() method. If, on the contrary, you have subclassed GenericLoader::AbstractCalibDataIndex, instantiate the data index and add it to the GenericLoader instance, mentioning the new data field (see, e.g., pipeline function). Defining a runtime data loader constructor for YAML is somewhat tricky (a small intermediate structure is used); refer to the calib::CalibInfoCtr docs.
Usage for external applications¶
The central object that the user’s app must maintain is an instance of the calibration data manager: a na64dp::calib::Manager object. It may be instantiated with no additional preparations:
na64dp::calib::Manager mgr;
Having this object, the user’s classes may then subscribe to calibration updates. E.g., having a calibration data type, say MyCalibData, and a class MyCalibDataUser, one may utilise the subscription by implementing the Observable<MyCalibData>::iObserver interface and calling the subscribe() method of the Manager instance:
class MyCalibDataUser : public na64dp::util::Observable<MyCalibData>::iObserver {
    // ...
public:
    MyCalibDataUser( na64dp::calib::Manager & mgr ) {
        mgr.subscribe<MyCalibData>(*this, "default");
    }
    // ...
protected:
    virtual void handle_update( const MyCalibData & ) override;
    // ...
};
Here subscribe()
takes a string argument ("default"
in example) that,
together with C++ type info, identifies the type of calibration entry.
Note
calib::Handle<>
subclasses util::Observable<T>::iObserver
to
provide a cached copy of data and some useful shortcuts for data
retrieval.
Note that the calibration data user class has only one way to receive the updated data instance: the protected method handle_update(). The reference it receives is guaranteed to be valid until the next handle_update() call, so the subscriber may safely cache it.
To make the Manager instance do something meaningful, one has to bind a run index instance.
Range-overriding run index¶
The RangeOverrideRunIndex class implements a straightforward way of mapping the event identifier onto subsets of calibration sources. In this class we assume that a certain calibration data object is valid starting from some event ID until the next event ID that has another calibration data object. This way the data signed with event ID 5 overrides (or substitutes) the entity signed with event ID 2. It is the simplest yet useful mapping scheme to define range matching.
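The range-override rule maps naturally onto an ordered map keyed by the starting event ID. The sketch below is a hypothetical illustration of the lookup (`toy::RangeOverrideIndex` is not the NA64SW class, and a plain integer stands in for the event ID): the entry valid for an event is the last one whose starting ID does not exceed it.

```cpp
#include <iterator>
#include <map>
#include <string>

namespace toy {  // hypothetical illustration, not the NA64SW class

using EventID = unsigned long;

class RangeOverrideIndex {
    // starting event ID -> calibration source valid from that event on
    std::map<EventID, std::string> _entries;
public:
    void add_entry(EventID startsFrom, const std::string & source) {
        _entries[startsFrom] = source;
    }
    // Returns pointer to the source valid for `id`, or nullptr if `id`
    // precedes every known entry
    const std::string * find(EventID id) const {
        auto it = _entries.upper_bound(id);  // first entry starting after id
        if(it == _entries.begin()) return nullptr;
        return &std::prev(it)->second;
    }
};

}  // namespace toy
```

With entries at event IDs 2 and 5, events 2 to 4 resolve to the first entry and any event from 5 onwards to the second, which matches the override behaviour described above.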
Generic Loader¶
Most of the practical usecases do not require a level of generalization that
vanilla iCalibDataLoader
offers (loading the data of any type for any event).
The GenericLoader
class implements a calibration data loader with dynamic
composition. Once loading data of certain type is requested (by load_data()
method), it forwards execution to one of its data index instances
(subclassing GenericLoader::AbstractCalibDataIndex
) that performs actual
loading. The “data index” instance is specific for certain data type while
GenericLoader
is not (so GenericLoader
instance is just a collection
of type-specific loaders) and that is the reason for this level of
composition.
New instances of data indexes are added by GenericLoader::add_data_index()
at the startup.
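The dynamic composition can be pictured as a map from a C++ type to a type-specific index object. The sketch below borrows the `add_data_index` and `load_data` names from the text, but their signatures here are assumptions, and `toy::AbstractDataIndex`, `toy::DataIndex<T>` and `toy::RunLabelIndex` are hypothetical simplifications of the real class hierarchy.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <utility>

namespace toy {  // hypothetical illustration, not the NA64SW classes

using EventID = unsigned long;

// Type-erased base of all data indexes
struct AbstractDataIndex {
    virtual ~AbstractDataIndex() = default;
};

// Type-specific index: knows how to load data of type T
template<typename T>
struct DataIndex : AbstractDataIndex {
    virtual T load(EventID) = 0;
};

// Type-agnostic loader: a collection of type-specific indexes
class GenericLoader {
    std::map<std::type_index, std::unique_ptr<AbstractDataIndex>> _indexes;
public:
    template<typename T>
    void add_data_index(std::unique_ptr<DataIndex<T>> idx) {
        _indexes[std::type_index(typeid(T))] = std::move(idx);
    }
    // Forwards the request to the index registered for type T
    template<typename T>
    T load_data(EventID id) {
        const auto it = _indexes.find(std::type_index(typeid(T)));
        if(it == _indexes.end())
            throw std::runtime_error("no data index for requested type");
        return static_cast<DataIndex<T>&>(*it->second).load(id);
    }
};

// Example index producing a string label for an event
struct RunLabelIndex : DataIndex<std::string> {
    std::string load(EventID id) override {
        return "run-" + std::to_string(id);
    }
};

}  // namespace toy
```

The downcast in `load_data()` is safe in this sketch because an index is always stored under the key of the type it was registered for.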
YAML Calibration Info¶
The iYAMLDataIndex<T>, a template subclass of GenericLoader::AbstractCalibDataIndex, represents a data index where the information needed to load calibration data for a particular run is stored as a YAML node (in RAM). It might be either the entire input needed to construct a new calibration data object (see SimpleYAMLDataIndex<T>), or just some description needed to load a relatively large piece of data (iCSVFilesDataIndex).