Minutes and Summary

Main purpose of the meeting: During 2016 some shortcomings in how we handle the Massi files became evident. This meeting should summarise the current shortcomings, list the requirements from users, and possibly discuss ways to improve the current situation.

Introduction (Christoph Schwick)

Christoph summarised the current specification of the Massi files and what the various experiments provide in practice. He presented some statistical data on Massi files in 2016.

Massi files contain information for three different fields:

Information related to luminosity measurements in the experiment
Information related to measurements of the luminous region of the experiment
Information related to measurements of beam-gas interactions used to determine the beam profile (only provided by LHCb due to some special features in this experiment)

The files should contain, at any point in time, the best possible data of the experiment. For example, if an experiment improves a relevant calibration, or if a more precise measurement becomes available for some fills some time after the fill (i.e. measurements based on offline analysis), the experiment updates the Massi files for the relevant fills to reflect these improvements.

Christoph listed some shortcomings in the way the files are currently handled, which became evident during 2016:

There is currently no mechanism which the providers of the Massi files (the experiments) can use to communicate to the users of the files what the files contain at any point in time. For example, if an experiment regularly improves the contents of the Massi files some time after the fill, once offline measurements with higher precision become available, there is no way for a user to know whether a file already contains these more precise data or whether its contents are still based on coarser online data. This might lead to confusion when trying to analyse subtle effects, as is often done by the LHC experts.
In case of problems in a fill (or a set of fills) there is no mechanism for the experiments to communicate this to the users of the Massi files.
Further, it was noted that the data of the Massi files is not versioned, and it was questioned whether multiple versions need to be maintained centrally by the LPC or whether only the best known data need to be made available.
Currently the data is stored in files on AFS, which will be de-commissioned by IT. This raises the question of how to maintain and preserve the data in future.

In a discussion with Reyes Alemany Fernandez and the logging DB experts from LHC it became obvious that the current system is not very well suited for data which is frequently updated. However, a new system is under development which will make it possible to update data. Once this service is in place it will be considered as a possibility for maintaining the Massi file data. It still has to be discussed who will be responsible for inserting the data into the system and for maintaining the relevant software.

Finally a list of features for the improved system was presented:

The improvements should imply only minor changes to the current system, in order not to create a large workload for the experiments or the users of the data.
The system should provide a mechanism to annotate the data with meta-data like version numbers and human readable text.
The system should allow for filtering of the data via scripts.
The system should allow a web server to present the meta-data on a web page for users to consult.
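
As an illustration of the annotation and filtering features listed above, the following minimal Python sketch attaches a version number and a free-text comment to each fill and selects fills by script. The record layout, the field names (fill, version, comment), the integer version numbers and the JSON serialisation are all assumptions made for this example; the meeting did not fix a meta-data format.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class FillMetaData:
    """Hypothetical meta-data record for the Massi file of one fill."""
    fill: int       # LHC fill number
    version: int    # version number set by the experiment (assumed to be an integer)
    comment: str    # human-readable text from the experiment


def to_json(records):
    """Serialise records so that a script or a web server can read them."""
    return json.dumps([asdict(r) for r in records], indent=2)


def from_json(text):
    """Rebuild the records from their JSON form."""
    return [FillMetaData(**item) for item in json.loads(text)]


def select_fills(records, min_version):
    """Return the fill numbers whose version is at least min_version."""
    return [r.fill for r in records if r.version >= min_version]


# Invented example values, for illustration only.
records = [
    FillMetaData(fill=1001, version=1, comment="online luminosity only"),
    FillMetaData(fill=1002, version=2, comment="checked by experts"),
]
print(select_fills(from_json(to_json(records)), min_version=2))
```

The same JSON text could serve both the filtering scripts and the web page, so that the experiments would only have to provide one additional small file.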

After the discussion Christoph presented a proposal which could solve some of the issues mentioned above. The technical issues for long term storage and maintenance of the data are not addressed by this proposal.

Thoughts from CMS (David Stickland)

David Stickland presented the CMS view on the handling of the files and on what possible improvements should contain. He underlined that it would be important to know the requirements of the users in order to improve the system.

David then described the process used by CMS to generate the Massi files and their updates. Immediately after a fill, a first version of the files based on online luminosity numbers is made available. After approximately 24h, CMS can update these data with a version that has been checked by experts and incorporates some first corrections. The data could then also contain information on the luminous region. About a month later, CMS can update the data based on additional information coming from the offline pixel luminosity measurements. In addition, CMS might change the calibration one or more times over the year, which could lead to reprocessing of the data for the entire year and would involve another update.

David pointed out that currently an analysis cannot be locked to a particular version of the data since no version numbers are being communicated. For CMS a three level versioning could be adequate (online, offline, development).
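
With version labels in the meta-data, an analysis could record which version it used for each fill and later detect which fills have been updated. The sketch below uses the three levels suggested for CMS and assumes a simple mapping from fill number to level; the Level enum and the changed_fills helper are hypothetical names, not part of any existing tool.

```python
from enum import Enum


class Level(Enum):
    """The three version levels suggested for CMS in the minutes."""
    ONLINE = "online"
    OFFLINE = "offline"
    DEVELOPMENT = "development"


def changed_fills(locked, current):
    """Return, sorted, the fills whose level differs from the locked one.

    Both arguments map a fill number to a Level. Fills missing from
    `current` are also reported, since the analysis cannot verify them.
    """
    return sorted(f for f, level in locked.items() if current.get(f) != level)


# Invented fill numbers, for illustration only.
locked = {1001: Level.ONLINE, 1002: Level.OFFLINE}
current = {1001: Level.OFFLINE, 1002: Level.OFFLINE}
print(changed_fills(locked, current))
```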

David pointed out the conflicting requirements of stability on the one hand and best possible data quality on the other.

He also noted that the luminosity related data and the measurements of the luminous region are completely independent in the experiment, and that CMS would therefore prefer to have independent directories and archive files for these data. He also suggested splitting the data into different directories for different years on the side of the experiments (they are already split like this in the central data directories on the LPC side).

David then questioned if the current archive files are really the best way to store/transfer these data considering the processing time to parse them.

Discussion and decisions (all)

The LPC (Christoph) should produce an updated version of the Massi file specification (now on the LPC web page) according to the results of this meeting and circulate that version for approval. This document should become the basis of how to handle the Massi files from 2017 onwards (the data from previous years will not be required to be changed according to this proposal; however, they might of course be updated).

In particular some possible misunderstandings in the current specifications should be removed.

The new specification should contain the following features:

Experiments will provide the data in separate directories for different years.
The three different types of data
- luminosity data
- luminous region data
- beam-gas related beam profile data (only LHCb)
will be provided in separate compressed archive files and will be maintained in separate directories (the processes to generate these data are completely independent in the experiments).

It has been decided that only one version of the data is kept on the central server. That version is the experiment's best data set. Meta-data will allow users to extract the information necessary to judge whether the contents of the files for specific fills are adequate for their type of analysis. If a user would like to keep a specific version of the files, it is the user's responsibility to copy the relevant data. If desired by the experiment, a history of the different versions (or a fraction of these) will be kept by the experiment.
The new version of the specification will contain the proposal of annotating the Massi files with meta-data.
The LPC (Christoph) will provide an automatically updated web page on the LPC site which will display the meta-data for the Massi files.
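
The directory layout decided above (separate directories per year and per data type) could look as in the following Python sketch. The directory names, the root path and the archive_dir helper are assumptions for illustration only; the actual names are to be fixed in the updated specification.

```python
from pathlib import PurePosixPath

# Hypothetical directory names for the three data types.
DATA_TYPES = ("luminosity", "luminous_region", "beam_gas")


def archive_dir(root, year, data_type):
    """Return the directory holding the archives of one data type for one year."""
    if data_type not in DATA_TYPES:
        raise ValueError(f"unknown data type: {data_type}")
    return PurePosixPath(root) / str(year) / data_type


# "massi" is a placeholder root, not a real path.
for data_type in DATA_TYPES:
    print(archive_dir("massi", 2017, data_type))
```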