LPC meeting summary 06-02-2017 - final
Main purpose of the meeting: During 2016 some shortcomings in how we handle the Massi files became evident. This meeting should summarise the current shortcomings, list the requirements from users and, if possible, discuss ways to improve the current situation.
Christoph summarised the current specification of the Massi files and what the various experiments provide in practice. He presented some statistical data on Massi files in 2016.
Massi files contain information for three different fields:
The files should contain, at any point in time, the best available data from the experiment. For example, if an experiment improves a relevant calibration, or if a more precise measurement for some fills becomes available some time after the fill (e.g. measurements based on offline analysis), the experiment updates the Massi files for the relevant fills to reflect these improvements.
Christoph listed some shortcomings in the way the files are currently handled and which have become evident during 2016:
In a discussion with Reyes Alemany Fernandez and the logging DB experts from LHC it became obvious that the current system is not well suited for data which is frequently updated. However, a new system is under development which will make it possible to update data. Once this service is in place it will be considered as a possibility to maintain the Massi file data. It still has to be discussed who will be responsible for inserting the data into the system and for maintaining the relevant software.
Finally a list of features for the improved system was presented:
After the discussion Christoph presented a proposal which could solve some of the issues mentioned above. The technical issues for long term storage and maintenance of the data are not addressed by this proposal.
David Stickland presented the CMS view on the handling of the files and on what possible improvements should contain. He underlined that it would be important to know the requirements of the users in order to improve the system.
David then described the process used by CMS to generate the Massi files and their updates. Immediately after a fill, a first version of the files based on online luminosity numbers is made available. CMS can update these data after approximately 24h with a corrected version that has been checked by experts and incorporates some first corrections. The data could then also contain information on the luminous region. About a month later CMS can update the data based on additional information coming from the offline pixel luminosity measurements. In addition, CMS might change the calibration one or more times over the year, which could lead to reprocessing of the data for the entire year and would involve another update.
David pointed out that currently an analysis cannot be locked to a particular version of the data since no version numbers are being communicated. For CMS a three level versioning could be adequate (online, offline, development).
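As an illustration of what locking an analysis to a version could look like, here is a minimal Python sketch. Only the three level names come from the discussion above; the `VersionTag` class, the `revision` counter, the `pin` helper and the example path are hypothetical and not part of any agreed specification.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    """The three levels suggested for CMS in the meeting."""
    ONLINE = "online"
    OFFLINE = "offline"
    DEVELOPMENT = "development"


@dataclass(frozen=True)
class VersionTag:
    level: Level
    revision: int  # assumption: a counter increased on each update within a level

    def label(self) -> str:
        # Human-readable tag that could be communicated along with the files
        return f"{self.level.value}-v{self.revision}"


def pin(available: dict, fill: int, tag: VersionTag) -> str:
    """Return the file for exactly the requested version.

    Raises KeyError if that version is not available, instead of silently
    falling back to a newer one.
    """
    return available[(fill, tag)]


# Hypothetical usage: fill number and path are made up for illustration
tag = VersionTag(Level.OFFLINE, 2)
files = {(5000, tag): "2017/5000_offline-v2.tgz"}
print(tag.label(), pin(files, 5000, tag))
```

Failing loudly on a missing version, rather than returning the latest data, is one way to keep an analysis reproducible while the files continue to be updated.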
David pointed out the conflicting requirements: stability on the one hand and the best possible data quality on the other.
He also noted that the luminosity-related data and the measurements of the luminous region are produced completely independently in the experiment, and CMS would therefore prefer to have independent directories and archive files for these data. He further suggested splitting the data into different directories for different years on the experiments' side (they are already split like this in the central data directories on the LPC side).
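The suggested split by year and by data type could be sketched as follows. This is only an illustration under stated assumptions: the directory names, the order of the path components, the `.tgz` extension and the `archive_path` helper are all hypothetical, and the actual layout would be fixed by the updated specification.

```python
from pathlib import PurePosixPath

# Assumption: one directory per independently produced data type
KINDS = ("luminosity", "luminous_region")


def archive_path(root: str, experiment: str, year: int, kind: str, fill: int) -> PurePosixPath:
    """Build a path of the form <root>/<experiment>/<year>/<kind>/<fill>.tgz."""
    if kind not in KINDS:
        raise ValueError(f"unknown data type: {kind!r}")
    return PurePosixPath(root) / experiment / str(year) / kind / f"{fill}.tgz"


# Hypothetical usage: root directory and fill number are made up
print(archive_path("/data", "CMS", 2017, "luminosity", 5000))
print(archive_path("/data", "CMS", 2017, "luminous_region", 5000))
```

With such a layout, either data type can be regenerated and replaced without touching the other.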
David then questioned whether the current archive files are really the best way to store and transfer these data, considering the processing time needed to parse them.
The LPC (Christoph) should produce an updated version of the Massi file specification (now on the LPC web page) according to the results of this meeting and circulate that version for approval. This document should become the basis for handling the Massi files from 2017 onwards. Data from previous years will not be required to change according to this proposal, although they may of course still be updated.
In particular some possible misunderstandings in the current specifications should be removed.
The new specification should contain the following features:
Luminosity data and luminous region data will be provided in separate compressed archive files and will be maintained in separate directories (the processes to generate these data are completely independent in the experiments).