ATLAS Offline Software
The easiest way to run the monitoring nowadays is probably just to run the transform used for production at Tier0, Reco_trf. You have to run this anyway before you are allowed to submit code to the Tier0 cache. The advantage is that it almost always works out of the box in any recent release, and you are running it exactly as it will be run at Tier0. The disadvantage is that by default you are running everything, so it takes longer. You can turn things off to speed it up, but often the same jobOptions won't work when you move from one release to the next, so I don't bother trying any more.
Example use of Reco_trf:
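A minimal sketch of the sort of command involved, assuming the old-style key=value transform syntax; the file names and event count are placeholders:

```
# placeholder file names - substitute a real raw data file
Reco_trf.py inputBSFile=MyRawFile.data autoConfiguration=everything maxEvents=500 \
    outputESDFile=myESD.pool.root outputAODFile=myAOD.pool.root outputHISTFile=myHIST.root
```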
This runs the complete Tier0 processing chain. The important parts for us are the RAWtoESD and ESDtoAOD steps and histogram merging. The latest version of TrigT1CaloMonitoring runs some monitoring in the RAWtoESD step and some in the ESDtoAOD step, and the results are then merged together. Older versions ran everything in the RAWtoESD step, but we have been asked to move as much as possible to the ESDtoAOD step. I've never tried running on the Grid as it isn't necessary for testing; I usually use our batch farm here at Birmingham. I recommend using the latest release possible. I'm using AtlasProduction-15.6.4.1, which I believe is the current Tier0 release. The latest tags of TrigT1CaloMonitoring and TrigT1Monitoring will work with this.
The monitoring has since been moved back to the RAWtoESD step, to avoid reading a large database folder in both steps. Note, however, that the jobOptions are still called in both steps and so still need to cater for both.
Latest suggested test job:
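For example (a sketch only: substitute a recent raw file, and note that using AMI=tag to pick up the Tier0 configuration is an assumption that depends on your release supporting it):

```
# AMI=x250 and the file names are illustrative placeholders
Reco_trf.py AMI=x250 inputBSFile=MyRawFile.data maxEvents=500 \
    outputESDFile=myESD.pool.root outputAODFile=myAOD.pool.root outputHISTFile=myHIST.root
```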
You can find out the version and job currently being run at Tier0 by looking on the DQ web pages for Tier0 monitoring. If you click on the tag next to the run number it will give you various information, including the Atlas release used. To get the actual job parameters, use GetCommand.py:
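For example, assuming the usual AMI=tag syntax:

```
GetCommand.py AMI=x250
```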
where x250 is the first part of the tag on the DQ page. You may need to set up grid authentication first to access AMI:
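A guess at the required setup, assuming the standard grid tools are available on your machine:

```
voms-proxy-init -voms atlas   # assumed; the original setup command is not reproduced here
```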
Before requesting a tag for Tier0 you should test with the latest cache or nightly and run these three jobs:
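As an illustration only, assuming the three jobs exercise the RAWtoESD, ESDtoAOD and histogram-merging steps described above (all file names are placeholders, and the exact arguments depend on the release):

```
# 1) RAWtoESD (placeholder file names)
Reco_trf.py inputBSFile=MyRawFile.data autoConfiguration=everything outputESDFile=myESD.pool.root
# 2) ESDtoAOD
Reco_trf.py inputESDFile=myESD.pool.root autoConfiguration=everything outputAODFile=myAOD.pool.root
# 3) Histogram merging: hist_input.txt lists the HIST files to merge
DQHistogramMerge.py hist_input.txt merged_HIST.root False
```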
If you are running these jobs in an environment that cannot access AMI, then use GetCommand.py to get the job parameters you need. Check the outputs carefully, particularly for the RAWtoESD step.
The tools which contain online-specific code have a property, OnlineTest, which if set to True makes the tool run as if it were online even when running offline. (Exception: PPrStabilityMon.) An example setting is sketched below.
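A minimal jobOptions sketch; the tool instance name is hypothetical, so substitute whichever tool you are testing:

```
from AthenaCommon.AppMgr import ToolSvc
# "L1CaloPPrMonTool" is a placeholder instance name, not necessarily the real one
ToolSvc.L1CaloPPrMonTool.OnlineTest = True
```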
For Tier0 monitoring it is important to keep CPU and memory usage as low as possible. To help with this, an alternative set of jobOptions is provided which runs every L1Calo monitoring tool in a separate manager, so that the CPU usage of each tool is reported at the end of the Reco_trf.py job log. See TrigT1CaloMonitoring_forRecExCommission_cpu.py (and TrigT1Monitoring_forRecExCommission_cpu.py for TrigT1Monitoring).
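One possible way to pick these up in a private test, assuming the cpu variants can simply be included in place of the standard jobOptions:

```
# In your topOptions - an assumed usage, not the official recipe
include("TrigT1CaloMonitoring/TrigT1CaloMonitoring_forRecExCommission_cpu.py")
include("TrigT1Monitoring/TrigT1Monitoring_forRecExCommission_cpu.py")
```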
The following table shows the CPU usage of each tool as a percentage of the total L1Calo CPU time. The express stream runs all the tools, so it gives times for all of them. The overall column estimates the contribution of each tool across all streams (ES1 and BLK), taking into account the numbers of events and which streams the tools run in. Run 215643 and release 17.7.0.2, together with TrigT1CaloByteStream-00-08-17, TrigT1CaloMonitoring-00-14-06, TrigT1CaloMonitoringTools-00-02-01, TrigT1Monitoring-00-05-00 and TrigT1CaloCalibTools-00-05-14, were used for this.
Manager | Tool(s) | % CPU express | % CPU overall |
---|---|---|---|
L1CaloMonManager0A1 | Bytestream Unpacking PPM (1) | | |
L1CaloMonManager0A2 | Bytestream Unpacking CPM (1) | | |
L1CaloMonManager0A3 | Bytestream Unpacking JEM (1) | | |
L1CaloMonManager0A4 | Bytestream Unpacking ROD (1) | | |
L1CaloMonManager0B | L1CaloMonitoringCaloTool (2) | | |
L1CaloMonManager1A | PPrStabilityMon /FineTime | | |
L1CaloMonManager1B | PPrStabilityMon /Pedestal | | |
L1CaloMonManager1C | PPrStabilityMon /EtCorrelation | | |
L1CaloMonManager2 | PPrMon | | |
L1CaloMonManager3 | PPMSimBSMon | | |
L1CaloMonManager4 | PPrSpareMon | | |
L1CaloMonManager5 | JEMMon | | |
L1CaloMonManager6 | CMMMon | | |
L1CaloMonManager7 | JEPSimBSMon | | |
L1CaloMonManager8 | TrigT1CaloCpmMonTool | | |
L1CaloMonManager9 | CPMSimBSMon | | |
L1CaloMonManagerA | TrigT1CaloRodMonTool | | |
L1CaloMonManagerB | TrigT1CaloGlobalMonTool | | |
L1CaloMonManagerC | EmEfficienciesMonTool | | |
L1CaloMonManagerD | JetEfficienciesMonTool | | |
L1MonManager0A (3) | CalorimeterL1CaloMon | | |
L1MonManager0B (3) | L1CaloHVScalesMon (4) | | |
L1MonManager0C (3) | L1CaloPMTScoresMon (4) | | |
L1MonManager1 (3) | L1CaloCTPMon | | |
L1MonManager2 (3) | L1CaloLevel2Mon | | |
(1) Needs to run before any other algorithms that may be reading our data, e.g. RoIBResultToAOD.
(2) This tool forms CaloCell Et sums and quality per TriggerTower for the use of the other tools.
(3) TrigT1Monitoring.
(4) Runs on the first event of each job only.
To get the CPU times from the job log, do:
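A sketch, assuming the per-manager times appear in the usual end-of-job summary lines; the log file name is a placeholder:

```
# assumed log name and line format - adjust to your job
grep -E "L1CaloMonManager|L1MonManager" RAWtoESD.log
```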
The numbers in the table were generated with this program:
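The original program is not reproduced here; the following Python sketch conveys the idea, assuming ChronoStatSvc-style summary lines of the form `<manager>:execute ... Tot= <time> [s]` in the log (the pattern will need adapting to the real format). Run it as `python cputimes.py RAWtoESD.log`.

```
#!/usr/bin/env python
# Sketch only - not the original program.  It assumes summary lines such as
#   L1CaloMonManager0A1:execute ... Tot= 12.3 [s] ...
# and the regular expression will need adjusting to the actual log format.

import re
import sys

PATTERN = re.compile(r"(L1(?:Calo)?MonManager\w*):execute.*Tot=\s*([0-9.]+)\s*\[(m?s)\]")

def cpu_times(logfile):
    """Return {manager name: seconds} parsed from an Athena job log."""
    times = {}
    for line in open(logfile):
        match = PATTERN.search(line)
        if match:
            name, value, unit = match.group(1), float(match.group(2)), match.group(3)
            if unit == "ms":
                value /= 1000.0
            times[name] = times.get(name, 0.0) + value
    return times

if __name__ == "__main__":
    times = cpu_times(sys.argv[1])
    if not times:
        sys.exit("no manager times found in %s" % sys.argv[1])
    total = sum(times.values())
    for name in sorted(times):
        print("%-22s %8.2f s  %5.1f %%" % (name, times[name], 100.0 * times[name] / total))
    print("%-22s %8.2f s" % ("Total", total))
```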
Times are for one input file (683 events).