ATLAS Offline Software
Testing the Code

Notes from Feb 2010

The easiest way to run the monitoring nowadays is probably just to run the transform used for production at Tier0, Reco_trf. You have to run this anyway before you are allowed to submit code to the Tier0 cache. The advantage is that it almost always works out of the box in any recent release, and you are running it exactly as it will be run at Tier0. The disadvantage is that by default you are running everything, so it takes longer. You can turn things off to speed it up, but often the same jobOptions won't work when you move from one release to the next, so I don't bother trying any more.

Example use of Reco_trf:

Reco_trf.py inputBSFile=rawDataFilename outputESDFile=myESD.pool.root \
outputAODFile=myAOD.pool.root HIST=myHIST.root \
autoConfiguration=everything maxEvents=200

This runs the complete Tier0 processing chain. The important parts for us are RAWtoESD, ESDtoAOD and Histogram Merging. The latest version of TrigT1CaloMonitoring runs some monitoring in the RAWtoESD step and some in the ESDtoAOD step, and the results are then merged together. Older versions ran everything in the RAWtoESD step, but we have been asked to move as much as possible to the ESDtoAOD step. I've never tried running on the Grid as it's not necessary for testing; I usually use our batch farm here at Birmingham. I recommend using the latest release possible; I'm using AtlasProduction-15.6.4.1, which I believe is the current Tier0 release. The latest tags of TrigT1CaloMonitoring and TrigT1Monitoring will work with this.


Update Feb 2013

The monitoring has been moved back to the RAWtoESD step to avoid reading a large database folder in both steps. Note, however, that the jobOptions are still called in both steps, so they still need to cater for both (see the sketch below).
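
A minimal sketch of one way a jobOptions file can distinguish the two steps, assuming the standard RecExConfig flags are available (the structure is illustrative, not the actual TrigT1CaloMonitoring jobOptions):

from RecExConfig.RecFlags import rec

if rec.readRDO():
    # RAWtoESD step: bytestream/RDO input, schedule the monitoring here
    pass
else:
    # ESDtoAOD step: since Feb 2013 nothing is scheduled here, but the
    # file must still be safe to include in this step
    pass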

Latest suggested test job:

Reco_trf.py inputBSFile=rawDataFilename --ignoreerrors=True conditionsTag='COMCOND-ES1PA-006-05' \
autoConfiguration='everything' maxEvents=200 outputESDFile=myESD.pool.root \
--omitvalidation=ALL --test outputHISTFile=myHIST.root

You can find out the current version and job being run on Tier0 by looking at the DQ web pages for Tier0 monitoring. Clicking on the tag next to the run number gives various information, including the ATLAS release used. To get the actual job parameters use GetCommand.py:

GetCommand.py AMI=x250

where x250 is the first part of the tag on the DQ page. You may need to do:

voms-proxy-init -voms atlas

first to access AMI.

Before requesting a tag for Tier0 you should test with the latest cache or nightly and run these three jobs:

Reco_trf.py AMI=q120
Reco_trf.py AMI=q121
Reco_trf.py AMI=q122

If you are running these jobs in an environment that can't access AMI, use GetCommand.py to get the job parameters you need. Check the outputs carefully, particularly for the RAWtoESD step.
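
For example, a quick scan of a step's log for problems (the log file name here is hypothetical; substitute whatever name your transform version actually writes):

grep -iE 'ERROR|FATAL' log.RAWtoESD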


Testing Online-specific Code

The tools which contain online-specific code have a property OnlineTest which, if set to true, makes the tool run as if it were online even when running offline. (Exception: PPrStabilityMon.)
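
For example, in old-style jobOptions the property can be switched on for a public tool via ToolSvc (a sketch; CPMSimBSMon is just an example instance name for one of the online-specific tools):

from AthenaCommon.AppMgr import ToolSvc
# assumes the tool has already been configured with this instance name
ToolSvc.CPMSimBSMon.OnlineTest = True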


Monitoring CPU Time

For Tier0 monitoring it is important to keep CPU and memory usage as low as possible. To help with this, an alternative set of jobOptions is provided which runs every L1Calo monitoring tool in a separate manager, so that the CPU usage of each tool is given at the end of the Reco_trf.py job log. See TrigT1CaloMonitoring_forRecExCommission_cpu.py (and TrigT1Monitoring_forRecExCommission_cpu.py for TrigT1Monitoring). A sketch of the pattern follows.
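
The idea is roughly one AthenaMonManager per tool, so that each manager's execute time reported in the log corresponds to a single tool. A minimal sketch, assuming old-style configuration and an already-configured tool instance myTool (illustrative name):

from AthenaCommon.AlgSequence import AlgSequence
from AthenaMonitoring.AthenaMonitoringConf import AthenaMonManager

topSequence = AlgSequence()
# wrap a single tool in its own manager; the manager's execute time
# then measures just this tool
topSequence += AthenaMonManager("L1CaloMonManager2")
topSequence.L1CaloMonManager2.AthenaMonTools += [ myTool ]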

The following table shows the CPU usage of each tool as a percentage of the total L1Calo CPU time. The express stream runs all tools, so it gives times for all of them. The overall column estimates the contribution of each tool across all streams (ES1 and BLK), taking into account the numbers of events and which streams each tool runs in. Run 215643 and release 17.7.0.2, together with TrigT1CaloByteStream-00-08-17, TrigT1CaloMonitoring-00-14-06, TrigT1CaloMonitoringTools-00-02-01, TrigT1Monitoring-00-05-00 and TrigT1CaloCalibTools-00-05-14, were used for this.

Manager               Tool(s)                          % CPU     % CPU
                                                       express   overall
L1CaloMonManager0A1   Bytestream Unpacking PPM (1)        6.7      12.0
L1CaloMonManager0A2   Bytestream Unpacking CPM (1)        1.2       2.2
L1CaloMonManager0A3   Bytestream Unpacking JEM (1)        1.2       2.2
L1CaloMonManager0A4   Bytestream Unpacking ROD (1)        0.2       0.4
L1CaloMonManager0B    L1CaloMonitoringCaloTool (2)       31.0      18.5
L1CaloMonManager1A    PPrStabilityMon /FineTime           2.6       0.1
L1CaloMonManager1B    PPrStabilityMon /Pedestal           5.2       0.3
L1CaloMonManager1C    PPrStabilityMon /EtCorrelation      0.8       0.0
L1CaloMonManager2     PPrMon                              1.6       2.8
L1CaloMonManager3     PPMSimBSMon                         3.5       6.3
L1CaloMonManager4     PPrSpareMon                         0.3       0.5
L1CaloMonManager5     JEMMon                              0.5       0.9
L1CaloMonManager6     CMMMon                              0.2       0.3
L1CaloMonManager7     JEPSimBSMon                        14.4      25.8
L1CaloMonManager8     TrigT1CaloCpmMonTool                0.6       1.1
L1CaloMonManager9     CPMSimBSMon                         4.0       7.2
L1CaloMonManagerA     TrigT1CaloRodMonTool                0.2       0.3
L1CaloMonManagerB     TrigT1CaloGlobalMonTool             0.1       0.3
L1CaloMonManagerC     EmEfficienciesMonTool               5.4       5.3
L1CaloMonManagerD     JetEfficienciesMonTool              3.6       3.0
L1MonManager0A (3)    CalorimeterL1CaloMon               15.1       9.0
L1MonManager0B (3)    L1CaloHVScalesMon (4)               1.0       0.6
L1MonManager0C (3)    L1CaloPMTScoresMon (4)              0.1       0.1
L1MonManager1 (3)     L1CaloCTPMon                        0.4       0.7
L1MonManager2 (3)     L1CaloLevel2Mon                     0.1       0.2

(1) Needs to run before any other algorithms that may read our data, e.g. RoIBResultToAOD.
(2) This tool forms CaloCell Et sums and quality per TriggerTower for use by other tools.
(3) TrigT1Monitoring.
(4) Runs on the first event of each job only.

To get the CPU times from the job log, do:

grep 'L1' job.log | grep 'MonManager' | grep 'execute' | sort > cpu.log

The numbers in the table were generated with this program:

#include <iostream>
#include <iomanip>

int main()
{
  const int ntools = 25;
  // relative cpu times for each tool in the express stream (from job log)
  float timesE[] = {1.77,0.324,0.318,0.065,8.16,0.673,1.36,0.222,0.420,
                    0.923,0.076,0.128,0.041,3.8,0.161,1.06,0.048,0.039,
                    1.41,0.943,3.98,0.253,0.026,0.105,0.03};
  // flag which tools run in each stream (as in the jobOptions)
  int express[] = {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1};
  int jetet[]   = {1,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,1,1};
  int egamma[]  = {1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,1,1,1,0,1,0,0,0,1,1};
  int muons[]   = {1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,1,1};
  int other[]   = {1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,1,1};
  // relative number of events per stream
  // (from plots L1Calo/Overview/l1calo_1d_NumberOfEvents)
  float events[] = {0.192, 2.11, 1.53, 1.52, 1.53};
  float timesO[ntools];
  float totalE = 0;
  float totalO = 0;
  for (int i = 0; i < ntools; ++i) {
    totalE += timesE[i];
    // weight each tool's express-stream time by the streams it runs in
    // and the relative event counts to estimate its overall contribution
    timesO[i] = express[i]*timesE[i]*events[0] + jetet[i]*timesE[i]*events[1] +
                egamma[i]*timesE[i]*events[2] + muons[i]*timesE[i]*events[3] +
                other[i]*timesE[i]*events[4];
    totalO += timesO[i];
  }
  float percE, percO;
  std::cout << "Express Overall" << std::endl;
  for (int i = 0; i < ntools; ++i) {
    percE = 100*timesE[i]/totalE;
    percO = 100*timesO[i]/totalO;
    std::cout << std::setiosflags(std::ios::fixed | std::ios::showpoint)
              << std::setprecision(1)
              << std::setw(6) << percE
              << std::setw(9) << percO << std::endl;
  }
  return 0;
}

Times are for one input file (683 events).
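
The program is plain standard C++, so it can be built and run with any recent compiler, for example (the file name is arbitrary):

g++ -o cpupercent cpupercent.cxx
./cpupercent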
