LHC Olympics: Cleaning Script

Reasons for the Cleaning Script

The output format of the current LHC Olympics differs from that of the previous two LHC Olympics. If you have already written software to analyze LHC Olympics “data”, this may cause problems. To help, we provide a cleaning script with the PGS distribution that converts the data into a variety of formats that may better suit your needs.

The script includes an option to write precisely the same format as the initial LHC Olympics. While this may be the quickest way to get up and running again, we discourage its long-term use. The old format contains less information than the new format, and this gap will only widen as we begin to utilise the dummy columns. By building or using software that handles the new format, you will be best able to take advantage of all the information provided.

Obtaining the Cleaning Script

The cleaning script is provided in the examples/CLEAN/ directory of the PGS distribution.

To create the cleaning script, type:

make clean_output

This creates the executable file clean_output in the examples/CLEAN/ directory.

Syntax

To modify the data file data.lhco.in, use the syntax

clean_output -flags data.lhco.in data.lhco.out

Here -flags is any combination of -first, -trigger, -muon and -old. Each flag is described below.

Examples

The easiest way to see how the script works is through a couple of examples. Consider a sample event from the beginning of an LHCO data file named data.lhco.in:

  #  typ      eta    phi      pt    jmas  ntrk  btag   had/em  dum1  dum2
  0             1   3585
  1    4    0.032  3.327  101.05   24.17   8.0   0.0     3.14   0.0   0.0
  2    4    3.257  0.064   83.80   24.70   4.0   0.0     5.21   0.0   0.0
  3    4   -1.527  1.057   36.54   17.18   7.0   0.0     0.68   0.0   0.0
  4    4   -2.238  3.600   20.71   15.06   5.0   0.0     1.63   0.0   0.0
  5    4    0.744  4.251   10.27    6.08   1.0   0.0     0.99   0.0   0.0
  6    4    2.184  5.902   10.58   15.62   4.0   0.0    15.16   0.0   0.0
  7    4   -3.378  0.726   10.52    4.82  10.0   0.0     3.00   0.0   0.0
  8    4   -2.599  5.251    7.85    2.67   7.0   0.0     1.92   0.0   0.0
  9    6    0.000  5.916   10.07    0.00   0.0   0.0     0.00   0.0   0.0
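Each line after the labels is a whitespace-separated record. As a rough sketch in Python (the PGS script itself is Fortran), an event could be read into records as below. `LhcoObject` and `parse_event` are hypothetical names, and the sketch assumes the zero line holds the event number followed by the trigger word, as the sample suggests.

```python
from dataclasses import dataclass


@dataclass
class LhcoObject:
    """One object line: number, typ, then the nine numeric columns."""
    number: int
    typ: int
    eta: float
    phi: float
    pt: float
    jmas: float
    ntrk: float
    btag: float
    hadem: float
    dum1: float
    dum2: float


def parse_event(lines):
    """Return (event_number, trigger_word, objects) for one event.

    Skips blank lines and '#' label lines; assumes the zero line has the
    form "0 <event number> <trigger word>".
    """
    event_number = trigger_word = None
    objects = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        if fields[0] == "0":
            event_number, trigger_word = int(fields[1]), int(fields[2])
            continue
        values = [float(x) for x in fields[2:11]]
        objects.append(LhcoObject(int(fields[0]), int(fields[1]), *values))
    return event_number, trigger_word, objects
```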

-first

Running the script with this flag strips the line of column labels from the top of the data file. Running

clean_output -first data.lhco.in data.lhco.out

writes the sample event above to data.lhco.out as:

  0             1   3585
  1    4    0.032  3.327  101.05   24.17   8.0   0.0     3.14   0.0   0.0
  2    4    3.257  0.064   83.80   24.70   4.0   0.0     5.21   0.0   0.0
  3    4   -1.527  1.057   36.54   17.18   7.0   0.0     0.68   0.0   0.0
  4    4   -2.238  3.600   20.71   15.06   5.0   0.0     1.63   0.0   0.0
  5    4    0.744  4.251   10.27    6.08   1.0   0.0     0.99   0.0   0.0
  6    4    2.184  5.902   10.58   15.62   4.0   0.0    15.16   0.0   0.0
  7    4   -3.378  0.726   10.52    4.82  10.0   0.0     3.00   0.0   0.0
  8    4   -2.599  5.251    7.85    2.67   7.0   0.0     1.92   0.0   0.0
  9    6    0.000  5.916   10.07    0.00   0.0   0.0     0.00   0.0   0.0
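The -first transformation amounts to dropping the label line. A minimal Python sketch, assuming the labels are the only lines that begin with `#` (the function name `strip_header` is hypothetical):

```python
def strip_header(lines):
    """Return the data lines unchanged, without the '#' column-label line(s)."""
    return [line for line in lines if not line.lstrip().startswith("#")]
```

To apply it to a whole file, pass the open input file to strip_header and write the result to the output file with writelines.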

-trigger

Running the script with this flag removes the zero line (object 0, which holds the event number and trigger word) from each event and moves the trigger word into the had/em column of the missing transverse energy object (typ 6). Running

clean_output -trigger data.lhco.in data.lhco.out

turns the beginning of the file into:

  #  typ      eta    phi      pt    jmas  ntrk  btag   had/em  dum1  dum2       
  1    4    0.032  3.327  101.05   24.17   8.0   0.0     3.14   0.0   0.0
  2    4    3.257  0.064   83.80   24.70   4.0   0.0     5.21   0.0   0.0
  3    4   -1.527  1.057   36.54   17.18   7.0   0.0     0.68   0.0   0.0
  4    4   -2.238  3.600   20.71   15.06   5.0   0.0     1.63   0.0   0.0
  5    4    0.744  4.251   10.27    6.08   1.0   0.0     0.99   0.0   0.0
  6    4    2.184  5.902   10.58   15.62   4.0   0.0    15.16   0.0   0.0
  7    4   -3.378  0.726   10.52    4.82  10.0   0.0     3.00   0.0   0.0
  8    4   -2.599  5.251    7.85    2.67   7.0   0.0     1.92   0.0   0.0
  9    6    0.000  5.916   10.07    0.00   0.0   0.0  3585.00   0.0   0.0
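The -trigger transformation can be sketched in Python as follows. The fixed column widths in `ROW_FORMAT` are inferred from the sample rows on this page rather than taken from the PGS source, and `format_row` and `apply_trigger` are hypothetical names.

```python
# Fixed-width layout of a new-format object line, inferred from the sample
# rows on this page (an assumption, not taken from the PGS source).
ROW_FORMAT = "{:3d}{:5d}{:9.3f}{:7.3f}{:8.2f}{:8.2f}{:6.1f}{:6.1f}{:9.2f}{:6.1f}{:6.1f}"
MET_TYPE = 6  # typ code of the missing transverse energy object
HADEM = 8     # index of the had/em column in a split object line


def format_row(row):
    """Format [number, typ, eta, phi, pt, jmas, ntrk, btag, hadem, dum1, dum2]."""
    number, typ, *values = row
    return ROW_FORMAT.format(int(number), int(typ), *values)


def apply_trigger(lines):
    """Drop each zero line and copy its trigger word into had/em of the typ 6 object."""
    out = []
    trigger_word = None
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            out.append(line.rstrip("\n"))  # keep label and blank lines as they are
            continue
        if fields[0] == "0":
            trigger_word = float(fields[2])  # zero line: "0 <event> <trigger word>"
            continue
        row = [float(x) for x in fields]
        if int(row[1]) == MET_TYPE and trigger_word is not None:
            row[HADEM] = trigger_word
        out.append(format_row(row))
    return out
```

Re-emitting every row through one format string keeps the columns aligned even when a large trigger word widens the had/em entry.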

These two flags can be combined by running

clean_output -first -trigger data.lhco.in data.lhco.out

-muon

Running with the -muon flag takes any “non-isolated” muons and combines them with the nearest jet, whose object number is stored in the muon's btag column. If there is no jet in the event, the muon is simply deleted. Here, “non-isolated” means that either of the following holds:

  • ptiso > 5.0 GeV, where ptiso is the summed pT in an R = 0.4 cone around the muon (excluding the muon itself). ptiso is stored as the whole-number part of the had/em column.
  • etrat > 0.1, where etrat is the ratio of the ET in a 3×3 calorimeter array around the muon (including the muon’s cell) to the pT of the muon. etrat is stored in the two digits after the decimal point in had/em, so it ranges from .00 to .99.
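Decoding the had/em entry of a muon and applying the two criteria above might look like this in Python. `decode_hadem` and `is_non_isolated` are hypothetical names, and the sketch assumes ptiso is the whole-number part and etrat the fractional part of had/em, as described in the list.

```python
import math

PTISO_CUT = 5.0  # GeV, from the ptiso criterion above
ETRAT_CUT = 0.1  # from the etrat criterion above


def decode_hadem(hadem):
    """Split a muon's had/em entry into (ptiso, etrat).

    Assumes ptiso is the whole-number part and etrat the two digits after
    the decimal point.
    """
    ptiso = math.floor(hadem)
    etrat = round(hadem - ptiso, 2)  # round away binary noise, e.g. 36.12 -> 0.12
    return float(ptiso), etrat


def is_non_isolated(hadem):
    """True if either isolation criterion is failed."""
    ptiso, etrat = decode_hadem(hadem)
    return ptiso > PTISO_CUT or etrat > ETRAT_CUT
```

For the first muon in the event below, had/em = 36.12 decodes to ptiso = 36 and etrat = 0.12, so it fails both criteria.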

For a muon, the btag column holds the object number of the jet closest to it in ΔR. In the following event, both muons are non-isolated: the first is closest to object 4 and the second to object 3.

  0             2   3599
  1    2   -1.241  1.800   24.60    0.11  -1.0   4.0    36.12   0.0   0.0
  2    2   -1.127  5.880   30.77    0.11  -1.0   3.1    74.32   0.0   0.0
  3    4   -1.149  5.882   65.99   11.40  12.0   0.0     4.55   0.0   0.0
  4    4   -2.545  3.006   48.41  273.09  12.0   0.0     1.22   0.0   0.0
  5    4   -2.792  3.629   28.28   15.43  11.0   0.0     3.37   0.0   0.0
  6    4    3.368  0.058    6.20    5.70   2.0   0.0     0.01   0.0   0.0
  7    4    1.671  0.946    5.42    2.29   0.0   0.0     1.67   0.0   0.0
  8    6    0.000  2.328   22.31    0.00   0.0   0.0     0.00   0.0   0.0
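The ΔR matching can be checked against the event above. This Python sketch (hypothetical names `delta_r` and `nearest_jet`) assumes jets are the typ 4 objects and wraps Δφ into [-π, π). For this event it picks object 4 for the first muon and object 3 for the second, matching the integer parts of their btag entries.

```python
import math

JET_TYPE = 4  # typ code of a jet


def delta_r(eta1, phi1, eta2, phi2):
    """Delta R = sqrt(deta^2 + dphi^2), with dphi wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def nearest_jet(muon, objects):
    """Object number of the jet closest to `muon` in Delta R, or None if there are no jets.

    Objects are (number, typ, eta, phi) tuples.
    """
    jets = [obj for obj in objects if obj[1] == JET_TYPE]
    if not jets:
        return None
    _, _, eta, phi = muon
    closest = min(jets, key=lambda jet: delta_r(eta, phi, jet[2], jet[3]))
    return closest[0]


# Objects 1-8 of the example event above: (number, typ, eta, phi).
EVENT = [
    (1, 2, -1.241, 1.800),
    (2, 2, -1.127, 5.880),
    (3, 4, -1.149, 5.882),
    (4, 4, -2.545, 3.006),
    (5, 4, -2.792, 3.629),
    (6, 4, 3.368, 0.058),
    (7, 4, 1.671, 0.946),
    (8, 6, 0.000, 2.328),
]
```

The wrap matters for the first muon: object 3 sits at φ = 5.882, which is only about 2.2 away from φ = 1.800 once Δφ is taken modulo 2π.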

Now, running the script

clean_output -muon data.lhco.in data.lhco.out

gives the event:

  0             2   3599
  1    4   -1.142  5.881   96.76   13.80  13.1   0.0     4.55   0.0   0.0
  2    4   -2.425  2.624   61.63  296.55  13.1   0.0     1.22   0.0   0.0
  3    4   -2.792  3.629   28.28   15.43  11.0   0.0     3.37   0.0   0.0
  4    4    3.368  0.058    6.20    5.70   2.0   0.0     0.01   0.0   0.0
  5    4    1.671  0.946    5.42    2.29   0.0   0.0     1.67   0.0   0.0
  6    6    0.000  2.328   22.31    0.00   0.0   0.0     0.00   0.0   0.0
 

Note that the muons have been deleted from the event record and combined with the appropriate jets, and the remaining objects have been renumbered. The fact that jets 1 and 2 have “eaten” muons is recorded in their number-of-tracks (ntrk) column: the tenths place has been incremented (in this event both jets go from 12.0 to 13.1). If desired, one could try to use this information as part of a soft-lepton heavy-flavor tag. The current heavy-flavor tagging is based on efficiencies derived solely from vertexing information.
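The combined jets in the output agree with adding each muon and its jet as four-vectors; for example, muon 2 plus jet 3 gives pT = 65.99 + 30.77 = 96.76 GeV because the two are nearly collinear. A Python sketch of that addition, assuming the pt, eta, phi and jmas columns define each four-vector (the helper names are hypothetical, and the ntrk update and renumbering are not modeled):

```python
import math


def to_four_vector(pt, eta, phi, mass):
    """(E, px, py, pz) from the pt, eta, phi and jmas columns."""
    px, py, pz = pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta)
    energy = math.sqrt(mass**2 + px**2 + py**2 + pz**2)
    return energy, px, py, pz


def from_four_vector(energy, px, py, pz):
    """Back to (pt, eta, phi, mass), with phi in [0, 2*pi) as in the data file."""
    pt = math.hypot(px, py)
    eta = math.asinh(pz / pt)
    phi = math.atan2(py, px) % (2 * math.pi)
    mass = math.sqrt(max(energy**2 - px**2 - py**2 - pz**2, 0.0))
    return pt, eta, phi, mass


def merge(muon, jet):
    """Add two (pt, eta, phi, mass) objects as four-vectors."""
    a, b = to_four_vector(*muon), to_four_vector(*jet)
    return from_four_vector(*(x + y for x, y in zip(a, b)))
```

Merging muon 1 with jet 4 this way gives pT ≈ 61.63, η ≈ -2.425, φ ≈ 2.624 and mass ≈ 296.55, the values of jet 2 in the output.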

-old

This flag is designed to put the output back into the “old” Olympics format, which allows backwards compatibility with analysis software you may have written for the first two rounds. However, long-term use of this flag is discouraged because the old format cannot carry all the information provided with the black boxes. To call the script, type:

clean_output -old data.lhco.in data.lhco.out

This would take

  #  typ      eta    phi      pt    jmas  ntrk  btag   had/em  dum1  dum2
  0             5   3587
  1    2    1.169  4.197    6.30    0.11   1.0   3.0    42.15   0.0   0.0
  2    4   -0.121  1.278  330.12  206.58   6.0   2.0     3.50   0.0   0.0
  3    4    1.207  4.216  306.56   27.99  16.0   0.0     0.73   0.0   0.0
  4    4   -0.357  5.635   79.27   10.92   8.0   0.0     1.31   0.0   0.0
  5    4   -0.965  4.076   17.42    7.24   3.0   0.0     0.63   0.0   0.0
  6    4   -2.073  0.696    8.75    4.07   1.0   0.0     1.93   0.0   0.0
  7    4   -3.717  1.975    6.81    2.30   1.0   0.0     0.15   0.0   0.0
  8    6    0.000  1.926   12.42    0.00   0.0   0.0     0.00   0.0   0.0

and give you

   #typ      eta    phi        pt     jmas ntrack   btag
   1  4   -0.121  1.278    330.12   206.58    6.0    1.0
   2  4    1.206  4.216    312.86    29.22   17.0    0.0
   3  4   -0.357  5.635     79.27    10.92    8.0    0.0
   4  4   -0.965  4.076     17.42     7.24    3.0    0.0
   5  4   -2.073  0.696      8.75     4.07    1.0    0.0
   6  4   -3.717  1.975      6.81     2.30    1.0    0.0
   7  6    0.000  1.926     12.42     0.00    0.0    0.0

The script drops the zero line and the extra columns (had/em, dum1 and dum2), places the lepton charge back in the jmas column, combines non-isolated muons with jets, and renumbers the remaining objects. The ntrack column is meaningless for leptons.
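The column reshuffling done by -old can be sketched in Python. The output widths in `OLD_ROW` are inferred from the example above, `LEPTON_TYPES` assumes the LHCO codes 1 = electron and 2 = muon, and the sketch assumes the new format carries the lepton charge in ntrk (as the muon rows with ntrk = ±1.0 suggest). It formats one row only: muon merging, renumbering and the btag change visible in the example (2.0 written as 1.0) are not reproduced.

```python
# Old-format layout inferred from the example output above (an assumption,
# not read from the Fortran source).
OLD_HEADER = "   #typ      eta    phi        pt     jmas ntrack   btag"
OLD_ROW = "{:4d}{:3d}{:9.3f}{:7.3f}{:10.2f}{:9.2f}{:7.1f}{:7.1f}"
LEPTON_TYPES = {1, 2}  # assumed typ codes: 1 = electron, 2 = muon


def to_old_row(fields):
    """Format one new-format object row (11 values) in the old layout.

    had/em, dum1 and dum2 are dropped; for leptons the charge, assumed to be
    the new-format ntrk value, is moved into the jmas column.
    """
    number, typ, eta, phi, pt, jmas, ntrk, btag = fields[:8]
    if int(typ) in LEPTON_TYPES:
        jmas = ntrk
    return OLD_ROW.format(int(number), int(typ), eta, phi, pt, jmas, ntrk, btag)
```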

The -old flag can be called in concert with -first if desired. Calling it with -trigger or -muon is redundant.

If the cleaning script doesn’t produce exactly the format you want, you can use its Fortran source code as a template for reading in the data and writing it back out in a different format.

 

lhc_olympics/cleaning_script.txt · Last modified: 2010/01/26 10:04 by 128.141.30.119
 