The current Olympics output format differs from that of the previous two LHC Olympics. If you have already written software to analyze LHC Olympics “data”, this might be problematic. To help alleviate this problem, we are providing a cleaning script with the PGS distribution that will allow you to massage the data into a variety of formats that might better suit your needs.
Included is the option to use precisely the same format as the initial LHC Olympics. While this might get you up and running again most quickly, we would discourage its long-term use. The old format contains less information than the new format, and this gap will only widen in the future as we begin to utilize the dummy columns. By building or using software that can handle the new format, you will be best able to take advantage of all the information provided.
The cleaning script is provided in the examples/CLEAN/ directory along with the PGS distribution.
To build the cleaning script, type:
make clean_output
This creates the executable file clean_output in the examples/CLEAN/ directory.
To modify the data file data.lhco.in, use the syntax
clean_output -flags data.lhco.in data.lhco.out
Here -flags is some combination of -first, -trigger, -muon and -old. Each flag provides a distinct functionality that is described below.
To see how the script works, it is easiest to look at a couple of examples. Consider a sample event from the beginning of an LHCO data file titled data.lhco.in:
#  typ     eta    phi      pt    jmas  ntrk  btag   had/em  dum1  dum2
0    1    3585
1    4   0.032  3.327  101.05   24.17   8.0   0.0     3.14   0.0   0.0
2    4   3.257  0.064   83.80   24.70   4.0   0.0     5.21   0.0   0.0
3    4  -1.527  1.057   36.54   17.18   7.0   0.0     0.68   0.0   0.0
4    4  -2.238  3.600   20.71   15.06   5.0   0.0     1.63   0.0   0.0
5    4   0.744  4.251   10.27    6.08   1.0   0.0     0.99   0.0   0.0
6    4   2.184  5.902   10.58   15.62   4.0   0.0    15.16   0.0   0.0
7    4  -3.378  0.726   10.52    4.82  10.0   0.0     3.00   0.0   0.0
8    4  -2.599  5.251    7.85    2.67   7.0   0.0     1.92   0.0   0.0
9    6   0.000  5.916   10.07    0.00   0.0   0.0     0.00   0.0   0.0
Running the script with the -first flag strips the first line of labels from the data file. So, running
clean_output -first data.lhco.in data.lhco.out
writes the above event to the file data.lhco.out as:
0    1    3585
1    4   0.032  3.327  101.05   24.17   8.0   0.0     3.14   0.0   0.0
2    4   3.257  0.064   83.80   24.70   4.0   0.0     5.21   0.0   0.0
3    4  -1.527  1.057   36.54   17.18   7.0   0.0     0.68   0.0   0.0
4    4  -2.238  3.600   20.71   15.06   5.0   0.0     1.63   0.0   0.0
5    4   0.744  4.251   10.27    6.08   1.0   0.0     0.99   0.0   0.0
6    4   2.184  5.902   10.58   15.62   4.0   0.0    15.16   0.0   0.0
7    4  -3.378  0.726   10.52    4.82  10.0   0.0     3.00   0.0   0.0
8    4  -2.599  5.251    7.85    2.67   7.0   0.0     1.92   0.0   0.0
9    6   0.000  5.916   10.07    0.00   0.0   0.0     0.00   0.0   0.0
Running the script with the -trigger flag removes the zero object (the line beginning with 0) from each event and moves the information contained in the trigger word to the had/em column of the object with typ 6. So, running
clean_output -trigger data.lhco.in data.lhco.out
turns the beginning of the file into:
#  typ     eta    phi      pt    jmas  ntrk  btag   had/em  dum1  dum2
1    4   0.032  3.327  101.05   24.17   8.0   0.0     3.14   0.0   0.0
2    4   3.257  0.064   83.80   24.70   4.0   0.0     5.21   0.0   0.0
3    4  -1.527  1.057   36.54   17.18   7.0   0.0     0.68   0.0   0.0
4    4  -2.238  3.600   20.71   15.06   5.0   0.0     1.63   0.0   0.0
5    4   0.744  4.251   10.27    6.08   1.0   0.0     0.99   0.0   0.0
6    4   2.184  5.902   10.58   15.62   4.0   0.0    15.16   0.0   0.0
7    4  -3.378  0.726   10.52    4.82  10.0   0.0     3.00   0.0   0.0
8    4  -2.599  5.251    7.85    2.67   7.0   0.0     1.92   0.0   0.0
9    6   0.000  5.916   10.07    0.00   0.0   0.0  3585.00   0.0   0.0
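For readers who want to reproduce this step in their own code, here is a minimal Python sketch of the same transformation. It is not the distributed Fortran script; the function name move_trigger and the list-of-strings event representation are assumptions made for illustration, and the output is space-separated rather than column-aligned.

```python
def move_trigger(event_lines):
    """Drop the zero object and copy the trigger word into the had/em
    column of the typ 6 object.

    event_lines: the lines of ONE event, the first being the zero object
    ("0 <event number> <trigger word>"). Hypothetical helper, not part of PGS.
    """
    trigger_word = float(event_lines[0].split()[2])
    cleaned = []
    for line in event_lines[1:]:
        fields = line.split()
        # Columns: #, typ, eta, phi, pt, jmas, ntrk, btag, had/em, dum1, dum2
        if fields[1] == "6":
            fields[8] = f"{trigger_word:.2f}"
        cleaned.append(" ".join(fields))
    return cleaned


EXAMPLE_EVENT = [
    "0 1 3585",
    "1 4 0.032 3.327 101.05 24.17 8.0 0.0 3.14 0.0 0.0",
    "9 6 0.000 5.916 10.07 0.00 0.0 0.0 0.00 0.0 0.0",
]

for row in move_trigger(EXAMPLE_EVENT):
    print(row)
```

The sketch assumes every event starts with its zero object; a file that has already been passed through -trigger would need a different reader.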
It is possible to combine these two flags, if desired, by running
clean_output -first -trigger data.lhco.in data.lhco.out
Running with the -muon flag takes any “non-isolated” muon and combines it with the nearest jet, whose identity is stored in the muon's btag column. If there is no jet in the event, the muon is simply deleted. Here, “non-isolated” means one of two things.
... hadem column)
... hadem. It is allowed to vary from .00 to .99)
For a muon, the value in the btag column is the object number of the jet that the muon is closest to in Delta R. In the following event, both muons are non-isolated. The first is closest to object 4. The second is closest to object 3.
0    2    3599
1    2  -1.241  1.800   24.60    0.11  -1.0   4.0    36.12   0.0   0.0
2    2  -1.127  5.880   30.77    0.11  -1.0   3.1    74.32   0.0   0.0
3    4  -1.149  5.882   65.99   11.40  12.0   0.0     4.55   0.0   0.0
4    4  -2.545  3.006   48.41  273.09  12.0   0.0     1.22   0.0   0.0
5    4  -2.792  3.629   28.28   15.43  11.0   0.0     3.37   0.0   0.0
6    4   3.368  0.058    6.20    5.70   2.0   0.0     0.01   0.0   0.0
7    4   1.671  0.946    5.42    2.29   0.0   0.0     1.67   0.0   0.0
8    6   0.000  2.328   22.31    0.00   0.0   0.0     0.00   0.0   0.0
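The "closest in Delta R" statement can be checked by hand for this event. The Python sketch below assumes the usual definition, Delta R = sqrt(Delta eta^2 + Delta phi^2) with Delta phi wrapped into [0, pi]; the names delta_r and nearest_jet are hypothetical and are not taken from the PGS source.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Separation in the eta-phi plane, with phi wrapped around 2*pi."""
    dphi = abs(phi1 - phi2) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return math.hypot(eta1 - eta2, dphi)


def nearest_jet(muon, jets):
    """Return the object number of the jet closest to the muon.

    muon: (eta, phi); jets: {object number: (eta, phi)}.
    Returns None when the event contains no jet.
    """
    if not jets:
        return None
    return min(jets, key=lambda n: delta_r(muon[0], muon[1], *jets[n]))


# (eta, phi) of the typ 4 objects in the example event above
JETS = {
    3: (-1.149, 5.882),
    4: (-2.545, 3.006),
    5: (-2.792, 3.629),
    6: (3.368, 0.058),
    7: (1.671, 0.946),
}

print(nearest_jet((-1.241, 1.800), JETS))  # first muon
print(nearest_jet((-1.127, 5.880), JETS))  # second muon
```

With the wrapped Delta phi, the first muon comes out closest to object 4 and the second to object 3, in agreement with the integer part of their btag entries.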
Now, running the script
clean_output -muon data.lhco.in data.lhco.out
gives the event:
0    2    3599
1    4  -1.142  5.881   96.76   13.80  13.1   0.0     4.55   0.0   0.0
2    4  -2.425  2.624   61.63  296.55  13.1   0.0     1.22   0.0   0.0
3    4  -2.792  3.629   28.28   15.43  11.0   0.0     3.37   0.0   0.0
4    4   3.368  0.058    6.20    5.70   2.0   0.0     0.01   0.0   0.0
5    4   1.671  0.946    5.42    2.29   0.0   0.0     1.67   0.0   0.0
6    6   0.000  2.328   22.31    0.00   0.0   0.0     0.00   0.0   0.0
Note that the muons have been deleted from the event record, and each has been combined with the appropriate jet. The information that jets 1 and 2 have “eaten” muons is contained in the number of tracks column for the jet: the tenths place has been incremented. If desired, one could try to use this information as part of a soft-lepton heavy flavor tag. The current heavy flavor tagging is based on efficiencies arising solely from vertexing information.
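The merged jet values in this example look consistent with a plain four-vector sum of the muon and the jet. That is an inference from the numbers, not something confirmed from the Fortran source, so the Python sketch below should be read as an illustration only. The names merge and muons_eaten are hypothetical.

```python
import math


def to_cartesian(eta, phi, pt, mass):
    """Convert (eta, phi, pt, mass) to (px, py, pz, E)."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return px, py, pz, energy


def merge(obj_a, obj_b):
    """Four-vector sum of two objects given as (eta, phi, pt, mass).

    Assumes the summed transverse momentum is non-zero.
    """
    px, py, pz, energy = (
        a + b for a, b in zip(to_cartesian(*obj_a), to_cartesian(*obj_b))
    )
    pt = math.hypot(px, py)
    eta = math.asinh(pz / pt)
    phi = math.atan2(py, px) % (2.0 * math.pi)  # keep phi in [0, 2*pi)
    mass_sq = energy * energy - px * px - py * py - pz * pz
    return eta, phi, pt, math.sqrt(max(mass_sq, 0.0))


def muons_eaten(ntrk):
    """Read the tenths place of a jet's ntrk entry."""
    return round((ntrk - math.floor(ntrk)) * 10)


# Second muon and the jet it is closest to (object 3) in the input event
muon = (-1.127, 5.880, 30.77, 0.11)
jet = (-1.149, 5.882, 65.99, 11.40)
print(merge(muon, jet))
print(muons_eaten(13.1))
```

For this pair the sum lands close to the first jet of the cleaned event (eta -1.142, phi 5.881, pt 96.76, jmas 13.80), up to rounding of the printed inputs.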
The -old flag is designed to put the output back into the “old” Olympics output format. This allows backwards compatibility with analysis software that you might have written for the first two rounds. However, the long-term use of this flag is discouraged because it will prevent the use of all information provided with the black boxes. To call the script, type:
clean_output -old data.lhco.in data.lhco.out
This would take
#  typ     eta    phi      pt    jmas  ntrk  btag   had/em  dum1  dum2
0    5    3587
1    2   1.169  4.197    6.30    0.11   1.0   3.0    42.15   0.0   0.0
2    4  -0.121  1.278  330.12  206.58   6.0   2.0     3.50   0.0   0.0
3    4   1.207  4.216  306.56   27.99  16.0   0.0     0.73   0.0   0.0
4    4  -0.357  5.635   79.27   10.92   8.0   0.0     1.31   0.0   0.0
5    4  -0.965  4.076   17.42    7.24   3.0   0.0     0.63   0.0   0.0
6    4  -2.073  0.696    8.75    4.07   1.0   0.0     1.93   0.0   0.0
7    4  -3.717  1.975    6.81    2.30   1.0   0.0     0.15   0.0   0.0
8    6   0.000  1.926   12.42    0.00   0.0   0.0     0.00   0.0   0.0
and give you
#typ       eta    phi      pt    jmas  ntrack  btag
1    4  -0.121  1.278  330.12  206.58     6.0   1.0
2    4   1.206  4.216  312.86   29.22    17.0   0.0
3    4  -0.357  5.635   79.27   10.92     8.0   0.0
4    4  -0.965  4.076   17.42    7.24     3.0   0.0
5    4  -2.073  0.696    8.75    4.07     1.0   0.0
6    4  -3.717  1.975    6.81    2.30     1.0   0.0
7    6   0.000  1.926   12.42    0.00     0.0   0.0
The script eliminates the extra columns, places the lepton charge back in the jmas column, and combines unisolated muons with jets. The ntrack column is meaningless for leptons.
The -old flag can be called in concert with -first if desired. Calling it with -trigger or -muon is redundant.
If the cleaning script doesn’t put things in exactly the format that you want, hopefully you can use the Fortran source code as an example template of how to read in the data and write it back out in a different format.
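As a rough starting point outside Fortran, the new format can also be read with a few lines of Python. This is only a sketch under the layout shown in the examples in this section (an optional label line starting with #, then one zero object per event followed by its objects); the function name read_events, the COLUMNS tuple and the dictionary layout are assumptions for illustration.

```python
COLUMNS = ("eta", "phi", "pt", "jmas", "ntrk", "btag", "hadem", "dum1", "dum2")


def read_events(lines):
    """Group the lines of an LHCO file into a list of events.

    Each event is a dict with the event number, the trigger word and a
    list of objects (one dict per object line).
    """
    events = []
    current = None
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and the label line
        if fields[0] == "0":
            # Zero object: event number and trigger word
            current = {
                "number": int(fields[1]),
                "trigger": int(fields[2]),
                "objects": [],
            }
            events.append(current)
        elif current is not None:
            values = [float(x) for x in fields[2:]]
            obj = {"index": int(fields[0]), "typ": int(fields[1])}
            obj.update(zip(COLUMNS, values))
            current["objects"].append(obj)
    return events


SAMPLE = [
    "# typ eta phi pt jmas ntrk btag had/em dum1 dum2",
    "0 1 3585",
    "1 4 0.032 3.327 101.05 24.17 8.0 0.0 3.14 0.0 0.0",
    "9 6 0.000 5.916 10.07 0.00 0.0 0.0 0.00 0.0 0.0",
]

for event in read_events(SAMPLE):
    print(event["number"], event["trigger"], len(event["objects"]))
```

To process a real file, the same function can be fed an open file handle, for example read_events(open("data.lhco.in")), and the resulting dictionaries written back out in whatever layout your analysis code expects.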