It is not at all straightforward to make a suitable standard model background sample for a given black box! Here are just a few of the issues.
All of the largest backgrounds at LHC involve QCD physics, either in full or in part. For instance, for a signal in which events containing a single lepton plus jets play a part, the dominant background is often from a W boson produced in conjunction with jets. Fine — this is a standard model process — why not just simulate the production of W bosons plus various numbers of quarks and gluons, and be done with it?
The problem is that it isn’t possible. Simulating this process, or rather this set of processes, to a satisfactory degree is beyond the state of the art.
One problem is that we simply don’t know how to generate these events with good accuracy. Consider the W plus 4 gluons process... we can calculate the Feynman diagrams using various event generators. It has a rate of order (alpha_s)^4. But which alpha_s? It’s a running coupling, and at tree level there’s simply no control over the scale \mu at which it should be evaluated. We need the next-to-leading order process to be calculated also, in order to reduce the dependence on the choice of the scale \mu. This hasn’t been done; it involves a very non-trivial set of loop graphs, which have not yet been calculated. (W plus three jets is at the cutting edge.) Consequently, we can only guess at the best choice of \mu, and thus can only guess at the appropriate alpha_s. Again, alpha_s appears to the fourth power in the rate. So the rate for this one process is only known to a factor of 2 or 3 or so.
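To make the scale dependence concrete, here is a minimal sketch using the standard one-loop running formula for alpha_s. The reference value alpha_s(M_Z) = 0.118, the choice of five active flavours, and the two scales 50 GeV and 200 GeV are assumptions picked purely for illustration; they are not taken from any actual W plus 4 jets calculation.

```python
import math

ALPHA_S_MZ = 0.118   # assumed reference value of alpha_s at the Z mass
M_Z = 91.1876        # Z boson mass in GeV
N_F = 5              # assumed number of active quark flavours


def alpha_s_one_loop(mu, alpha_ref=ALPHA_S_MZ, mu_ref=M_Z, n_f=N_F):
    """One-loop running coupling evaluated at scale mu (in GeV)."""
    b0 = (33.0 - 2.0 * n_f) / (12.0 * math.pi)
    return alpha_ref / (1.0 + b0 * alpha_ref * math.log(mu**2 / mu_ref**2))


def rate_ratio(mu_low, mu_high, power=4):
    """Ratio of (alpha_s)^power evaluated at two different scale choices."""
    return (alpha_s_one_loop(mu_low) / alpha_s_one_loop(mu_high)) ** power


if __name__ == "__main__":
    # Illustrative scale choices only; a tree-level calculation does not
    # tell us which of these (if any) is the "right" one.
    for mu in (50.0, 100.0, 200.0):
        print(f"mu = {mu:6.1f} GeV   alpha_s = {alpha_s_one_loop(mu):.4f}")
    ratio = rate_ratio(50.0, 200.0)
    print(f"(alpha_s)^4 at 50 GeV / (alpha_s)^4 at 200 GeV = {ratio:.2f}")
```

With these assumed inputs, moving \mu from 200 GeV down to 50 GeV changes (alpha_s)^4 by roughly a factor of two, which is the kind of spread described above. A real estimate would also have to account for the scale dependence of the parton distribution functions, which this sketch ignores.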
This is only the beginning. To produce the background properly, we would need to combine the many W plus one jet, W plus two jet, W plus three jet, W plus four jet, W plus five jet processes. Loop corrections have only been performed completely for W plus two jets; beyond this point, each of the individual W plus n-jet processes has its own uncertainties, of order a factor of 2 or 3, so the sum of the rates is very uncertain. More jets means more powers of alpha_s, which reduces the absolute rate but increases the relative uncertainty. And combining these processes in a consistent way, without double-counting events or incorrectly mixing orders in perturbation theory, is not trivial.
Another problem is purely technical and would be present even if our perturbative knowledge of W plus jets were perfect. There are so many W’s produced at the LHC (hundreds each second) that most of them have to be thrown away using the triggering system. By contrast, a typical new signal might have a few thousand or tens of thousands of events per year. If an important part of the signal involves a lepton and many jets, we will probably have to impose hard cuts (i.e., strict kinematic conditions on the events) that preserve much of the signal while discarding almost all of the standard model background. What standard model background remains (which may still be much larger than the signal) will be a tiny fraction of the W plus jets events that the LHC actually produces, and will lie far out on the tail of any standard model kinematic distribution.
In this context, how should we provide LHC Olympics participants with this particular background? Practically, we couldn’t possibly provide the full W-plus-jets background, since we are talking about data sets which are 1000 to 100,000 times larger than the signal data sets. But suppose, knowing each signal and its characteristic features, we imposed some cuts in advance, in order to reduce the W-plus-jets data sets down to the small fraction of the events that are the most important backgrounds to a particular signal. This would mean simulating the tails of the W-plus-jets distributions. We would still have to simulate huge data sets, of order 1000 times larger than the signal, in order to obtain these tails. This would take weeks. Also, the result for each separate process contributing to that background would be uncertain by a factor of at least 3, for the reasons mentioned above, and so the number and type of events remaining, after the stringent cuts that we would need to impose, could be wrong by as much as an order of magnitude or more.
Meanwhile, although this is the worst of the backgrounds of importance, it is hardly the only one. There are also important backgrounds from Z plus jets, top quark pairs plus jets, diboson (such as WW plus jets), and pure QCD (jets-only) backgrounds, among others.
Incidentally, a naive theorist might think one need not care about pure QCD light-quark and gluon backgrounds in a sample with leptons. But this isn’t true. Leptons can be faked, especially hadronic taus. Even fake electrons and muons, which occur rarely, are important; the number of QCD events is so extraordinarily large that their presence can often be a serious issue. Also, real leptons that are sometimes isolated are generated in decays of bottom and charm quarks, which are produced in abundance in QCD events.
Finally, even if we could calculate perfectly, in perturbation theory, the W-plus-n-quarks/gluons backgrounds, we always have to account for the fact that n quarks and gluons in a Feynman diagram does not in general equal n jets in a detector. Making sure we can model the differences successfully is highly nontrivial, involving resummation of showering effects, simulation packages and matching of those packages to data. This has to be done consistently at the one-loop level, if we want to make use of recent loop calculations of Feynman graphs; implementing this in the most important processes at LHC is still at the cutting edge, as in the ongoing MC@NLO project (Monte Carlo at Next-to-Leading Order). Then there are uncertainties that are smaller, but not unimportant, from the parton distribution functions (pdfs). For certain questions, the lack of precise knowledge about the gluon pdf and those of the charm and bottom quark can contribute to important uncertainties about backgrounds. For instance, the b-quark and c-quark content of the W-plus-jets background is not well-known, so we can’t at present know with precision the background to new signals that produce leptons in association with bottom or top quarks.
[More to be added later]
Fortunately, the experimentalists will be able to combine data and theory with a lot of cleverness to remove many of these backgrounds with some degree of accuracy. The crucial question of whether this can be done reliably, and under what circumstances, is hotly debated. The LHC Olympics are beginning to include black boxes with reasonably simulated standard model backgrounds, and these issues will be discussed in the experts’ portion of our workshops.
If these problems, which will afflict the entire LHC enterprise, both worry and interest you, feel free to contact the organizers. Many theorists are needed to help with state-of-the-art loop calculations and to help with modern event-generation related projects!
For this round of the LHC Olympics, we will soon provide a couple of black boxes that contain a new physics signal together with a semi-realistic Standard Model background. The SM background necessarily lacks some realism. Because of the above-mentioned theoretical uncertainties, it is only after the LHC turns on that we can start to determine precisely how big the various SM backgrounds really are. For the LHCO, however, a first semi-quantitative analysis with background is a sufficiently challenging step. Our approach for now is simply to use Pythia plus PGS4 to generate a few of the largest SM backgrounds, leaving possible quantitative improvements to a later stage.
Since generating all Standard Model events at once is highly impractical, Pythia allows the user to isolate various identifiable SM processes, such as dijets, single boson plus jets, Drell-Yan, ttbar, bbbar, and leptonically decaying dibosons. Each can be run separately using the corresponding *Pythia input files. The largest contributions, in terms of overall cross-section, come from dijets, followed by the single-boson and ttbar processes.
The LHCO format comes with certain practical limitations that put an upper limit on the size of the data files. The L2 trigger settings are used to keep the number of background events within this limit.
In the current “early release” of black box C with background, with a total integrated luminosity of 500 pb^-1, the numbers of accepted events from each process are:
Process   Accepted events
jj         58881
DY         11246
tt         15465
W          27713
Z           8109
bb          6235
WW          1435
ZW            68
ZZ            11
Signal       606
Total     129769
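As a quick cross-check of these numbers, the following sketch recomputes the total, the share of each process, and a crude signal-to-background ratio before any analysis cuts. The event counts and the 500 pb^-1 luminosity are the ones quoted above. The "accepted cross-section" column is simply events divided by luminosity, and S/sqrt(B) is used only as a naive figure of merit; both are illustrative choices rather than part of the LHCO procedure.

```python
import math

LUMINOSITY_PB = 500.0  # integrated luminosity quoted above, in pb^-1

# Accepted events per process, as listed for black box C.
accepted = {
    "jj": 58881, "DY": 11246, "tt": 15465, "W": 27713, "Z": 8109,
    "bb": 6235, "WW": 1435, "ZW": 68, "ZZ": 11, "Signal": 606,
}

total = sum(accepted.values())
n_sig = accepted["Signal"]
n_bkg = total - n_sig

if __name__ == "__main__":
    # List processes from largest to smallest contribution.
    for name, n in sorted(accepted.items(), key=lambda kv: -kv[1]):
        share = 100.0 * n / total
        xsec = n / LUMINOSITY_PB  # accepted events per pb^-1, i.e. in pb
        print(f"{name:>6}: {n:6d} events  {share:6.2f}%  {xsec:9.3f} pb")
    print(f"Total: {total}")
    print(f"S/B       = {n_sig / n_bkg:.4f}")
    print(f"S/sqrt(B) = {n_sig / math.sqrt(n_bkg):.2f}")
```

The signal makes up well under one percent of the sample before cuts, so any analysis of this box has to start by finding selections that suppress the dijet and W backgrounds.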