
Statistical interpretation of results #10

Open
katilp opened this issue Apr 17, 2020 · 2 comments


katilp commented Apr 17, 2020

(from #7 - @slaurila )

Currently, the final product of the analysis is a set of histograms that are interesting to inspect qualitatively, but there is no way to conclude whether we actually see some hint of the Higgs in the data or not. After all selections, we have roughly 2500 signal events and 75000 background events, which gives a naive expected significance of 2500/sqrt(75000) = 9.1 sigma. Such an unrealistically high value means it is impossible to draw conclusions without including the systematic uncertainties (at least the dominant ones). However, it might be useful to provide a script that allows one to calculate the significance properly, and to see how adding systematic uncertainties for the different processes (even if they are just ad hoc numbers, some 10% here and 20% there) affects the significance. In practice, we would need something like this: http://dpnc.unige.ch/~sfyrla/teaching/Statistics/handsOn3.html
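A minimal sketch of the naive estimate quoted above, alongside the Asimov median discovery significance for a counting experiment with known background, which the naive s/sqrt(b) approximates. The event counts come from the comment; everything else here is illustration, not analysis code:

```python
import math

# Event counts after all selections, as quoted above
s, b = 2500.0, 75000.0

# Naive significance estimate
z_naive = s / math.sqrt(b)

# Asimov median discovery significance for a counting experiment
# with known background:
# Z_A = sqrt(2 * ((s + b) * ln(1 + s/b) - s))
z_asimov = math.sqrt(2.0 * ((s + b) * math.log(1.0 + s / b) - s))

print(f"naive:  {z_naive:.2f} sigma")   # ~9.13
print(f"Asimov: {z_asimov:.2f} sigma")  # ~9.08
```

The two numbers agree closely here because s is small compared to b; both ignore all systematic uncertainties, which is exactly the problem described above.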


lukasheinrich commented Apr 17, 2020

We could use pyhf here for a systematics-aware computation of the significance, given the histograms. cc @matthewfeickert @kratsg

https://github.com/scikit-hep/pyhf
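To see why a systematics-aware treatment matters so much here, a common back-of-the-envelope extension of the naive formula adds a relative background uncertainty delta in quadrature to the Poisson variance, Z ≈ s / sqrt(b + (delta*b)^2). This is only a rough illustration with the ad hoc 10%/20% numbers mentioned in the opening comment, not the pyhf calculation:

```python
import math

s, b = 2500.0, 75000.0  # event counts from the issue

def significance(s, b, rel_bkg_unc=0.0):
    """Naive significance with an absolute background uncertainty
    rel_bkg_unc * b added in quadrature to the Poisson variance b."""
    return s / math.sqrt(b + (rel_bkg_unc * b) ** 2)

for delta in (0.0, 0.10, 0.20):
    print(f"delta_b = {delta:4.0%}: Z = {significance(s, b, delta):.2f}")
```

Even a 10% background uncertainty collapses the apparent 9 sigma to well below 1 sigma, since the absolute uncertainty (7500 events) is three times the signal itself.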


kratsg commented Jul 25, 2020

I don't know if there has been any progress here, but we're willing to help. In pyhf, one can do pip install pyhf and then run the following procedure, which is probably a bit more tractable for newer students to get up and running.

We will need scikit-hep/pyhf#520 for the discovery test statistic (p0).

>>> import pyhf
>>> pdf = pyhf.simplemodels.hepdata_like([2500.], [75000.], [100.])
>>> pdf
<pyhf.pdf.Model object at 0x10fa75978>
>>> pdf.config.channels
['singlechannel']
>>> pdf.config.samples
['background', 'signal']
>>> pdf.config.parameters
['mu', 'uncorr_bkguncrt']
>>> pdf.config.modifiers
[('mu', 'normfactor'), ('uncorr_bkguncrt', 'shapesys')]
>>> pdf.config.npars
2
>>> pdf.expected_data([0.0, 1.0]) # bkg-only
array([ 75000., 562500.])
>>> pdf.expected_data([1.0, 1.0]) # signal + bkg
array([ 77500., 562500.])
>>> observations = [77500] + pdf.config.auxdata
>>> observations
[77500, 562500.0]
# NB: the following is for an exclusion test statistic.... 
>>> CLs, [CLsb, CLb], CLs_expected = pyhf.infer.hypotest(1.0, observations, pdf, return_tail_probs=True, return_expected_set=True)
>>> CLs
array([0.5])
>>> CLsb
array([0.5])
>>> CLb
array([1.])
>>> CLs_expected
array([[2.18887890e-24],
       [7.52567792e-21],
       [2.12854478e-17],
       [4.20192638e-14],
       [4.49330927e-11]])

All you can say from this is that one cannot exclude the mu=1.0 hypothesis [which I suppose is a fair statement!].
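As a rough sanity check on the numbers above (my addition, not part of the session): treating the median expected CLs under the background-only hypothesis as a plain Gaussian tail probability, the standard-library normal quantile converts it into an expected exclusion significance. CLs is not strictly a p-value, so this conversion is only indicative:

```python
from statistics import NormalDist

# Median expected CLs for mu = 1.0, from the pyhf session above
cls_median = 2.12854478e-17

# Z such that the upper Gaussian tail equals cls_median.
# Computed as -inv_cdf(p) rather than inv_cdf(1 - p), because
# 1 - p rounds to exactly 1.0 at double precision for p this small.
z = -NormalDist().inv_cdf(cls_median)
print(f"expected exclusion of mu=1 at roughly {z:.1f} sigma")  # ~8.4
```

So with the 100-event background uncertainty baked into the model, the expected sensitivity is somewhat below the naive 9.1 sigma, as one would hope to demonstrate to students.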
