
Statistical interpretation of results #10

Open
katilp opened this issue Apr 17, 2020 · 2 comments


katilp commented Apr 17, 2020

(from #7 - @slaurila )

Currently, the final product of the analysis is a set of histograms that are interesting to inspect qualitatively, but there is no way to conclude whether we actually see some hint of the Higgs in the data or not. After all selections, we have roughly 2500 signal events and 75000 background events, which gives a naive expected significance of 2500/sqrt(75000) = 9.1 sigma. Such an unrealistically high value means it is impossible to draw conclusions without including the systematic uncertainties (at least the dominant ones). However, it might be useful to provide a script that allows one to calculate the significance properly, and to see how adding systematic uncertainties for the different processes (even if they are just ad hoc numbers, some 10% here and 20% there) affects the significance. In practice, we would need something like this: http://dpnc.unige.ch/~sfyrla/teaching/Statistics/handsOn3.html
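A minimal sketch of the naive estimate quoted above, alongside the Asimov median discovery significance for a counting experiment with known background, which the naive s/sqrt(b) approximates. The event counts come from the comment; everything else here is illustration, not analysis code:

```python
import math

# Event counts after all selections, as quoted above
s, b = 2500.0, 75000.0

# Naive significance estimate
z_naive = s / math.sqrt(b)

# Asimov median discovery significance for a counting experiment
# with known background:
# Z_A = sqrt(2 * ((s + b) * ln(1 + s/b) - s))
z_asimov = math.sqrt(2.0 * ((s + b) * math.log(1.0 + s / b) - s))

print(f"naive:  {z_naive:.2f} sigma")   # ~9.13
print(f"Asimov: {z_asimov:.2f} sigma")  # ~9.08
```

The two numbers agree closely here because s is small compared to b; both ignore all systematic uncertainties, which is exactly the problem described above.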


lukasheinrich commented Apr 17, 2020

We could use pyhf here for a systematics-aware computation of the significance, given the histograms. cc @matthewfeickert @kratsg

https://github.com/scikit-hep/pyhf
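To see why a systematics-aware treatment matters so much here, a common back-of-the-envelope extension of the naive formula adds a relative background uncertainty delta in quadrature to the Poisson variance, Z ≈ s / sqrt(b + (delta*b)^2). This is only a rough illustration with the ad hoc 10%/20% numbers mentioned in the opening comment, not the pyhf calculation:

```python
import math

s, b = 2500.0, 75000.0  # event counts from the issue

def significance(s, b, rel_bkg_unc=0.0):
    """Naive significance with an absolute background uncertainty
    rel_bkg_unc * b added in quadrature to the Poisson variance b."""
    return s / math.sqrt(b + (rel_bkg_unc * b) ** 2)

for delta in (0.0, 0.10, 0.20):
    print(f"delta_b = {delta:4.0%}: Z = {significance(s, b, delta):.2f}")
```

Even a 10% background uncertainty collapses the apparent 9 sigma to well below 1 sigma, since the absolute uncertainty (7500 events) is three times the signal itself.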


kratsg commented Jul 25, 2020

I don't know if there has been any progress here, but we're willing to help. In pyhf, one can do pip install pyhf and then run the following procedure, which is probably a bit more tractable for newer students to get up and running.

We will need scikit-hep/pyhf#520 for the discovery test statistic (p0).

>>> import pyhf
>>> pdf = pyhf.simplemodels.hepdata_like([2500.], [75000.], [100.])
>>> pdf
<pyhf.pdf.Model object at 0x10fa75978>
>>> pdf.config.channels
['singlechannel']
>>> pdf.config.samples
['background', 'signal']
>>> pdf.config.parameters
['mu', 'uncorr_bkguncrt']
>>> pdf.config.modifiers
[('mu', 'normfactor'), ('uncorr_bkguncrt', 'shapesys')]
>>> pdf.config.npars
2
>>> pdf.expected_data([0.0, 1.0]) # bkg-only
array([ 75000., 562500.])
>>> pdf.expected_data([1.0, 1.0]) # signal + bkg
array([ 77500., 562500.])
>>> observations = [77500] + pdf.config.auxdata
>>> observations
[77500, 562500.0]
# NB: the following is for an exclusion test statistic.... 
>>> CLs, [CLsb, CLb], CLs_expected = pyhf.infer.hypotest(1.0, observations, pdf, return_tail_probs=True, return_expected_set=True)
>>> CLs
array([0.5])
>>> CLsb
array([0.5])
>>> CLb
array([1.])
>>> CLs_expected
array([[2.18887890e-24],
       [7.52567792e-21],
       [2.12854478e-17],
       [4.20192638e-14],
       [4.49330927e-11]])

All you can say from this is that one cannot exclude the mu=1.0 hypothesis [which I suppose is a fair statement!].
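As a rough sanity check on the numbers above (my addition, not part of the session): treating the median expected CLs under the background-only hypothesis as a plain Gaussian tail probability, the standard-library normal quantile converts it into an expected exclusion significance. CLs is not strictly a p-value, so this conversion is only indicative:

```python
from statistics import NormalDist

# Median expected CLs for mu = 1.0, from the pyhf session above
cls_median = 2.12854478e-17

# Z such that the upper Gaussian tail equals cls_median.
# Computed as -inv_cdf(p) rather than inv_cdf(1 - p), because
# 1 - p rounds to exactly 1.0 at double precision for p this small.
z = -NormalDist().inv_cdf(cls_median)
print(f"expected exclusion of mu=1 at roughly {z:.1f} sigma")  # ~8.4
```

So with the 100-event background uncertainty baked into the model, the expected sensitivity is somewhat below the naive 9.1 sigma, as one would hope to demonstrate to students.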
