
coffea interface with pyhf - statistical inference #104

Open
kratsg opened this issue Jun 9, 2019 · 5 comments
Labels
enhancement New feature or request

Comments

kratsg (Contributor) commented Jun 9, 2019

/cc @matthewfeickert @lukasheinrich

Is your feature request related to a problem? Please describe.

Feature request.

Describe the solution you'd like

An interface/way to export pyhf JSON workspaces in order to perform binned statistical fits (this should work for anything that lets you make histograms). The pyhf workspaces follow the schema (v1.0.0) defined here: https://diana-hep.org/pyhf/schemas/1.0.0/workspace.json. If you are reading this issue in the future, the schema may be at a later version.
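
For concreteness, a minimal workspace that validates against that schema can be built and loaded like this (the bin yields are made up; this is just a sketch of the target format):

```python
import pyhf

# A minimal single-channel workspace spec following the pyhf workspace schema.
# All yields below are made-up numbers, purely for illustration.
spec = {
    "channels": [
        {
            "name": "singlechannel",
            "samples": [
                {
                    "name": "signal",
                    "data": [5.0, 10.0],
                    "modifiers": [{"name": "mu", "type": "normfactor", "data": None}],
                },
                {
                    "name": "background",
                    "data": [50.0, 60.0],
                    "modifiers": [
                        {"name": "uncorr_bkguncrt", "type": "shapesys", "data": [5.0, 12.0]}
                    ],
                },
            ],
        }
    ],
    "observations": [{"name": "singlechannel", "data": [53.0, 65.0]}],
    "measurements": [{"name": "Measurement", "config": {"poi": "mu", "parameters": []}}],
    "version": "1.0.0",
}

workspace = pyhf.Workspace(spec)  # validates the spec against the workspace schema
model = workspace.model()         # HistFactory model built from the spec
data = workspace.data(model)      # observations plus auxiliary data for the fit
```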

Describe alternatives you've considered

None.

Additional context

A similar effort is ongoing to get pyhf and zfit working together, since pyhf is binned while zfit is unbinned: zfit/zfit#120.

kratsg (Contributor, Author) commented Jun 9, 2019

There are a couple of ways one could approach this:

  • depend on pyhf for the exporting, and provide a coffea.stats.export('pyhf') that returns a pyhf.Model or pyhf.Workspace instance (a sketch of this option is below). This is closer to what numpy+scipy tend to do.
  • rely on the pyhf JSON schema specification, dump JSON (or a Python dictionary) that validates against it, and leave it up to the user to run pyhf themselves.
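
A rough sketch of what the first option might look like (everything here is hypothetical - coffea.stats and export_pyhf do not exist; only the pyhf.Workspace call is real):

```python
import pyhf

# Hypothetical sketch of the first option: a coffea-side exporter that turns
# per-sample bin yields into a pyhf object. Neither coffea.stats nor
# export_pyhf exists today; the names are placeholders for illustration.
def export_pyhf(channel_name, sample_yields, observations, poi="mu"):
    samples = []
    for name, yields in sample_yields.items():
        modifiers = []
        if name == "signal":
            # attach the parameter of interest to the signal sample
            modifiers.append({"name": poi, "type": "normfactor", "data": None})
        samples.append({"name": name, "data": list(yields), "modifiers": modifiers})
    spec = {
        "channels": [{"name": channel_name, "samples": samples}],
        "observations": [{"name": channel_name, "data": list(observations)}],
        "measurements": [{"name": "meas", "config": {"poi": poi, "parameters": []}}],
        "version": "1.0.0",
    }
    # the second option would simply return `spec` (or json.dumps(spec)) instead
    return pyhf.Workspace(spec)

ws = export_pyhf(
    "signal_region",
    {"signal": [5.0, 10.0], "background": [50.0, 60.0]},
    observations=[53.0, 65.0],
)
```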

@lgray lgray changed the title [enhancement] coffea interface with pyhf - statistical inference [ENHANCEMENT] coffea interface with pyhf - statistical inference Jun 9, 2019
@lgray lgray changed the title [ENHANCEMENT] coffea interface with pyhf - statistical inference [enhancement] coffea interface with pyhf - statistical inference Jun 9, 2019
@lgray lgray added the enhancement New feature or request label Jun 10, 2019
@lgray lgray changed the title [enhancement] coffea interface with pyhf - statistical inference coffea interface with pyhf - statistical inference Jun 10, 2019
lgray (Collaborator) commented Jul 9, 2019

@guitargeek So the discussion should sorta start here.
@kratsg - after chewing on this for a bit, I am thinking of developing something that mimics the LLVM compiler infrastructure model.

I am thinking along this direction because we are not going to convince anyone to use a specific stats tool, but we can convince people to write things in a way that lets them use any stats tool - especially if that way of describing a model is expressive and easy to turn into any stats tool's inputs.

  1. Frontends based on some given flavor of histograms and parametric functions, where you can write the model down in a clean way.
  2. The frontend description is processed into an intermediate representation that describes the model to be fit in a way that is agnostic of the input histogram types, probably encoded in a fairly declarative way (see the sketch below). I would not expect us to optimize anything here, so the intermediate representation does not need to be as flexible as what you find in gcc or llvm, of course.
  3. Various backends then process the intermediate representation into a target statistical tool's expected inputs, producing all necessary files and descriptors.

I'm assuming this is all in Python so we have easy access to how functions are composed - so that we can, for instance, write something in PyROOT RooFit, take it apart, and reassemble it for PyHF, zfitter, combine, sklearn, or whatever we decide to make backends for.
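
To make the idea concrete, here is a minimal sketch of what such an intermediate representation and one backend could look like (all names are invented for illustration, not a proposal for the actual API):

```python
# Hypothetical sketch of a declarative intermediate representation for a fit
# model, independent of any histogram library or stats backend.
# All class and field names here are made up.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Sample:
    name: str
    yields: List[float]                                   # bin contents, however they were booked
    modifiers: List[dict] = field(default_factory=list)   # e.g. {"name": ..., "type": "normsys", ...}

@dataclass
class Channel:
    name: str
    samples: List[Sample]
    observations: List[float]

@dataclass
class FitModel:
    channels: List[Channel]
    poi: str

def to_pyhf(model: FitModel) -> dict:
    """One possible backend: lower the IR into a pyhf workspace dictionary."""
    return {
        "channels": [
            {
                "name": c.name,
                "samples": [
                    {"name": s.name, "data": s.yields, "modifiers": s.modifiers}
                    for s in c.samples
                ],
            }
            for c in model.channels
        ],
        "observations": [{"name": c.name, "data": c.observations} for c in model.channels],
        "measurements": [{"name": "meas", "config": {"poi": model.poi, "parameters": []}}],
        "version": "1.0.0",
    }
```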

While this may seem a little silly at first (why not just write your fit in a given stats tool, after all?), I think we can arrive at a highly portable description of a fit: when something new and cool gets made for fitting, we/someone just supply a new backend and people can happily fit away.

What do you think?

lgray (Collaborator) commented Jul 9, 2019

@jpivarski if you have anything to add here it'd be super useful for discussion as well!

jpivarski (Member) commented

We talked about this yesterday and I thought the "uniform interface to fitters" idea was a good one, particularly if you're targeting large projects like TensorFlow. If you're thinking specifically of HistFitter-style fits, then I'm beginning to think it would be better to defer to pyhf JSON, because that JSON format was intended to be implementation-independent, after all. Is the scope of what you're considering broader than the scope of what pyhf is already standardizing?

This could morph into a project of linking histogram-booking with pyhf models...

alexander-held (Member) commented

I came across this old issue and wanted to add a bit of information based on progress in the last two years.

project of linking histogram-booking with pyhf models

The cabinetry library does something like this. Users specify the relevant information needed to build a HistFactory model (the type of model that pyhf supports), and cabinetry turns that information into instructions for creating all the required template histograms. A prototype interface exists that allows carrying out those instructions with coffea. After all histograms are produced, cabinetry assembles them into a workspace following the pyhf JSON format.
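
For reference, that workflow looks roughly like the following (assuming a recent cabinetry release; the module layout has changed between versions, so treat these calls as indicative rather than authoritative):

```python
import cabinetry

config = cabinetry.configuration.load("config.yml")   # channels, samples, systematics
cabinetry.templates.build(config)                      # produce the template histograms
ws = cabinetry.workspace.build(config)                 # assemble the pyhf JSON workspace
model, data = cabinetry.model_utils.model_and_data(ws)
fit_results = cabinetry.fit.fit(model, data)
```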

provide a coffea.stats.export('pyhf') which returns a pyhf.Model

The challenge with that approach is that the information about how HistFactory channels, samples, and systematics interact with each other is not known to coffea, and is not necessarily required for standard usage. This brings up an interesting point, closely related to the discussion in #469: when processing systematics, a coffea processor needs to know some related information - which types of detector systematics should be applied (typically affecting most/all samples in the same way), and which modeling systematics (often sample-dependent) need to be evaluated? To build the statistical model, this information needs to be provided. It can be hardcoded in the processor or provided externally (that is the route cabinetry is taking); a sketch of the external option is below.
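
As an illustration of the external route, the information a processor needs could be provided in a small configuration like the following (the structure and names are hypothetical, not cabinetry's actual format):

```python
# Hypothetical external specification of which systematics a processor should
# evaluate; the structure and keys are made up for illustration.
systematics_config = {
    "detector": {  # typically applied to most/all samples in the same way
        "jet_energy_scale": {"type": "histosys", "samples": "all"},
    },
    "modeling": {  # often sample-dependent
        "ttbar_generator": {"type": "normsys", "samples": ["ttbar"]},
    },
}

def variations_for(sample_name):
    """Return the systematic variations to evaluate for a given sample."""
    relevant = []
    for group in systematics_config.values():
        for name, spec in group.items():
            if spec["samples"] == "all" or sample_name in spec["samples"]:
                relevant.append(name)
    return relevant

# e.g. variations_for("ttbar") -> ["jet_energy_scale", "ttbar_generator"]
```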
