TMA 4.8 Release #181

Merged
merged 14 commits into from
May 21, 2024

Conversation

calebbiggers
Contributor

  • Updated TMA metrics to 4.8
  • Added client platform metrics

"MetricName": "tma_frontend_bound",
"ScaleUnit": "100%"
"ScaleUnit": "100%",
"Threshold": "tma_frontend_bound > 15"
Contributor

Here is the existing perf version:

    {
        "BriefDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend",
        "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / tma_info_thread_slots",
        "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
        "MetricName": "tma_frontend_bound",
        "MetricThreshold": "tma_frontend_bound > 0.15",
        "MetricgroupNoGroup": "TopdownL1",
        "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. Sample with: FRONTEND_RETIRED.LATENCY_GE_4_PS",
        "ScaleUnit": "100%"
    },
  • It looks like Threshold should be MetricThreshold.
  • slots rather than tma_info_slots
  • In the description, "Sample with: FRONTEND_RETIRED.LATENCY_GE_4_PS" isn't present.
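
The key rename suggested in the first point can be sketched as a small transformation. This is only an illustration, assuming each metric is a plain dictionary loaded from the JSON file; the helper name `rename_threshold_key` is hypothetical and is not part of the perfmon scripts.

```python
def rename_threshold_key(metric: dict) -> dict:
    """Return a copy of `metric` with "Threshold" renamed to "MetricThreshold".

    Hypothetical helper for illustration; not taken from create_perf_json.py.
    """
    renamed = {}
    for key, value in metric.items():
        # Rebuild the dict in order so the key keeps its original position.
        renamed["MetricThreshold" if key == "Threshold" else key] = value
    return renamed


metric = {
    "MetricName": "tma_frontend_bound",
    "ScaleUnit": "100%",
    "Threshold": "tma_frontend_bound > 15",
}
fixed = rename_threshold_key(metric)
```

Rebuilding the dictionary, rather than popping and re-adding the key, keeps the field in its original position, so the resulting JSON diff stays limited to the renamed key.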

Contributor Author

I've renamed the "Threshold" field to "MetricThreshold"

"MetricExpr": "( ( BR_MISP_RETIRED.ALL_BRANCHES / ( BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT ) ) * ( ( UOPS_ISSUED.ANY - ( UOPS_RETIRED.RETIRE_SLOTS ) + ( 4 ) * ( ( INT_MISC.RECOVERY_CYCLES_ANY / 2 ) if #SMT_on else INT_MISC.RECOVERY_CYCLES ) ) / ( ( 4 ) * ( ( CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else ( CPU_CLK_UNHALTED.THREAD ) ) ) ) )",
"MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group",
"MetricExpr": "( BR_MISP_RETIRED.ALL_BRANCHES / ( BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT ) ) * tma_bad_speculation",
"MetricGroup": "BadSpec;BrMispredicts;BvMP;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;Slots",
Contributor

In the existing perf version, which is generated from a spreadsheet, the MetricGroup here is:

        "MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM"

tma_issueBM in particular is missing. The issue groups come from the threshold column.
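
A quick way to spot this kind of difference is to compare the two `MetricGroup` strings as sets. The sketch below uses the two values quoted in this thread; the function name `group_diff` is made up for illustration.

```python
def group_diff(existing: str, proposed: str):
    """Return (missing, added) group names between two ';'-separated MetricGroup values."""
    old = set(existing.split(";"))
    new = set(proposed.split(";"))
    # Groups only in the existing perf JSON, and groups only in the proposed JSON.
    return old - new, new - old


existing = ("BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;"
            "tma_bad_speculation_group;tma_issueBM")
proposed = ("BadSpec;BrMispredicts;BvMP;TmaL2;TopdownL2;tma_L2_group;"
            "tma_bad_speculation_group;Slots")

missing, added = group_diff(existing, proposed)
print("missing from proposed:", sorted(missing))
print("new in proposed:", sorted(added))
```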

Contributor Author

I've added the issues to the "MetricGroup" field

- Renamed "Threshold" field to "MetricThreshold"
- Added issues to "MetricGroup" field
- Edited incorrect event names
- Updated EMR metrics
"MetricName": "tma_dtlb_load",
"ScaleUnit": "100%",
"Threshold": "tma_dtlb_load > 10 && tma_l1_bound > 10 && tma_memory_bound > 20 && tma_backend_bound > 20"
"MetricThreshold": "tma_dtlb_load > 10 && tma_l1_bound > 10 && tma_memory_bound > 20 && tma_backend_bound > 20"
Contributor

@calebbiggers, current perf JSON uses & instead of &&.
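
As a rough sketch of that conversion, assuming the operator spelling is the only difference that needs handling (the helper name `to_perf_threshold` is hypothetical):

```python
def to_perf_threshold(expr: str) -> str:
    """Rewrite '&&' as '&' in a threshold expression.

    Assumption: only the operator spelling differs; nothing else is rewritten.
    """
    return expr.replace("&&", "&")


threshold = ("tma_dtlb_load > 10 && tma_l1_bound > 10 && "
             "tma_memory_bound > 20 && tma_backend_bound > 20")
converted = to_perf_threshold(threshold)
```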

1perrytaylor requested a review from captain5050 on May 15, 2024 16:48
"Threshold": "tma_frontend_bound > 15"
},
{
"BriefDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues. For example; instruction-cache misses; iTLB misses or fetch stalls after a branch misprediction are categorized under Frontend Latency. In such cases; the Frontend eventually delivers no uops for some period.",
Contributor

The existing definition has "Sample with":

        "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues.  For example; instruction-cache misses; iTLB misses or fetch stalls after a branch misprediction are categorized under Frontend Latency. In such cases; the Frontend eventually delivers no uops for some period. Sample with: RS_EVENTS.EMPTY_END",

https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/x86/broadwell/bdw-metrics.json#n281

Which is pulled from the "Locate-with" column here:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py#L836

Could we add this to avoid a regression?
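
For illustration only, appending the suffix could look like the following. It assumes the "Locate-with" value is already available as a string; `with_sample_events` is a hypothetical name and does not reproduce the logic in create_perf_json.py.

```python
def with_sample_events(description: str, locate_with: str) -> str:
    """Append 'Sample with: <events>' to a description when events are given.

    Hypothetical helper; `locate_with` stands in for the raw "Locate-with" value.
    """
    locate_with = locate_with.strip()
    # Leave the text alone if there is nothing to add or it is already present.
    if not locate_with or "Sample with:" in description:
        return description
    return f"{description.rstrip()} Sample with: {locate_with}"


brief = ("This metric represents fraction of slots the CPU was stalled due to "
         "Frontend latency issues.")
public = with_sample_events(brief, "RS_EVENTS.EMPTY_END")
```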

"Threshold": "tma_dsb_switches > 5 && tma_fetch_latency > 10 && tma_frontend_bound > 15"
},
{
"BriefDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues. For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend.",
Contributor

This existing description has "Related metrics":

        "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues.  For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend. Related metrics: tma_dsb_switches, tma_info_frontend_dsb_coverage, tma_info_inst_mix_iptb, tma_lcp"

https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/x86/broadwell/bdw-metrics.json#n271

That has come from the issues in the threshold column:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py#L1321

Could we add this to avoid a regression?
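
One way to catch regressions like this before merging is to compare old and new descriptions for the two suffixes raised in this review. This is a minimal sketch; the marker list and the name `dropped_suffixes` are assumptions for illustration, not an existing check in the repository.

```python
# Suffix markers mentioned in this review; other markers may exist.
SUFFIX_MARKERS = ("Sample with:", "Related metrics:")


def dropped_suffixes(old_description: str, new_description: str) -> list:
    """List the markers present in the old description but absent from the new one."""
    return [
        marker
        for marker in SUFFIX_MARKERS
        if marker in old_description and marker not in new_description
    ]


old = ("In such cases; the Frontend typically delivers suboptimal amount of uops "
       "to the Backend. Related metrics: tma_dsb_switches, "
       "tma_info_frontend_dsb_coverage, tma_info_inst_mix_iptb, tma_lcp")
new = ("In such cases; the Frontend typically delivers suboptimal amount of uops "
       "to the Backend.")

lost = dropped_suffixes(old, new)
```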

calebbiggers merged commit b7d8c00 into main on May 21, 2024
7 checks passed
calebbiggers deleted the TMA-4.8-Release branch on May 21, 2024 15:36