TMA 4.8 Release #181
calebbiggers commented May 10, 2024
- Updated TMA metrics to 4.8
- Added client platform metrics
"MetricName": "tma_frontend_bound",
"ScaleUnit": "100%"
"ScaleUnit": "100%",
"Threshold": "tma_frontend_bound > 15"
Here is the existing perf version:
{
"BriefDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend",
"MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / tma_info_thread_slots",
"MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_frontend_bound",
"MetricThreshold": "tma_frontend_bound > 0.15",
"MetricgroupNoGroup": "TopdownL1",
"PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. Sample with: FRONTEND_RETIRED.LATENCY_GE_4_PS",
"ScaleUnit": "100%"
},
- It looks like Threshold should be MetricThreshold.
- slots rather than tma_info_slots
- In the description, Sample with: FRONTEND_RETIRED.LATENCY_GE_4_PS isn't present.
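As a minimal sketch of the first point, assuming the metric is held as a plain Python dict loaded from the JSON file, the key could be renamed as below. The helper name rename_threshold_key is hypothetical and is not taken from this PR or from create_perf_json.py.

```python
def rename_threshold_key(metric: dict) -> dict:
    """Return a copy of `metric` with "Threshold" renamed to "MetricThreshold".

    Hypothetical helper for illustration only; not part of the perfmon scripts.
    """
    fixed = dict(metric)
    # Only rename when the old key exists and the new key is not already set.
    if "Threshold" in fixed and "MetricThreshold" not in fixed:
        fixed["MetricThreshold"] = fixed.pop("Threshold")
    return fixed


metric = {
    "MetricName": "tma_frontend_bound",
    "ScaleUnit": "100%",
    "Threshold": "tma_frontend_bound > 15",
}
fixed = rename_threshold_key(metric)
```

The threshold expression itself is carried over unchanged; only the field name differs.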
I've renamed the "Threshold" field to "MetricThreshold"
"MetricExpr": "( ( BR_MISP_RETIRED.ALL_BRANCHES / ( BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT ) ) * ( ( UOPS_ISSUED.ANY - ( UOPS_RETIRED.RETIRE_SLOTS ) + ( 4 ) * ( ( INT_MISC.RECOVERY_CYCLES_ANY / 2 ) if #SMT_on else INT_MISC.RECOVERY_CYCLES ) ) / ( ( 4 ) * ( ( CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else ( CPU_CLK_UNHALTED.THREAD ) ) ) ) )",
"MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group",
"MetricExpr": "( BR_MISP_RETIRED.ALL_BRANCHES / ( BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT ) ) * tma_bad_speculation",
"MetricGroup": "BadSpec;BrMispredicts;BvMP;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;Slots",
In the existing perf version, which is generated from a spreadsheet, the MetricGroup here is:
"MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM"
tma_issueBM in particular is missing. The issue groups come from the threshold column.
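A rough sketch of what appending an issue group to the semicolon-separated MetricGroup string might look like. The helper add_metric_groups is hypothetical; how the issue tags are actually extracted from the threshold column is not shown here and is left as an input.

```python
def add_metric_groups(metric_group: str, extra_groups: list) -> str:
    """Append groups to a ';'-separated MetricGroup string, skipping duplicates.

    Hypothetical helper for illustration; `extra_groups` is assumed to hold
    issue groups that were already derived elsewhere.
    """
    groups = metric_group.split(";") if metric_group else []
    for group in extra_groups:
        if group not in groups:
            groups.append(group)
    return ";".join(groups)


old_group = "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group"
new_group = add_metric_groups(old_group, ["tma_issueBM"])
```

With the input above, new_group matches the MetricGroup string quoted from the existing perf version.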
I've added the issues to the "MetricGroup" field
- Renamed "Threshold" field to "MetricThreshold"
- Added issues to "MetricGroup" field
- Edited incorrect event names
- Updated EMR metrics
"MetricName": "tma_dtlb_load",
"ScaleUnit": "100%",
"Threshold": "tma_dtlb_load > 10 && tma_l1_bound > 10 && tma_memory_bound > 20 && tma_backend_bound > 20"
"MetricThreshold": "tma_dtlb_load > 10 && tma_l1_bound > 10 && tma_memory_bound > 20 && tma_backend_bound > 20"
@calebbiggers, current perf JSON uses & instead of &&.
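For illustration, a small sketch of rewriting the operator in a threshold string, assuming a plain textual substitution is enough for these expressions. The function name to_perf_threshold is hypothetical and only the && case raised above is handled.

```python
def to_perf_threshold(expr: str) -> str:
    """Replace the '&&' operator with '&' in a threshold expression.

    Hypothetical helper; a plain string replacement that assumes '&&'
    only ever appears as an operator in the expression.
    """
    return expr.replace("&&", "&")


threshold = (
    "tma_dtlb_load > 10 && tma_l1_bound > 10 && "
    "tma_memory_bound > 20 && tma_backend_bound > 20"
)
converted = to_perf_threshold(threshold)
```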
"Threshold": "tma_frontend_bound > 15" | ||
}, | ||
{ | ||
"BriefDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues. For example; instruction-cache misses; iTLB misses or fetch stalls after a branch misprediction are categorized under Frontend Latency. In such cases; the Frontend eventually delivers no uops for some period.", |
The existing definition has "Sample with":
"PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues. For example; instruction-cache misses; iTLB misses or fetch stalls after a branch misprediction are categorized under Frontend Latency. In such cases; the Frontend eventually delivers no uops for some period. Sample with: RS_EVENTS.EMPTY_END",
Which is pulled from the "Locate-with" column here:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py#L836
Could we add this to avoid a regression?
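A minimal sketch of the requested behavior, assuming the "Locate-with" value is already available as a string. The helper add_sample_with is hypothetical and does not reproduce the logic at the linked line of create_perf_json.py.

```python
def add_sample_with(description: str, locate_with: str) -> str:
    """Append a "Sample with: ..." suffix to a description.

    Hypothetical helper for illustration; `locate_with` is assumed to be the
    raw "Locate-with" cell, passed through as-is after trimming whitespace.
    """
    locate_with = locate_with.strip()
    # Leave the description alone if there is nothing to add or it is already there.
    if not locate_with or "Sample with:" in description:
        return description
    return f"{description.rstrip()} Sample with: {locate_with}"


brief = "In such cases; the Frontend eventually delivers no uops for some period."
public = add_sample_with(brief, "RS_EVENTS.EMPTY_END")
```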
"Threshold": "tma_dsb_switches > 5 && tma_fetch_latency > 10 && tma_frontend_bound > 15" | ||
}, | ||
{ | ||
"BriefDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues. For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend.", |
The existing description has "Related metrics":
"PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues. For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend. Related metrics: tma_dsb_switches, tma_info_frontend_dsb_coverage, tma_info_inst_mix_iptb, tma_lcp"
That has come from the issues in the threshold column:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py#L1321
Could we add this to avoid a regression?
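One way this could be sketched, assuming each metric's issue tags have already been parsed from the threshold column into a dict. The function related_metrics, the tag names, and the sample mapping below are all hypothetical; the sorting and ", " separator only mirror the quoted description.

```python
from collections import defaultdict


def related_metrics(issues_by_metric: dict) -> dict:
    """Map each metric to the sorted names of other metrics sharing an issue tag.

    Hypothetical helper for illustration; not the create_perf_json.py logic.
    """
    metrics_by_issue = defaultdict(set)
    for metric, issues in issues_by_metric.items():
        for issue in issues:
            metrics_by_issue[issue].add(metric)

    related = {}
    for metric, issues in issues_by_metric.items():
        others = set()
        for issue in issues:
            others |= metrics_by_issue[issue]
        others.discard(metric)  # a metric is not related to itself
        related[metric] = sorted(others)
    return related


# Hypothetical issue tags, made up for this example.
issues = {
    "tma_fetch_bandwidth": ["issueFB"],
    "tma_dsb_switches": ["issueFB"],
    "tma_lcp": ["issueFB"],
    "tma_branch_mispredicts": ["issueBM"],
}
related = related_metrics(issues)
suffix = "Related metrics: " + ", ".join(related["tma_fetch_bandwidth"])
```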