ppc64_cpu: Fix handling of non-contiguous CPU IDs #104

Open
wants to merge 2 commits into
base: next

AboorvaDevarajan commented Jan 13, 2025

In ppc64le environments, adding or removing CPUs dynamically through
DLPAR can create gaps in CPU IDs. For example, with a present list of
`0-103,120-151`, CPUs 104-119 are missing.

ppc64_cpu does not handle this scenario: it always assumes CPU IDs
are contiguous, which causes errors in core numbering, CPU info, and
SMT mode reporting.

To illustrate the issues this patch fixes, consider the following
system configuration:

$ lscpu
Architecture:             ppc64le
Byte Order:               Little Endian
CPU(s):                   136
On-line CPU(s) list:      0-103,120-151

**Note: CPU IDs are non-contiguous**

-----------------------------------------------------------------
Before Patch:
-----------------------------------------------------------------

$ ppc64_cpu --info
Core   0:    0*    1*    2*    3*    4*    5*    6*    7*
Core   1:    8*    9*   10*   11*   12*   13*   14*   15*
Core   2:   16*   17*   18*   19*   20*   21*   22*   23*
Core   3:   24*   25*   26*   27*   28*   29*   30*   31*
Core   4:   32*   33*   34*   35*   36*   37*   38*   39*
Core   5:   40*   41*   42*   43*   44*   45*   46*   47*
Core   6:   48*   49*   50*   51*   52*   53*   54*   55*
Core   7:   56*   57*   58*   59*   60*   61*   62*   63*
Core   8:   64*   65*   66*   67*   68*   69*   70*   71*
Core   9:   72*   73*   74*   75*   76*   77*   78*   79*
Core  10:   80*   81*   82*   83*   84*   85*   86*   87*
Core  11:   88*   89*   90*   91*   92*   93*   94*   95*
Core  12:   96*   97*   98*   99*  100*  101*  102*  103*
........................................................... *gap*
Core  13:  120*  121*  122*  123*  124*  125*  126*  127*
Core  14:  128*  129*  130*  131*  132*  133*  134*  135*
Core  15:  136*  137*  138*  139*  140*  141*  142*  143*
Core  16:  144*  145*  146*  147*  148*  149*  150*  151*

**Although the CPU IDs are non-contiguous, the associated core IDs are
numbered contiguously, which hides the gap and makes the output harder
to interpret.**

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

$ ppc64_cpu --cores-on
Number of cores online = 15

**Expected: Number of online cores = 17**
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

$ ppc64_cpu --offline-cores
Cores offline = 13, 14

**Even though no cores are actually offline, two cores (13, 14)
are displayed as offline.**
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

$ ppc64_cpu --online-cores
Cores online = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 15, 16

**The list of online cores is missing two cores (13, 14).**
-----------------------------------------------------------------
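The counts above are consistent with a loop that derives the core count
from the CPU count and then probes cores 0..N-1. The sketch below is only
an illustration of that arithmetic, not code taken from ppc64_cpu: the
function names are hypothetical, the present ranges are hard-coded from
the example system, and every present CPU is treated as online.

```c
#include <assert.h>
#include <stdbool.h>

/* Present CPUs on the example system: 0-103 and 120-151 (hard-coded). */
static bool cpu_is_present(int cpu)
{
    return (cpu >= 0 && cpu <= 103) || (cpu >= 120 && cpu <= 151);
}

/*
 * Count online cores under a contiguous-ID assumption: derive the core
 * count from the CPU count, then probe cores 0..ncores-1.
 * Hypothetical helper, not the actual ppc64_cpu implementation.
 */
static int count_cores_contiguous(int ncpus, int threads_per_core)
{
    int ncores, online = 0;

    assert(threads_per_core > 0);
    ncores = ncpus / threads_per_core;  /* 136 / 8 = 17 */

    for (int core = 0; core < ncores; core++) {
        /* Treat a core as online if its first thread exists. */
        if (cpu_is_present(core * threads_per_core))
            online++;
    }
    return online;
}
```

With 136 CPUs and 8 threads per core, the loop probes cores 0-16. Cores
13 and 14 map to the missing CPUs 104-119, and cores 17 and 18 are never
visited, so the function returns 15 instead of 17.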

To resolve this, use the present CPU list from sysfs to assign
numbers to CPUs and cores, so that core IDs follow the actual CPU IDs.

$ cat /sys/devices/system/cpu/present
0-103,120-151
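A list in this format can be expanded into a per-CPU table before any
core numbering is done. The following is a minimal sketch of such a
parser, not the code from this patch; `parse_cpu_list` and `MAX_CPUS`
are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_CPUS 2048   /* illustrative upper bound, not from the patch */

/*
 * Parse a sysfs-style CPU list such as "0-103,120-151" and mark each
 * listed CPU in present[]. Returns the number of CPUs marked, or -1 on
 * malformed input. Hypothetical helper for illustration only.
 */
static int parse_cpu_list(const char *list, bool present[MAX_CPUS])
{
    const char *p = list;
    int count = 0;

    assert(list != NULL && present != NULL);
    for (int i = 0; i < MAX_CPUS; i++)
        present[i] = false;

    while (*p != '\0' && *p != '\n') {
        char *end;
        long first = strtol(p, &end, 10);
        long last = first;

        if (end == p)
            return -1;              /* no digits where a number was expected */
        p = end;

        if (*p == '-') {            /* range of the form "first-last" */
            p++;
            last = strtol(p, &end, 10);
            if (end == p)
                return -1;
            p = end;
        }
        if (first < 0 || last < first || last >= MAX_CPUS)
            return -1;

        for (long cpu = first; cpu <= last; cpu++) {
            if (!present[cpu]) {
                present[cpu] = true;
                count++;
            }
        }

        if (*p == ',')
            p++;                    /* move on to the next entry */
        else if (*p != '\0' && *p != '\n')
            return -1;              /* unexpected separator */
    }
    return count;
}
```

For the list shown above, the function marks CPUs 0-103 and 120-151 and
returns 136, matching the `CPU(s)` count reported by lscpu.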

With this patch, the command output correctly reflects the
current CPU configuration, providing a more precise representation
of the system state.

-----------------------------------------------------------------
After Patch:
-----------------------------------------------------------------

$ ppc64_cpu --info
Core   0:    0*    1*    2*    3*    4*    5*    6*    7*
Core   1:    8*    9*   10*   11*   12*   13*   14*   15*
Core   2:   16*   17*   18*   19*   20*   21*   22*   23*
Core   3:   24*   25*   26*   27*   28*   29*   30*   31*
Core   4:   32*   33*   34*   35*   36*   37*   38*   39*
Core   5:   40*   41*   42*   43*   44*   45*   46*   47*
Core   6:   48*   49*   50*   51*   52*   53*   54*   55*
Core   7:   56*   57*   58*   59*   60*   61*   62*   63*
Core   8:   64*   65*   66*   67*   68*   69*   70*   71*
Core   9:   72*   73*   74*   75*   76*   77*   78*   79*
Core  10:   80*   81*   82*   83*   84*   85*   86*   87*
Core  11:   88*   89*   90*   91*   92*   93*   94*   95*
Core  12:   96*   97*   98*   99*  100*  101*  102*  103*
........................................................... *gap*
Core  15:  120*  121*  122*  123*  124*  125*  126*  127*
Core  16:  128*  129*  130*  131*  132*  133*  134*  135*
Core  17:  136*  137*  138*  139*  140*  141*  142*  143*
Core  18:  144*  145*  146*  147*  148*  149*  150*  151*

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

$ ppc64_cpu --cores-on
Number of cores online = 17

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

$ ppc64_cpu --offline-cores
Cores offline =

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

$ ppc64_cpu --online-cores
Cores online = 0,1,2,3,4,5,6,7,8,9,10,11,12,15,16,17,18

-----------------------------------------------------------------

Signed-off-by: Aboorva Devarajan <[email protected]>

Introduce the get_present_core_list helper function to parse and
retrieve the list of present CPU cores, addressing gaps in core
numbering caused by dynamic addition or removal of CPUs (via CPU DLPAR
operations).

Use the present CPU list from `/sys/devices/system/cpu/present`
to handle non-contiguous CPU IDs. Map core IDs to CPUs based on the
specified number of threads per CPU, which accounts for gaps in
core numbering.

Signed-off-by: Aboorva Devarajan <[email protected]>
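The mapping from present CPUs to core IDs can be sketched as below. This
is an illustration under stated assumptions rather than the patch's
get_present_core_list: `present_core_ids` is a hypothetical name, and the
core ID is assumed to be the CPU ID divided by the threads-per-core
value, which agrees with the "After Patch" output (CPU 120 / 8 = core 15).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Collect the IDs of cores that have at least one present thread.
 * Assumes core ID = cpu ID / threads_per_core. Returns the number of
 * core IDs written to core_ids[]. Hypothetical helper, not the
 * patch's get_present_core_list.
 */
static size_t present_core_ids(const bool *present, size_t ncpus,
                               int threads_per_core,
                               int *core_ids, size_t max_cores)
{
    size_t n = 0;

    assert(present != NULL && core_ids != NULL);
    assert(threads_per_core > 0);

    for (size_t cpu = 0; cpu < ncpus; cpu++) {
        int core = (int)(cpu / (size_t)threads_per_core);

        if (!present[cpu])
            continue;               /* CPU ID falls in a gap */
        if (n > 0 && core_ids[n - 1] == core)
            continue;               /* core already recorded */
        if (n == max_cores)
            break;                  /* output array is full */
        core_ids[n++] = core;
    }
    return n;
}
```

For the present list `0-103,120-151` with 8 threads per core, this yields
17 core IDs: 0-12 followed by 15-18, with 13 and 14 skipped because none
of their CPUs are present.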