Message-ID: <b815f9a6-e4da-8c8e-f207-71c6d122fc40@linux.ibm.com>
Date: Tue, 20 Apr 2021 18:34:34 +0200
From: Laurent Dufour <ldufour@...ux.ibm.com>
To: mpe@...erman.id.au, benh@...nel.crashing.org, paulus@...ba.org,
nathanl@...ux.ibm.com, Nick Piggin <npiggin@...il.com>
Cc: cheloha@...ux.ibm.com, linuxppc-dev@...ts.ozlabs.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4] pseries: prevent free CPU ids to be reused on another
node
On 07/04/2021 at 17:38, Laurent Dufour wrote:
> When a CPU is hot added, the CPU ids are taken from the available mask, starting
> from the lowest possible set. If that set of values was previously used for CPUs
> attached to a different node, it appears to applications as if these CPUs
> have migrated from one node to another, which is not expected in real
> life.
>
> To prevent this, the CPU ids used for each node need to be recorded and
> not reused on another node. However, to prevent CPU hot plug from failing
> when a node runs out of CPU ids, the capability to reuse other nodes’ free
> CPU ids is kept. A warning is displayed in such a case to alert the
> user.
>
> A new CPU bit mask (node_recorded_ids_map) is introduced for each possible
> node. It is populated with the CPUs onlined at boot time, and then whenever a
> CPU is hot plugged to a node. The bits in that mask remain set when the CPU is
> hot unplugged, to record that these CPU ids have been used for this node.
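The bookkeeping described above can be modelled in a few lines. This is a simplified user-space sketch, not the pseries code: it assumes at most 64 CPU ids so that each mask fits in a `uint64_t` (a hypothetical stand-in for `struct cpumask`), hands out one core's worth of SMT threads as an aligned block of ids, and uses hypothetical helper names (`boot_core`, `add_core`, `remove_core`). The `allow_fallback` parameter models the "reuse another node's free ids, but warn" behaviour; passing 0 gives the strict policy.

```c
#include <stdint.h>
#include <stdio.h>

#define MAX_NODES   4
#define MAX_CPU_IDS 64 /* assumption: every mask fits in one 64-bit word */

typedef uint64_t idmask_t; /* hypothetical stand-in for struct cpumask */

static idmask_t possible_ids;                     /* ids the partition may use */
static idmask_t present_ids;                      /* ids currently plugged in */
static idmask_t node_recorded_ids_map[MAX_NODES]; /* ids ever used, per node */

/* Mask covering ids [first, first + count), with 1 <= count <= 64. */
static idmask_t id_range(int first, int count)
{
    idmask_t bits = (count >= MAX_CPU_IDS) ? ~0ULL : ((1ULL << count) - 1);
    return bits << first;
}

/* First id of the lowest aligned block of nthreads ids all set in avail,
 * or -1 if there is none. nthreads must be between 1 and 64. */
static int find_block(idmask_t avail, int nthreads)
{
    for (int first = 0; first + nthreads <= MAX_CPU_IDS; first += nthreads) {
        idmask_t block = id_range(first, nthreads);
        if ((avail & block) == block)
            return first;
    }
    return -1;
}

/* Record a core onlined at boot time on the given node. */
static void boot_core(int node, int first, int nthreads)
{
    present_ids |= id_range(first, nthreads);
    node_recorded_ids_map[node] |= id_range(first, nthreads);
}

/* Hot add one core to node; returns its first CPU id, or -1 on failure. */
static int add_core(int node, int nthreads, int allow_fallback)
{
    idmask_t free_ids = possible_ids & ~present_ids;
    idmask_t used_elsewhere = 0;

    for (int n = 0; n < MAX_NODES; n++)
        if (n != node)
            used_elsewhere |= node_recorded_ids_map[n];

    /* Preferred: free ids that no other node has ever used. */
    int first = find_block(free_ids & ~used_elsewhere, nthreads);
    if (first < 0 && allow_fallback) {
        /* Fallback: reuse ids freed by another node, and warn about it. */
        first = find_block(free_ids, nthreads);
        if (first >= 0)
            fprintf(stderr, "warning: node %d reuses CPU ids %d-%d\n",
                    node, first, first + nthreads - 1);
    }
    if (first < 0)
        return -1;

    present_ids |= id_range(first, nthreads);
    node_recorded_ids_map[node] |= id_range(first, nthreads);
    return first;
}

/* Hot remove one core: its ids become free but stay recorded for its node. */
static void remove_core(int first, int nthreads)
{
    present_ids &= ~id_range(first, nthreads);
}
```

The key design point is that `remove_core` never touches `node_recorded_ids_map`, so a node's history survives unplug; only `add_core` grows it.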
>
> The effect of this patch can be seen by removing and adding CPUs using the
> QEMU monitor. In the following case, the first CPU from node 2 is
> removed, then the first one from node 1 is removed too. Later, the
> first CPU of node 2 is added back. Without this patch, the kernel numbers
> these CPUs using the first available CPU ids, which are the ones freed
> when removing the first CPU of node 1. This leads to CPU ids 16-23
> moving from node 1 to node 2. With the patch applied, CPU ids 32-39
> are used since they are the lowest free ones which have not been used
> on another node.
>
> At boot time:
> [root@...0 ~]# numactl -H | grep cpus
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
> node 1 cpus: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
> node 2 cpus: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
>
> Vanilla kernel, after the CPU hot unplug/plug operations:
> [root@...0 ~]# numactl -H | grep cpus
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
> node 1 cpus: 24 25 26 27 28 29 30 31
> node 2 cpus: 16 17 18 19 20 21 22 23 40 41 42 43 44 45 46 47
>
> Patched kernel, after the CPU hot unplug/plug operations:
> [root@...0 ~]# numactl -H | grep cpus
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
> node 1 cpus: 24 25 26 27 28 29 30 31
> node 2 cpus: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
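The difference between the two outputs comes down to a single mask operation. After the two unplugs, the free ids are 16-23 and 32-39, and ids 0-31 are recorded for nodes 0 and 1. The sketch below is an illustration only: it assumes the ids fit in a `uint64_t`, that cores are freed as whole aligned 8-thread blocks (so the lowest free id is also the first id of the lowest free core), uses the GCC/Clang builtin `__builtin_ctzll`, and its function names are hypothetical.

```c
#include <stdint.h>

/* Lowest id set in mask, or -1 when the mask is empty. */
static int lowest_id(uint64_t mask)
{
    return mask ? __builtin_ctzll(mask) : -1;
}

/* Ids [first, first + count), with 0 < count < 64. */
static uint64_t ids(int first, int count)
{
    return ((1ULL << count) - 1) << first;
}

/* First CPU id given to the core re-added to node 2 in the trace above. */
static int readded_core_first_id(int exclude_other_nodes)
{
    uint64_t free_ids = ids(16, 8) | ids(32, 8); /* freed by the two unplugs */
    uint64_t recorded_elsewhere = ids(0, 32);    /* used by nodes 0 and 1 */

    if (exclude_other_nodes)
        free_ids &= ~recorded_elsewhere;
    return lowest_id(free_ids);
}
```

`readded_core_first_id(0)` models the vanilla kernel and yields 16; `readded_core_first_id(1)` models the patched kernel and yields 32, matching the two `numactl` outputs.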
>
> Changes since V3, addressing Nathan's comment:
> - Rename the local variable named 'nid' into 'assigned_node'
> Changes since V2, addressing Nathan's comments:
> - Remove the retry feature
> - Reduce the number of local variables (removing 'i')
> - Add comment about the cpu_add_remove_lock protecting the added CPU mask.
> Changes since V1 (no functional changes):
> - update the test's output in the commit's description
> - node_recorded_ids_map should be static
>
> Signed-off-by: Laurent Dufour <ldufour@...ux.ibm.com>
I did further LPM tests with this patch applied, and not allowing a fallback to
reusing free ids of another node is too strong.
It is easy to hit that limitation when an LPAR is running at the maximum number
of CPUs it is configured for and an LPAR migration leads to a new node being activated.
For instance, consider a dedicated LPAR configured with a maximum of 32 CPUs (4
cores, SMT 8). At boot time, cpu_possible_mask is filled with CPU ids 0-31 in
smp_setup_cpu_maps() by reading the DT property "/rtas/ibm,lrdr-capacity", so
the highest CPU id for this LPAR is 31.
Departure box:
node 0 : CPU 0-31
Arrival box:
node 0 : CPU 0-15
node 1 : CPU 16-31 << need to reuse ids from node 0
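The arithmetic behind this limitation is simple: with only 32 possible ids, all of them recorded for node 0 before the migration, no id is left that node 1 can take without reusing one of node 0's, while it needs 16 (two SMT 8 cores). A minimal check, assuming the ids fit in a `uint64_t` and using the GCC/Clang builtin `__builtin_popcountll`; the function name is hypothetical.

```c
#include <stdint.h>

/* Number of possible ids a node could take without reusing an id
 * recorded for another node (strict policy, no fallback). */
static int fresh_ids_for_node(uint64_t possible_ids, uint64_t recorded_elsewhere)
{
    return __builtin_popcountll(possible_ids & ~recorded_elsewhere);
}
```

With `possible_ids = 0xFFFFFFFF` (ids 0-31) and node 0's recorded mask also `0xFFFFFFFF`, the result is 0, so the strict policy cannot place node 1's CPUs at all.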
Virtualizing the CPU ids would have a big impact, as they are used in several places
in the kernel to index linear tables.
But when the LPAR is migratable (DT property "ibm,migratable-partition"
is present), we could extend the possible CPU ids up to NR_CPUS (usually 2048) to
limit the cases where a CPU id has to be reused on a different node. Doing this
would impact some data allocations done in the kernel whose size is based on
num_possible_cpus.
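To give an order of magnitude for this trade-off: any allocation sized by num_possible_cpus grows linearly with the number of possible ids. Taking, as an illustrative assumption, a hypothetical table with one 8-byte entry per possible CPU (not a specific kernel structure), going from 32 to 2048 possible ids takes it from 256 bytes to 16 KiB, while leaving 2016 ids beyond the 32 configured ones that a newly activated node could be given without reusing another node's ids. The helper names below are hypothetical.

```c
#include <stddef.h>

/* Bytes used by a table with one entry per possible CPU. */
static size_t per_possible_cpu_bytes(unsigned int num_possible, size_t entry_size)
{
    return (size_t)num_possible * entry_size;
}

/* Possible ids beyond the number of CPUs the LPAR is configured for. */
static unsigned int spare_ids(unsigned int num_possible, unsigned int max_configured)
{
    return num_possible > max_configured ? num_possible - max_configured : 0;
}
```

The real cost depends on how many such num_possible_cpus-sized allocations exist and on their entry sizes, which this sketch does not attempt to enumerate.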
Any better idea?
Thanks,
Laurent.