[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230508084115.1157-1-kprateek.nayak@amd.com>
Date: Mon, 8 May 2023 14:11:13 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: <linux-kernel@...r.kernel.org>
CC: <gregkh@...uxfoundation.org>, <rafael@...nel.org>,
<sudeep.holla@....com>, <yongxuan.wang@...ive.com>,
<pierre.gondois@....com>, <vincent.chen@...ive.com>,
<greentime.hu@...ive.com>, <yangyicong@...wei.com>,
<prime.zeng@...ilicon.com>, <palmer@...osinc.com>, <puwen@...on.cn>
Subject: [PATCH 0/2] drivers: base: cacheinfo: Fix shared_cpu_list inconsistency in event of CPU hotplug
Since v6.3-rc1, the shared_cpu_list in per-cpu cacheinfo breaks in case
of hotplug activity on x86. This can be tracked back to two commits:
o commit 198102c9103f ("cacheinfo: Fix shared_cpu_map to handle shared
caches at different levels") that matches cache instance IDs without
considering if the instance IDs belong to same cache level or not.
o commit 5c2712387d48 ("cacheinfo: Fix LLC is not exported through
sysfs") which skips calling populate_cache_leaves() if
last_level_cache_is_valid(cpu) returns true. populate_cache_leaves()
on x86 would have populated the shared_cpu_map when CPU comes online,
which is now skipped, and the alternate path has an early bailout
before setting the CPU in the shared_cpu_map is even attempted.
On x86, populate_cache_leaves() also sets the
cpu_cacheinfo->cpu_map_populated flag when the cacheinfo is first
populated, the cache_shared_cpu_map_setup() in the driver is bypassed
when a thread comes back online during the hotplug activity. This leads
to the shared_cpu_list displaying abnormal values for the CPU that was
offlined and then onlined since the shared_cpu_maps are never
revaluated.
Following is the output from a dual socket 3rd Generation AMD EPYC
processor (2 x 64C/128T) for cachinfo when offlining and then onlining
CPU8:
o v6.3-rc5 with no changes:
# for i in /sys/devices/system/cpu/cpu8/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list: 8,136
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list: 8,136
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list: 8,136
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list: 8-15,136-143
# echo 0 > /sys/devices/system/cpu/cpu8/online
# echo 1 > /sys/devices/system/cpu/cpu8/online
# for i in /sys/devices/system/cpu/cpu8/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list: 8
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list: 8
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list: 8
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list: 8
# cat /sys/devices/system/cpu/cpu136/cache/index0/shared_cpu_list
136
# cat /sys/devices/system/cpu/cpu136/cache/index3/shared_cpu_list
9-15,136-143
o v6.3-rc5 with commit 5c2712387d48 ("cacheinfo: Fix LLC is not exported
through sysfs") reverted (Behavior consistent with v6.2):
# for i in /sys/devices/system/cpu/cpu8/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list: 8,136
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list: 8,136
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list: 8,136
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list: 8-15,136-143
# echo 0 > /sys/devices/system/cpu/cpu8/online
# echo 1 > /sys/devices/system/cpu/cpu8/online
# for i in /sys/devices/system/cpu/cpu8/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list: 8,136
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list: 8,136
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list: 8,136
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list: 8-15,136-143
# cat /sys/devices/system/cpu/cpu136/cache/index0/shared_cpu_list
8,136
# cat /sys/devices/system/cpu/cpu136/cache/index3/shared_cpu_list
8-15,136-143
This is not only limited to AMD processors but affects Intel processors
too. Following is the output from same experiment on a dual socket Intel
Ice Lake server (2 x 32C/64T) running kernel v6.3-rc5:
# for i in /sys/devices/system/cpu/cpu8/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list: 8,72
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list: 8,72
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list: 8,72
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,
26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,
88,90,92,94,96,98,100,102,104,106,108,110,112,114,116,118,120,122,124,126
# echo 0 > /sys/devices/system/cpu/cpu8/online
# echo 1 > /sys/devices/system/cpu/cpu8/online
# for i in /sys/devices/system/cpu/cpu8/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list: 8
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list: 8
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list: 8
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list: 8
# cat /sys/devices/system/cpu/cpu72/cache/index0/shared_cpu_list
72
# cat /sys/devices/system/cpu/cpu72/cache/index3/shared_cpu_list
0,2,4,6,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,
66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110,112,114,116,118,
120,122,124,126
This patch addresses two issues associated with building
shared_cpu_list:
o Patch 1 fixes an ID matching issue that can lead to cacheinfo
associating CPUs from different cache levels in case IDs are not
unique across all the different cache levels.
o Patch 2 clears the cpu_cacheinfo->cpu_map_populated flag when CPU goes
offline and is removed from the shared_cpu_map.
Following are the results after applying the series on v6.3-rc5 on
respective x86 platforms:
o 3rd Generation AMD EPYC Processor (2 x 64C/128T)
# for i in /sys/devices/system/cpu/cpu8/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list: 8,136
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list: 8,136
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list: 8,136
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list: 8-15,136-143
# echo 0 > /sys/devices/system/cpu/cpu8/online
# echo 1 > /sys/devices/system/cpu/cpu8/online
# for i in /sys/devices/system/cpu/cpu8/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list: 8,136
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list: 8,136
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list: 8,136
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list: 8-15,136-143
# cat /sys/devices/system/cpu/cpu136/cache/index0/shared_cpu_list
8,136
# cat /sys/devices/system/cpu/cpu136/cache/index3/shared_cpu_list
8-15,136-143
o Intel Ice Lake Xeon (2 x 32C/128T)
# for i in /sys/devices/system/cpu/cpu8/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list: 8,72
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list: 8,72
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list: 8,72
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,
28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,
92,94,96,98,100,102,104,106,108,110,112,114,116,118,120,122,124,126
# echo 0 > /sys/devices/system/cpu/cpu8/online
# echo 1 > /sys/devices/system/cpu/cpu8/online
# for i in /sys/devices/system/cpu/cpu8/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list: 8,72
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list: 8,72
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list: 8,72
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,
28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,
92,94,96,98,100,102,104,106,108,110,112,114,116,118,120,122,124,126
# cat /sys/devices/system/cpu/cpu72/cache/index0/shared_cpu_list
8,72
# cat /sys/devices/system/cpu/cpu72/cache/index3/shared_cpu_list
0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,
68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110,112,114,116,118,120,122,
124,126
Running "grep -r 'cpu_map_populated' arch/" shows MIPS and loongarch too
set the cpu_cacheinfo->cpu_map_populated who might also be affected by
the changes in commit 5c2712387d48 ("cacheinfo: Fix LLC is not exported
through sysfs") and this series. Changes from Patch 1 might also affect
RISC-V since Yong-Xuan Wang <yongxuan.wang@...ive.com> from SiFive last
made changes to cache_shared_cpu_map_setup() and
cache_shared_cpu_map_remove() in commit 198102c9103f ("cacheinfo: Fix
shared_cpu_map to handle shared caches at different levels").
---
K Prateek Nayak (2):
drivers: base: cacheinfo: Fix shared_cpu_map changes in event of CPU
hotplug
drivers: base: cacheinfo: Clear cpu_map_populated when CPU goes
offline
drivers/base/cacheinfo.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
--
2.34.1
Powered by blists - more mailing lists