Message-ID: <04ed0569-5026-9c4f-b09f-3e8798d5b551@huawei.com>
Date: Mon, 19 Aug 2024 15:03:21 +0800
From: Yicong Yang <yangyicong@...wei.com>
To: Dietmar Eggemann <dietmar.eggemann@....com>
CC: <catalin.marinas@....com>, <will@...nel.org>, <sudeep.holla@....com>,
<tglx@...utronix.de>, <peterz@...radead.org>, <mpe@...erman.id.au>,
<linux-arm-kernel@...ts.infradead.org>, <mingo@...hat.com>, <bp@...en8.de>,
<dave.hansen@...ux.intel.com>, <yangyicong@...ilicon.com>,
<linuxppc-dev@...ts.ozlabs.org>, <x86@...nel.org>,
<linux-kernel@...r.kernel.org>, <gregkh@...uxfoundation.org>,
<rafael@...nel.org>, <jonathan.cameron@...wei.com>,
<prime.zeng@...ilicon.com>, <linuxarm@...wei.com>, <xuwei5@...wei.com>,
<guohanjun@...wei.com>
Subject: Re: [PATCH v5 3/4] arm64: topology: Support SMT control on ACPI based
system
On 2024/8/16 23:55, Dietmar Eggemann wrote:
> On 06/08/2024 10:53, Yicong Yang wrote:
>> From: Yicong Yang <yangyicong@...ilicon.com>
>>
>> For ACPI we'll build the topology from PPTT and we cannot directly
>> get the SMT number of each core. Instead using a temporary xarray
>> to record the SMT number of each core when building the topology
>> and we can know the largest SMT number in the system. Then we can
>> enable the support of SMT control.
>>
>> Signed-off-by: Yicong Yang <yangyicong@...ilicon.com>
>> ---
>> arch/arm64/kernel/topology.c | 24 ++++++++++++++++++++++++
>> 1 file changed, 24 insertions(+)
>>
>> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
>> index 1a2c72f3e7f8..f72e1e55b05e 100644
>> --- a/arch/arm64/kernel/topology.c
>> +++ b/arch/arm64/kernel/topology.c
>> @@ -15,8 +15,10 @@
>> #include <linux/arch_topology.h>
>> #include <linux/cacheinfo.h>
>> #include <linux/cpufreq.h>
>> +#include <linux/cpu_smt.h>
>> #include <linux/init.h>
>> #include <linux/percpu.h>
>> +#include <linux/xarray.h>
>>
>> #include <asm/cpu.h>
>> #include <asm/cputype.h>
>> @@ -43,11 +45,16 @@ static bool __init acpi_cpu_is_threaded(int cpu)
>> */
>> int __init parse_acpi_topology(void)
>> {
>> + int thread_num, max_smt_thread_num = 1;
>> + struct xarray core_threads;
>> int cpu, topology_id;
>> + void *entry;
>>
>> if (acpi_disabled)
>> return 0;
>>
>> + xa_init(&core_threads);
>> +
>> for_each_possible_cpu(cpu) {
>> topology_id = find_acpi_cpu_topology(cpu, 0);
>> if (topology_id < 0)
>> @@ -57,6 +64,20 @@ int __init parse_acpi_topology(void)
>> cpu_topology[cpu].thread_id = topology_id;
>> topology_id = find_acpi_cpu_topology(cpu, 1);
>> cpu_topology[cpu].core_id = topology_id;
>> +
>> + entry = xa_load(&core_threads, topology_id);
>> + if (!entry) {
>> + xa_store(&core_threads, topology_id,
>> + xa_mk_value(1), GFP_KERNEL);
>> + } else {
>> + thread_num = xa_to_value(entry);
>> + thread_num++;
>> + xa_store(&core_threads, topology_id,
>> + xa_mk_value(thread_num), GFP_KERNEL);
>> +
>> + if (thread_num > max_smt_thread_num)
>> + max_smt_thread_num = thread_num;
>> + }
>
> So the xarray contains one element for each core_id with the information
> how often the core_id occurs? I assume you have to iterate over all
> possible CPUs since you don't know which logical CPUs belong to the same
> core_id.
>
Each xarray element counts the threads of one core id, so the logic is:
1. If the "core id" entry doesn't exist, we're seeing this core for the first time.
   Create the entry and set its thread count to 1.
2. Otherwise, increment the thread count of the "core id" this CPU belongs to (PPTT has
   already told us which core the CPU belongs to), and update max_smt_thread_num if
   necessary.

This way we learn max_smt_thread_num while iterating over the PPTT table and building
the topology for all possible CPUs.

Otherwise we would need a second scan for the max thread number after building the
topology. v1 was implemented that way, and there were complaints about the overhead on
large-scale systems since the CPUs had to be looped over twice.
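
To make the single-pass counting concrete, here is a minimal userspace sketch of
the same idea. It is not the kernel code: a plain array stands in for the
temporary xarray, and MAX_CORES, core_of_cpu[] and max_threads_per_core() are
hypothetical names that assume small, dense core ids, which the ids returned
from PPTT are not guaranteed to be.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CORES 64	/* hypothetical bound, only for this sketch */

/*
 * core_of_cpu[i] plays the role of the core id reported for CPU i.
 * threads[] stands in for the temporary xarray: one counter per core id.
 * The maximum is tracked in the same pass, so no second scan is needed.
 */
static int max_threads_per_core(const int *core_of_cpu, size_t nr_cpus)
{
	int threads[MAX_CORES] = { 0 };
	int max_smt_thread_num = 1;

	for (size_t cpu = 0; cpu < nr_cpus; cpu++) {
		int core = core_of_cpu[cpu];

		assert(core >= 0 && core < MAX_CORES);

		/* First visit takes the counter from 0 to 1, later visits increment it. */
		threads[core]++;
		if (threads[core] > max_smt_thread_num)
			max_smt_thread_num = threads[core];
	}

	return max_smt_thread_num;
}
```

In the patch the xarray is used rather than a fixed array, so the counters can
be indexed by whatever core id value the lookup returns.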
>> } else {
>> cpu_topology[cpu].thread_id = -1;
>> cpu_topology[cpu].core_id = topology_id;
>> @@ -67,6 +88,9 @@ int __init parse_acpi_topology(void)
>> cpu_topology[cpu].package_id = topology_id;
>> }
>>
>> + cpu_smt_set_num_threads(max_smt_thread_num, max_smt_thread_num);
>> +
>> + xa_destroy(&core_threads);
>> return 0;
>> }
>> #endif
>
> Tested on ThunderX2:
>
> $ cat /proc/schedstat | head -6 | tail -4 | awk '{ print $1, $2 }'
> cpu0 0
> domain0 00000000,00000000,00000000,00000000,00000001,00000001,00000001,00000001
> ^ ^ ^ ^
> domain1 00000000,00000000,00000000,00000000,ffffffff,ffffffff,ffffffff,ffffffff
> domain2 ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff
>
> detecting 'max_smt_thread_num = 4' correctly.
>
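The reading of the output above can be checked mechanically: assuming domain0
is the SMT domain on that machine, the number of set bits in its mask is the
number of threads sharing a core with cpu0. A small sketch of that count
follows; cpumask_weight_str() is a hypothetical helper, not a kernel or libc
function.

```c
#include <ctype.h>

/*
 * Count the set bits in a comma-separated hex CPU mask string, such as
 * the domain masks printed in /proc/schedstat. Returns -1 if the string
 * contains anything other than hex digits and commas.
 */
static int cpumask_weight_str(const char *mask)
{
	int weight = 0;

	for (const char *p = mask; *p; p++) {
		int nibble;

		if (*p == ',')
			continue;
		if (!isxdigit((unsigned char)*p))
			return -1;

		nibble = isdigit((unsigned char)*p) ?
			 *p - '0' : tolower((unsigned char)*p) - 'a' + 10;

		/* Add up the bits of this 4-bit digit. */
		for (; nibble; nibble >>= 1)
			weight += nibble & 1;
	}

	return weight;
}
```

Passing the domain0 mask quoted above to this helper counts the four bits
marked with '^'.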
Thanks for testing. OK for a tag?
Thanks.