Message-ID: <57eba42c-732a-4a30-a714-5e5538f2e5d5@linux.alibaba.com>
Date: Thu, 12 Oct 2023 21:30:50 +0800
From: Rongwei Wang <rongwei.wang@...ux.alibaba.com>
To: Pierre Gondois <pierre.gondois@....com>,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
linux-mm@...ck.org
Cc: akpm@...ux-foundation.org, willy@...radead.org,
catalin.marinas@....com, dave.hansen@...ux.intel.com,
tj@...nel.org, mingo@...hat.com
Subject: Re: [PATCH RFC 0/5] support NUMA emulation for arm64
On 2023/10/12 20:37, Pierre Gondois wrote:
> Hello Rongwei,
>
> On 10/12/23 04:48, Rongwei Wang wrote:
>> A brief introduction
>> ====================
>>
>> NUMA emulation can fake multiple nodes on top of a single-node
>> system, e.g.
>>
>> one node system:
>>
>> [root@...alhost ~]# numactl -H
>> available: 1 nodes (0)
>> node 0 cpus: 0 1 2 3 4 5 6 7
>> node 0 size: 31788 MB
>> node 0 free: 31446 MB
>> node distances:
>> node 0
>> 0: 10
>>
>> add numa=fake=2 (fake 2 nodes on each original node):
>>
>> [root@...alhost ~]# numactl -H
>> available: 2 nodes (0-1)
>> node 0 cpus: 0 1 2 3 4 5 6 7
>> node 0 size: 15806 MB
>> node 0 free: 15451 MB
>> node 1 cpus: 0 1 2 3 4 5 6 7
>> node 1 size: 16029 MB
>> node 1 free: 15989 MB
>> node distances:
>> node 0 1
>> 0: 10 10
>> 1: 10 10
>>
>> As shown above, a new node has been faked. For CPUs, the x86 NUMA
>> emulation behavior is kept, so every node still sees all CPUs. Maybe
>> it would be better for each node to have 4 dedicated cores (not
>> sure; a next step if so).
>>
>> Why do this
>> ===========
>>
>> It seems there are the following reasons:
>> (1) On an x86 host, NUMA emulation can fake a multi-node environment
>>     for testing or verifying performance work, but on arm64 the only
>>     way to do this is to modify the ACPI table, which is more or less
>>     troublesome.
>> (2) It reduces contention on some locks. Here is an example we found:
>>     will-it-scale/tlb_flush1_processes -t 96 -s 10 shows an obvious
>>     hotspot on lruvec->lock when run on a single-node system, and
>>     performance improves greatly when run on a two-node system. The
>>     data is shown below (higher is better):
>>
>> ---------------------------------------------------------------------
>> threads/process |      1 |      12 |      24 |      48 |      96
>> ---------------------------------------------------------------------
>> one node        | 141122 | 1105372 | 1112615 |  797084 |  724516
>> ---------------------------------------------------------------------
>> numa=fake=2     | 141168 | 1444848 | 2159070 | 1570412 | 1423968
>> ---------------------------------------------------------------------
>> hotspot         | For concurrency 12, there is no lruvec->lock
>>                 | hotspot. For 24, one node has a 24% hotspot on
>>                 | lruvec->lock, but the two-node setup has none.
>> ---------------------------------------------------------------------
>>
>> As for risks (e.g. NUMA balancing...), they need to be discussed here.
>>
>> Lastly, this is just a draft; I can improve it next if the approach
>> is acceptable.
>
> I'm not engaging on the utility/relevance of the patch-set, but I tried
> them on an arm64 system with the 'numa=fake=2' parameter and could not
> see 2 nodes being created under:
> /sys/devices/system/node/
Sorry, my fault.
I should have mentioned this in the brief introduction: boot with
acpi=on numa=fake=2.
The default arm64 NUMA initialization path is numa_init() ->
dummy_numa_init() when ACPI is turned off (this path has not been taken
into account yet in this patchset; handling it is next on the to-do
list).
What's more, if you test this patchset under qemu-kvm, you should add
the parameters below to your script:
-object memory-backend-ram,id=mem0,size=32G \
-numa node,memdev=mem0,cpus=0-7,nodeid=0 \
(The above parameters just make sure the SRAT table carries a NUMA
configuration, avoiding the numa_init() -> dummy_numa_init() path.)
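For context, a complete qemu-kvm command might look like the sketch
below. This is illustrative only: the firmware path, machine options
and disk image are assumptions, and UEFI firmware is needed so an
arm64 guest actually sees the ACPI tables:

    # illustrative sketch -- adjust paths and options to your setup
    qemu-system-aarch64 \
        -machine virt -cpu host -enable-kvm \
        -smp 8 -m 32G \
        -bios /usr/share/edk2/aarch64/QEMU_EFI.fd \
        -object memory-backend-ram,id=mem0,size=32G \
        -numa node,memdev=mem0,cpus=0-7,nodeid=0 \
        -drive file=disk.qcow2,if=virtio \
        -nographic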
> Indeed it seems that even though numa_emulation() is moved to a generic
> mm/numa.c file, the function is only called from:
> arch/x86/mm/numa.c:numa_init()
> (or maybe I'm misinterpreting the intent of the patches).
Here drivers/base/arch_numa.c:numa_init() does call numa_emulation() (I
guess it will work if you add acpi=on :-)).
>
> Also I had the following errors when building (still for arm64):
> mm/numa.c:862:8: error: implicit declaration of function
> 'early_cpu_to_node' is invalid in C99
> [-Werror,-Wimplicit-function-declaration]
> nid = early_cpu_to_node(cpu);
It seems CONFIG_DEBUG_PER_CPU_MAPS is enabled in your environment? You
can disable CONFIG_DEBUG_PER_CPU_MAPS and test again.
I have not tested with CONFIG_DEBUG_PER_CPU_MAPS enabled. This report
is very helpful; I will fix it in the next version.
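In case it helps, one possible shape of the fix, as a sketch only (the
placement and the exact signatures are assumptions based on your build
log and the x86 originals, to be confirmed):

    /* hypothetical sketch, e.g. in include/asm-generic/numa.h:
     * declare the helpers that the generic mm/numa.c uses when
     * CONFIG_DEBUG_PER_CPU_MAPS is enabled */
    #ifdef CONFIG_DEBUG_PER_CPU_MAPS
    int early_cpu_to_node(int cpu);
    void debug_cpumask_set_cpu(int cpu, int node, bool enable);
    #endif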
If you have any questions, please let me know.
Regards,
-wrw
> ^
> mm/numa.c:862:8: note: did you mean 'early_map_cpu_to_node'?
> ./include/asm-generic/numa.h:37:13: note: 'early_map_cpu_to_node'
> declared here
> void __init early_map_cpu_to_node(unsigned int cpu, int nid);
> ^
> mm/numa.c:874:3: error: implicit declaration of function
> 'debug_cpumask_set_cpu' is invalid in C99
> [-Werror,-Wimplicit-function-declaration]
> debug_cpumask_set_cpu(cpu, nid, enable);
> ^
> mm/numa.c:874:3: note: did you mean '__cpumask_set_cpu'?
> ./include/linux/cpumask.h:474:29: note: '__cpumask_set_cpu' declared here
> static __always_inline void __cpumask_set_cpu(unsigned int cpu, struct
> cpumask *dstp)
> ^
> 2 errors generated.
>
> Regards,
> Pierre
>
>>
>> Thanks!
>>
>> Rongwei Wang (5):
>> mm/numa: move numa emulation APIs into generic files
>> mm: percpu: fix variable type of cpu
>> arch_numa: remove __init in early_cpu_to_node()
>> mm/numa: support CONFIG_NUMA_EMU for arm64
>> mm/numa: migrate leftover numa emulation into mm/numa.c
>>
>> arch/x86/Kconfig | 8 -
>> arch/x86/include/asm/numa.h | 3 -
>> arch/x86/mm/Makefile | 1 -
>> arch/x86/mm/numa.c | 216 +-------------
>> arch/x86/mm/numa_internal.h | 14 +-
>> drivers/base/arch_numa.c | 7 +-
>> include/asm-generic/numa.h | 33 +++
>> include/linux/percpu.h | 2 +-
>> mm/Kconfig | 8 +
>> mm/Makefile | 1 +
>> arch/x86/mm/numa_emulation.c => mm/numa.c | 333 +++++++++++++++++++++-
>> 11 files changed, 373 insertions(+), 253 deletions(-)
>> rename arch/x86/mm/numa_emulation.c => mm/numa.c (63%)
>>