[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b8786c38-d6c4-4fea-a918-ac6a45682dba@amperemail.onmicrosoft.com>
Date: Fri, 19 Jan 2024 16:50:53 +0800
From: Shijie Huang <shijie@...eremail.onmicrosoft.com>
To: Mike Rapoport <rppt@...nel.org>
Cc: Yury Norov <yury.norov@...il.com>,
Huang Shijie <shijie@...amperecomputing.com>, gregkh@...uxfoundation.org,
patches@...erecomputing.com, rafael@...nel.org, paul.walmsley@...ive.com,
palmer@...belt.com, aou@...s.berkeley.edu, kuba@...nel.org,
vschneid@...hat.com, mingo@...nel.org, akpm@...ux-foundation.org,
vbabka@...e.cz, tglx@...utronix.de, jpoimboe@...nel.org,
ndesaulniers@...gle.com, mikelley@...rosoft.com, mhiramat@...nel.org,
arnd@...db.de, linux-kernel@...r.kernel.org,
linux-riscv@...ts.infradead.org, linux-arm-kernel@...ts.infradead.org,
catalin.marinas@....com, will@...nel.org, mark.rutland@....com,
mpe@...erman.id.au, linuxppc-dev@...ts.ozlabs.org, chenhuacai@...nel.org,
jiaxun.yang@...goat.com, linux-mips@...r.kernel.org,
cl@...amperecomputing.com
Subject: Re: [PATCH] NUMA: Early use of cpu_to_node() returns 0 instead of the
correct node id
在 2024/1/19 16:42, Mike Rapoport 写道:
> On Fri, Jan 19, 2024 at 02:46:16PM +0800, Shijie Huang wrote:
>> 在 2024/1/19 12:42, Yury Norov 写道:
>>> This adds another level of indirection, I think. Currently cpu_to_node
>>> is a simple inliner. After the patch it would be a real function with
>>> all the associate overhead. Can you share a bloat-o-meter output here?
>> #./scripts/bloat-o-meter vmlinux vmlinux.new
>> add/remove: 6/1 grow/shrink: 61/51 up/down: 1168/-588 (580)
>> Function old new delta
>> numa_update_cpu 148 244 +96
>>
>> ...................................................................................................................................(to many to skip)
>>
>> Total: Before=32990130, After=32990710, chg +0.00%
>
> It's not only about text size, the indirect call also hurts performance
>
The cpu_to_node() is called at very low frequency, most of the times is
in the kernel booting time.
>>> Regardless, I don't think that the approach is correct. As per your
>>> description, some initialization functions erroneously call
>>> cpu_to_node() instead of early_cpu_to_node() which exists specifically
>>> for that case.
>>>
>>> If the above correct, it's clearly a caller problem, and the fix is to
>>> simply switch all those callers to use early version.
>> It is easy to change to early_cpu_to_node() for sched_init(),
>> init_sched_fair_class()
>>
>> and workqueue_init_early(). These three places call the cpu_to_node() in the
>> __init function.
>>
>>
>> But it is a little hard to change the early_trace_init(), since it calls
>> cpu_to_node in the deep
>>
>> function stack:
>>
>> early_trace_init() --> ring_buffer_alloc() -->rb_allocate_cpu_buffer()
>>
>>
>> For early_trace_init(), we need to change more code.
>>
>>
>> Anyway, If we think it is not a good idea to change the common code, I am
>> oaky too.
>
> Is there a fundamental reason to have early_cpu_to_node() at all?
The early_cpu_to_node does not work on some ARCHs (which support the
NUMA), such
as SPARC, MIPS and S390.
Thanks
Huang Shijie
> It seems that all the mappings are known by the end of setup_arch() and the
> initialization of numa_node can be moved earlier.
>
>>> I would also initialize the numa_node with NUMA_NO_NODE at declaration,
>>> so that if someone calls cpu_to_node() before the variable is properly
>>> initialized at runtime, he'll get NO_NODE, which is obviously an error.
>> Even we set the numa_node with NUMA_NO_NODE, it does not always produce
>> error.
>>
>> Please see the alloc_pages_node().
>>
>>
>> Thanks
>>
>> Huang Shijie
>>
Powered by blists - more mailing lists