Message-ID: <1dd52bb0-f905-4eff-8044-9e99e689ab06@amperemail.onmicrosoft.com>
Date: Thu, 25 Jan 2024 17:15:29 +0800
From: Shijie Huang <shijie@...eremail.onmicrosoft.com>
To: Mike Rapoport <rppt@...nel.org>,
"Lameter, Christopher" <cl@...amperecomputing.com>
Cc: Huang Shijie <shijie@...amperecomputing.com>, gregkh@...uxfoundation.org,
patches@...erecomputing.com, rafael@...nel.org, paul.walmsley@...ive.com,
palmer@...belt.com, aou@...s.berkeley.edu, yury.norov@...il.com,
kuba@...nel.org, vschneid@...hat.com, mingo@...nel.org,
akpm@...ux-foundation.org, vbabka@...e.cz, tglx@...utronix.de,
jpoimboe@...nel.org, ndesaulniers@...gle.com, mikelley@...rosoft.com,
mhiramat@...nel.org, arnd@...db.de, linux-kernel@...r.kernel.org,
linux-riscv@...ts.infradead.org, linux-arm-kernel@...ts.infradead.org,
catalin.marinas@....com, will@...nel.org, mark.rutland@....com,
mpe@...erman.id.au, linuxppc-dev@...ts.ozlabs.org, chenhuacai@...nel.org,
jiaxun.yang@...goat.com, linux-mips@...r.kernel.org
Subject: Re: [PATCH v2] NUMA: Early use of cpu_to_node() returns 0 instead of
the correct node id
On 2024/1/25 15:31, Mike Rapoport wrote:
> On Wed, Jan 24, 2024 at 09:19:00AM -0800, Lameter, Christopher wrote:
>> On Tue, 23 Jan 2024, Huang Shijie wrote:
>>
>>> During kernel boot, the generic cpu_to_node() is called too early on
>>> arm64, powerpc and riscv when CONFIG_NUMA is enabled.
>>>
>>> For arm64/powerpc/riscv, there are at least four places in the common code
>>> where the generic cpu_to_node() is called before it is initialized:
>>> 1.) early_trace_init() in kernel/trace/trace.c
>>> 2.) sched_init() in kernel/sched/core.c
>>> 3.) init_sched_fair_class() in kernel/sched/fair.c
>>> 4.) workqueue_init_early() in kernel/workqueue.c
>>>
>>> In order to fix the bug, the patch changes the generic cpu_to_node() into a
>>> function pointer and exports it for kernel modules.
>>> It introduces smp_prepare_boot_cpu_start() to wrap the original
>>> smp_prepare_boot_cpu() and point cpu_to_node at early_cpu_to_node(), and
>>> smp_prepare_cpus_done() to wrap the original smp_prepare_cpus() and point
>>> cpu_to_node back at the formal _cpu_to_node().
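For reference, the function-pointer approach described in the patch can be modelled in a small userspace C program. This is a sketch under stated assumptions, not the patch itself: `fw_cpu_node[]` is a hypothetical stand-in for the firmware tables that early_cpu_to_node() consults, `percpu_numa_node[]` stands in for the per-CPU `numa_node` variable, the boot hooks are reduced to plain functions, and the module export is not modelled.

```c
#include <assert.h>

#define NR_CPUS 4

/* Hypothetical firmware CPU-to-node table, standing in for what
 * early_cpu_to_node() reads from ACPI/DT on a real system. */
static const int fw_cpu_node[NR_CPUS] = { 0, 0, 1, 1 };

/* Stand-in for the per-CPU numa_node variable: zero-filled until
 * SMP setup runs, which is why early callers see node 0. */
static int percpu_numa_node[NR_CPUS];

static int early_cpu_to_node(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return fw_cpu_node[cpu];
}

/* The "formal" lookup, valid once the per-CPU data is populated. */
static int _cpu_to_node(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return percpu_numa_node[cpu];
}

/* The patch's idea: callers go through a pointer that boot code
 * retargets. It starts at the formal lookup. */
int (*cpu_to_node)(int cpu) = _cpu_to_node;

/* Modelled on smp_prepare_boot_cpu_start(): switch to the
 * firmware-based lookup before the early callers run. */
static void smp_prepare_boot_cpu_start(void)
{
	cpu_to_node = early_cpu_to_node;
}

/* Modelled on smp_prepare_cpus_done(): populate the per-CPU node ids
 * (as smp_prepare_cpus() would), then switch back. */
static void smp_prepare_cpus_done(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		percpu_numa_node[cpu] = early_cpu_to_node(cpu);
	cpu_to_node = _cpu_to_node;
}
```

The model makes the trade-off visible: every lookup becomes an indirect call, and correctness depends on the two switch points being placed correctly in each architecture's boot path.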
>> Would you please fix this cleanly without a function pointer?
>>
>> What I think needs to be done is a patch series.
>>
>> 1. Instrument cpu_to_node so that some warning is issued if it is used too
>> early. Preloading the array with NUMA_NO_NODE would allow us to do that.
>>
>> 2. Implement early_cpu_to_node on platforms that currently do not have it.
>>
>> 3. A series of patches that fix each place where cpu_to_node is used too
>> early.
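Step 1 of the plan above can be illustrated with a minimal userspace C model. It assumes the per-CPU `numa_node` values are represented by a plain array (`cpu_numa_node[]`, a hypothetical stand-in) and that a counter plus a message on stderr approximates the warning a kernel implementation would emit. Preloading with NUMA_NO_NODE (-1) makes an entry that was never set distinguishable from a genuine node 0.

```c
#include <assert.h>
#include <stdio.h>

#define NR_CPUS      4
#define NUMA_NO_NODE (-1)	/* same value the kernel uses */

/* Hypothetical stand-in for the per-CPU numa_node variable. */
static int cpu_numa_node[NR_CPUS];

/* Counts lookups made before the node id was set; a stand-in for
 * the warning a kernel implementation would emit. */
static int early_use_count;

/* Preload every entry with NUMA_NO_NODE instead of relying on
 * zero-fill, so "not set yet" cannot be mistaken for node 0. */
static void numa_node_preload(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		cpu_numa_node[cpu] = NUMA_NO_NODE;
}

static void set_cpu_numa_node(int cpu, int node)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	cpu_numa_node[cpu] = node;
}

/* Instrumented lookup: flags callers that run before NUMA setup. */
static int cpu_to_node(int cpu)
{
	int node;

	assert(cpu >= 0 && cpu < NR_CPUS);
	node = cpu_numa_node[cpu];
	if (node == NUMA_NO_NODE) {
		early_use_count++;
		fprintf(stderr, "cpu_to_node(%d) used before NUMA setup\n", cpu);
	}
	/* In the kernel, allocators such as kmalloc_node() accept
	 * NUMA_NO_NODE as "no node preference". */
	return node;
}
```

Each early caller found this way is then a candidate for one of the step-3 fixes.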
For step 3, I find it hard to change cpu_to_node() to
early_cpu_to_node() for early_trace_init().
In early_trace_init(), __ring_buffer_alloc() calls cpu_to_node(), so
fixing the bug that way would require __ring_buffer_alloc() to use
early_cpu_to_node().
But __ring_buffer_alloc() is also used by the kernel after boot has
finished, and at that point it should use cpu_to_node(), not
early_cpu_to_node().
> I think step 3 can be simplified with a generic function that sets
> per_cpu(numa_node) using early_cpu_to_node(). It can be called right after
> setup_per_cpu_areas().
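The generic helper suggested above could look roughly like the following userspace C model. The helper name `early_numa_node_init()`, the firmware table `fw_cpu_node[]` and the simplified `setup_per_cpu_areas()` are hypothetical stand-ins, not kernel code; a kernel version would presumably iterate with for_each_possible_cpu() and call set_cpu_numa_node(). What it shows is that once the per-CPU node ids are filled in right after the per-CPU areas exist, a caller such as __ring_buffer_alloc() can keep calling cpu_to_node() unchanged, both during and after boot.

```c
#include <assert.h>

#define NR_CPUS 4

/* Hypothetical firmware CPU-to-node table (ACPI SRAT / DT in reality). */
static const int fw_cpu_node[NR_CPUS] = { 0, 0, 1, 1 };

static int percpu_areas_ready;
static int cpu_numa_node[NR_CPUS];	/* stand-in for per_cpu(numa_node, cpu) */

static int early_cpu_to_node(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return fw_cpu_node[cpu];
}

/* Simplified: the per-CPU storage only becomes usable here. */
static void setup_per_cpu_areas(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		cpu_numa_node[cpu] = 0;
	percpu_areas_ready = 1;
}

static void set_cpu_numa_node(int cpu, int node)
{
	assert(percpu_areas_ready);	/* must run after setup_per_cpu_areas() */
	assert(cpu >= 0 && cpu < NR_CPUS);
	cpu_numa_node[cpu] = node;
}

static int cpu_to_node(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return cpu_numa_node[cpu];
}

/* Hypothetical generic helper: seed the per-CPU node ids from firmware. */
static void early_numa_node_init(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		set_cpu_numa_node(cpu, early_cpu_to_node(cpu));
}

/* Models a caller like __ring_buffer_alloc(): no early/late special case. */
static int ring_buffer_node_for(int cpu)
{
	return cpu_to_node(cpu);
}

/* Simulated ordering of the relevant start_kernel() steps. */
static void boot(void)
{
	setup_per_cpu_areas();
	early_numa_node_init();
	/* early_trace_init(), sched_init(), ... would run after this point. */
}
```

Because the fix lives in the data that cpu_to_node() reads rather than in its callers, the concern about __ring_buffer_alloc() being used both during and after boot does not arise in this model.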
I think this method may be better.
I will try this too.
Thanks
Huang Shijie