lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Fri, 19 Jan 2024 10:02:13 -0800
From: Yury Norov <yury.norov@...il.com>
To: Shijie Huang <shijie@...eremail.onmicrosoft.com>
Cc: Mike Rapoport <rppt@...nel.org>,
	Huang Shijie <shijie@...amperecomputing.com>,
	gregkh@...uxfoundation.org, patches@...erecomputing.com,
	rafael@...nel.org, paul.walmsley@...ive.com, palmer@...belt.com,
	aou@...s.berkeley.edu, kuba@...nel.org, vschneid@...hat.com,
	mingo@...nel.org, akpm@...ux-foundation.org, vbabka@...e.cz,
	tglx@...utronix.de, jpoimboe@...nel.org, ndesaulniers@...gle.com,
	mikelley@...rosoft.com, mhiramat@...nel.org, arnd@...db.de,
	linux-kernel@...r.kernel.org, linux-riscv@...ts.infradead.org,
	linux-arm-kernel@...ts.infradead.org, catalin.marinas@....com,
	will@...nel.org, mark.rutland@....com, mpe@...erman.id.au,
	linuxppc-dev@...ts.ozlabs.org, chenhuacai@...nel.org,
	jiaxun.yang@...goat.com, linux-mips@...r.kernel.org,
	cl@...amperecomputing.com
Subject: Re: [PATCH] NUMA: Early use of cpu_to_node() returns 0 instead of
 the correct node id

On Fri, Jan 19, 2024 at 04:50:53PM +0800, Shijie Huang wrote:
> 
> 在 2024/1/19 16:42, Mike Rapoport 写道:
> > On Fri, Jan 19, 2024 at 02:46:16PM +0800, Shijie Huang wrote:
> > > 在 2024/1/19 12:42, Yury Norov 写道:
> > > > This adds another level of indirection, I think. Currently cpu_to_node
> > > > is a simple inliner. After the patch it would be a real function with
> > > > all the associate overhead. Can you share a bloat-o-meter output here?
> > > #./scripts/bloat-o-meter vmlinux vmlinux.new
> > > add/remove: 6/1 grow/shrink: 61/51 up/down: 1168/-588 (580)
> > > Function                                     old     new   delta
> > > numa_update_cpu                              148     244     +96
> > > 
> > >   ...................................................................................................................................(to many to skip)
> > > 
> > > Total: Before=32990130, After=32990710, chg +0.00%
> > It's not only about text size, the indirect call also hurts performance
> 
> The cpu_to_node() is called at very low frequency, most of the times is in
> the kernel booting time.
 
That doesn't matter. This function is a simple inliner that dereferences
a pointer, and I believe all of us want to keep it simple. 
 
> > > > Regardless, I don't think that the approach is correct. As per your
> > > > description, some initialization functions erroneously call
> > > > cpu_to_node() instead of early_cpu_to_node() which exists specifically
> > > > for that case.
> > > > 
> > > > If the above correct, it's clearly a caller problem, and the fix is to
> > > > simply switch all those callers to use early version.
> > > It is easy to change to early_cpu_to_node() for sched_init(),
> > > init_sched_fair_class()
> > > 
> > > and workqueue_init_early(). These three places call the cpu_to_node() in the
> > > __init function.
> > > 
> > > 
> > > But it is a little hard to change the early_trace_init(), since it calls
> > > cpu_to_node in the deep
> > > 
> > > function stack:
> > > 
> > >    early_trace_init() --> ring_buffer_alloc() -->rb_allocate_cpu_buffer()
> > > 
> > > 
> > > For early_trace_init(), we need to change more code.
> > > 
> > > 
> > > Anyway, If we think it is not a good idea to change the common code, I am
> > > oaky too.
> > Is there a fundamental reason to have early_cpu_to_node() at all?
> 
> The early_cpu_to_node does not work on some ARCHs (which support the NUMA),
> such
> 
> as  SPARC, MIPS and S390.

So, your approach wouldn't work either, right? I think you've got a
testing bot report on it already...

You can make it like this:

  #ifdef CONFIG_ARCH_NO_EARLY_CPU_TO_NODE
  #define early_cpu_to_node cpu_to_node
  #endif
 
> > It seems that all the mappings are known by the end of setup_arch() and the
> > initialization of numa_node can be moved earlier.
> > > > I would also initialize the numa_node with NUMA_NO_NODE at declaration,
> > > > so that if someone calls cpu_to_node() before the variable is properly
> > > > initialized at runtime, he'll get NO_NODE, which is obviously an error.
> > > Even we set the numa_node with NUMA_NO_NODE, it does not always produce
> > > error.

You can print this error yourself:

  #ifndef cpu_to_node
  static inline int cpu_to_node(int cpu)
  {
        int node = per_cpu(numa_node, cpu);

  #ifdef CONFIG_DEBUG_PER_CPU_MAPS
        if (node == NUMA_NO_NODE)
                pr_err(...);
  #endif

          return node;
  }
  #endif


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ