Message-Id: <20250222184630.1f25865325eced9b0f37eb85@linux-foundation.org>
Date: Sat, 22 Feb 2025 18:46:30 -0800
From: Andrew Morton <akpm@...ux-foundation.org>
To: Thomas Prescher <thomas.prescher@...erus-technology.de>
Cc: "willy@...radead.org" <willy@...radead.org>, "linux-mm@...ck.org"
<linux-mm@...ck.org>, "corbet@....net" <corbet@....net>,
"muchun.song@...ux.dev" <muchun.song@...ux.dev>,
"linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/2] mm: hugetlb: add hugetlb_alloc_threads cmdline option

On Fri, 21 Feb 2025 14:16:31 +0000 Thomas Prescher <thomas.prescher@...erus-technology.de> wrote:
> On Fri, 2025-02-21 at 13:52 +0000, Matthew Wilcox wrote:
> > I don't think we should add a command line option (ie blame the
> > sysadmin
> > for getting it wrong). Instead, we should figure out the right
> > number.
> > Is it half the number of threads per socket? A quarter? 90%? It's
> > bootup, the threads aren't really doing anything else. But we
> > should figure it out, not the sysadmin.
>
> I don't think we will find a number that delivers the best performance
> on every system out there. With the two systems we tested, we already
> see some differences.
>
> The Skylake servers have 36 threads per socket and deliver the best
> performance when we use 8 threads which is 22%. Using more threads
> decreases the performance.
>
> On Cascade Lake with 48 threads per socket, we see the best performance
> when using 32 threads which is 66%. Using more threads also decreases
> the performance here (not included in the table above). The performance
> benefits of using more than 8 threads are very marginal though.
>
> I'm completely open to changing the default to something that makes more
> sense. From the experiments we did so far, 25% of the threads per node
> deliver a reasonably good performance. We could still keep the
> parameter for sysadmins that want to micro-optimize the bootup time
> though.

I'm all for auto-tuning but yeah, for a boot-time thing like this we
require a boot-time knob.

As is often (always) the case, the sad thing is that about five people
in the world know that this exists. How can we tell our users that
this new thing is available and possibly useful to them? We have no
channel.

Perhaps in your [2/2] we could be noisier?

	HugeTLB: allocation took 4242ms with hugepage_alloc_threads=42

and with a facility level higher than KERN_DEBUG (can/should we use
pr_foo() here, btw?). That should get people curious and poking around
in the documentation and experimenting.
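
For illustration only, here is a userspace sketch of the two ideas discussed above: a default of 25% of the threads per node, and a report line in the format suggested. The helper names default_alloc_threads() and format_alloc_report() are hypothetical and not taken from the patch. In the kernel the message would presumably go through something like pr_info() rather than snprintf().

```c
#include <stdio.h>

/*
 * Hypothetical helper (not from the patch): pick 25% of the threads
 * available on a node as the default, but never fewer than one.
 */
unsigned int default_alloc_threads(unsigned int threads_per_node)
{
	unsigned int n = threads_per_node / 4;

	return n ? n : 1;
}

/*
 * Hypothetical helper: build the report line in the format proposed
 * in this mail. Returns what snprintf() returns.
 */
int format_alloc_report(char *buf, size_t len, unsigned long ms,
			unsigned int threads)
{
	return snprintf(buf, len,
			"HugeTLB: allocation took %lums with hugepage_alloc_threads=%u",
			ms, threads);
}
```

With the thread counts quoted above, this default gives 9 threads on the 36-thread Skylake sockets and 12 on the 48-thread Cascade Lake sockets, both close to the 8 threads beyond which the reported gains were marginal or negative.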