Message-ID: <ZBng0qnm/ADtSTBQ@bombadil.infradead.org>
Date:   Tue, 21 Mar 2023 09:52:34 -0700
From:   Luis Chamberlain <mcgrof@...nel.org>
To:     David Hildenbrand <david@...hat.com>
Cc:     Adam Manzanares <a.manzanares@...sung.com>,
        linux-modules@...r.kernel.org, linux-kernel@...r.kernel.org,
        pmladek@...e.com, petr.pavlu@...e.com, prarit@...hat.com,
        christophe.leroy@...roup.eu, song@...nel.org,
        torvalds@...ux-foundation.org
Subject: Re: [RFC 00/12] module: avoid userspace pressure on unwanted
 allocations

On Tue, Mar 21, 2023 at 04:11:27PM +0100, David Hildenbrand wrote:
> On 20.03.23 22:23, Luis Chamberlain wrote:
> > On Mon, Mar 20, 2023 at 10:15:23PM +0100, David Hildenbrand wrote:
> > > On 20.03.23 22:09, Luis Chamberlain wrote:
> > > > On Mon, Mar 20, 2023 at 08:40:07PM +0100, David Hildenbrand wrote:
> > > > > On 20.03.23 10:38, David Hildenbrand wrote:
> > > > > > On 18.03.23 01:11, Luis Chamberlain wrote:
> > > > > > > On Thu, Mar 16, 2023 at 04:56:56PM -0700, Luis Chamberlain wrote:
> > > > > > > > On Thu, Mar 16, 2023 at 04:55:31PM -0700, Luis Chamberlain wrote:
> > > > > > > > > On Wed, Mar 15, 2023 at 05:41:53PM +0100, David Hildenbrand wrote:
> > > > > > > > > > I expect to have a machine (with a crazy number of CPUs/devices) available
> > > > > > > > > > in a couple of days (1-2), so no need to rush.
> > > > > > > > > > 
> > > > > > > > > > The original machine I was able to reproduce with is blocked for a little
> > > > > > > > > > bit longer; so I hope the alternative I looked up will similarly trigger the
> > > > > > > > > > issue easily.
> > > > > > > > > 
> > > > > > > > > OK give this a spin:
> > > > > > > > > 
> > > > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=20230316-module-alloc-opts
> > > > > > > 
> > > > > > > Today I am up to here:
> > > > > > > 
> > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=20230317-module-alloc-opts
> > > > > > > 
> > > > > > > The last patch really would have no justification yet at all unless it
> > > > > > > does help your case.
> > > > > > 
> > > > > > Still waiting on the system (the replacement system I was able to grab
> > > > > > broke ...).
> > > > > > 
> > > > > > I'll let you know once I succeeded in reproducing + testing your fixes.
> > > > > 
> > > > > Okay, I have a system where I can reproduce.
> > > > > 
> > > > > Should I give
> > > > > 
> > > > > https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=20230319-module-alloc-opts
> > > > > 
> > > > > from yesterday a churn?
> > > > 
> > > > Yes please give that a run.
> > > 
> > > Reproduced with v6.3.0-rc1 (on 1st try)
> > 
> > By reproduced, you mean it fails to boot?
> 
> It boots but we get vmap allocation warnings, because the ~440 CPUs manage
> to completely exhaust the module vmap area due to KASAN.

Thanks, can you post a trace?

> > > Not able to reproduce with 20230319-module-alloc-opts so far (2 tries).
> > 
> > Oh wow, so to clarify, it boots OK?
> 
> It boots and I don't get the vmap allocation warnings.

Wonderful!

Now to zero in on whether all commits are required, and to measure the
value-add of each of them. That is the hard part, and thanks for any
time you can dedicate towards that.

There are really only 3 functional changes we need to measure.

In the 20230319-module-alloc-opts-adjust branch they sit at the end of
the history, listed here with the most recent commit on top:

600f1f769d06 module: add a sanity check prior to allowing kernel module auto-loading
680f2c1fff2d module: use list_add_tail_rcu() when adding module
6023a6c7d98c module: avoid allocation if module is already present and ready

My guess is the majority of the fix comes from the last commit listed
above, that is commit 6023a6c7d98c ("module: avoid allocation if module
is already present and ready").
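
To illustrate the idea named in that commit title, here is a rough
userspace sketch, not the kernel code. It assumes the pressure comes
from repeated load requests for a module that is already loaded, and it
shows the lookup happening before any memory is allocated for the
image. All names here (struct mod, find_mod(), load_mod(), alloc_count)
are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a loaded module; NOT the kernel's struct module. */
enum mod_state { MOD_COMING, MOD_LIVE };

struct mod {
	char name[32];
	enum mod_state state;
	void *image;		/* stands in for the module's vmap allocation */
	struct mod *next;
};

static struct mod *mod_list;	/* simple singly linked list of modules */
static size_t alloc_count;	/* number of image allocations performed */

/* Look a module up by name; returns NULL if it is not in the list. */
static struct mod *find_mod(const char *name)
{
	assert(name != NULL);
	for (struct mod *m = mod_list; m; m = m->next)
		if (strcmp(m->name, name) == 0)
			return m;
	return NULL;
}

/*
 * Load a module. The duplicate check runs BEFORE the image allocation,
 * so a request for an already live module costs no memory at all.
 * A real loader would also have to handle a module that is still
 * coming up; this sketch never leaves one in that state.
 */
static int load_mod(const char *name, size_t image_size)
{
	struct mod *m = find_mod(name);

	if (m && m->state == MOD_LIVE)
		return -EEXIST;	/* bail out early, nothing allocated */

	void *image = malloc(image_size);
	if (!image)
		return -ENOMEM;
	alloc_count++;

	m = calloc(1, sizeof(*m));
	if (!m) {
		free(image);
		return -ENOMEM;
	}
	strncpy(m->name, name, sizeof(m->name) - 1);	/* calloc keeps it NUL-terminated */
	m->state = MOD_LIVE;
	m->image = image;
	m->next = mod_list;
	mod_list = m;
	return 0;
}
```

Calling load_mod() twice with the same name allocates only once: the
second call returns -EEXIST and alloc_count stays unchanged. Whether
this is where most of the improvement comes from is exactly what still
needs to be measured.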

The rcu one probably doesn't help much but I think it makes sense while
we're at it.

The top one, commit 600f1f769d06, is the one I'm least convinced makes
sense. But *if* it does help, it means such finit_module() situations /
stresses are much larger than we ever anticipated.

> > > > Please collect systemd-analyze given lack of any other tool to evaluate
> > > > any deltas. Can't think of anything else to gather other than seeing if
> > > > it booted.
> > > 
> > > Issue is that some services (kdump, tuned) seem to take sometimes ages on
> > > that system to start for some reason,
> > 
> > How about disabling that?
> 
> It seems to be random services. On my debug kernel with KASAN everything is
> just super slow. I'll try to measure on a !debug kernel.

Got it, thanks!

  Luis
