Message-ID: <20200818150232.GQ28270@dhcp22.suse.cz>
Date:   Tue, 18 Aug 2020 17:02:32 +0200
From:   Michal Hocko <mhocko@...e.com>
To:     "Paul E. McKenney" <paulmck@...nel.org>
Cc:     Uladzislau Rezki <urezki@...il.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        LKML <linux-kernel@...r.kernel.org>, RCU <rcu@...r.kernel.org>,
        linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        Matthew Wilcox <willy@...radead.org>,
        "Theodore Y . Ts'o" <tytso@....edu>,
        Joel Fernandes <joel@...lfernandes.org>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Oleksiy Avramchenko <oleksiy.avramchenko@...ymobile.com>
Subject: Re: [RFC-PATCH 1/2] mm: Add __GFP_NO_LOCKS flag

On Tue 18-08-20 06:53:27, Paul E. McKenney wrote:
> On Tue, Aug 18, 2020 at 09:43:44AM +0200, Michal Hocko wrote:
> > On Mon 17-08-20 15:28:03, Paul E. McKenney wrote:
> > > On Mon, Aug 17, 2020 at 10:28:49AM +0200, Michal Hocko wrote:
> > > > On Mon 17-08-20 00:56:55, Uladzislau Rezki wrote:
> > > 
> > > [ . . . ]
> > > 
> > > > > wget ftp://vps418301.ovh.net/incoming/1000000_kmalloc_kfree_rcu_proc_percpu_pagelist_fractio_is_8.png
> > > > 
> > > > 1/8 of the memory in pcp lists is quite large and likely not something
> > > > used very often.
> > > > 
> > > > Both these numbers just make me think that a dedicated pool of pages
> > > > pre-allocated for RCU specifically might be a better solution. I still
> > > > haven't read through that branch of the email thread though so there
> > > > might be some pretty convincing arguments to not do that.
> > > 
> > > To avoid the problematic corner cases, we would need way more dedicated
> > > memory than is reasonable, as in well over one hundred pages per CPU.
> > > Sure, we could choose a smaller number, but then we are failing to defend
> > > against flooding, even on systems that have more than enough free memory
> > > to be able to do so.  It would be better to live within what is available,
> > > taking the performance/robustness hit only if there isn't enough.
> > 
> > Thomas had a good point that it doesn't really make much sense to
> > optimize for flooders because that just makes them more effective.
> 
> The point is not to make the flooders go faster, but rather for the
> system to be robust in the face of flooders.  Robust as in harder for
> a flooder to OOM the system.

Do we see this as a practical problem? I am really confused because
the initial argument revolved around an optimization and now you are
suggesting that this is actually a system stability measure. And I fail
to see how allowing an easy way to completely deplete pcp caches solves
any of that. Please do realize that if we allow that then every user who
relies on pcp caches will have to take a slow(er) path and that will
have performance consequences. The pool is a global and scarce
resource. That's why I've suggested using a dedicated preallocated pool
instead of draining the global pcp caches.
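
To make the shape of that suggestion concrete, here is a rough userspace
sketch of a dedicated preallocated pool. It is only an illustration and
not code from this thread or from the kernel: struct page_pool,
pool_init(), pool_get(), pool_put(), POOL_PAGES and PAGE_BYTES are all
hypothetical names, malloc() stands in for the page allocator, and
locking/per-CPU handling is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_PAGES 16   /* hypothetical pool size, not a tuned value */
#define PAGE_BYTES 4096 /* assumed page size */

/* Hypothetical private pool, separate from any shared allocator cache. */
struct page_pool {
	void *pages[POOL_PAGES];
	size_t nr_free;
};

/*
 * Fill the pool up front, from a context where allocation may block.
 * Returns 0 on success, -1 if the pool could only be partially filled.
 */
static int pool_init(struct page_pool *pool)
{
	pool->nr_free = 0;
	while (pool->nr_free < POOL_PAGES) {
		void *page = malloc(PAGE_BYTES);

		if (!page)
			return -1;
		pool->pages[pool->nr_free++] = page;
	}
	return 0;
}

/*
 * Take a page without calling the allocator at all.
 * NULL means the pool is empty and the caller must use its slow path.
 */
static void *pool_get(struct page_pool *pool)
{
	if (pool->nr_free == 0)
		return NULL;
	return pool->pages[--pool->nr_free];
}

/* Give a page back; if the pool is already full, release it instead. */
static void pool_put(struct page_pool *pool, void *page)
{
	assert(page != NULL);
	if (pool->nr_free == POOL_PAGES) {
		free(page);
		return;
	}
	pool->pages[pool->nr_free++] = page;
}
```

The point of the sketch is that an empty pool only affects its own user
(pool_get() returns NULL and that caller falls back), while the shared
caches stay untouched for everybody else.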

> And reducing the number of post-grace-period cache misses makes it
> easier for the callback-invocation-time memory freeing to keep up with
> the flooder, thus avoiding (or at least delaying) the OOM.
> 
> > > My current belief is that we need a combination of (1) either the
> > > GFP_NOLOCK flag or Peter Zijlstra's patch and
> > 
> > I must have missed the patch?
> 
> If I am keeping track, this one:
> 
> https://lore.kernel.org/lkml/20200814215206.GL3982@worktop.programming.kicks-ass.net/

OK, I have certainly noticed that one but didn't react. My response
would be similar to the one for the dedicated gfp flag. This is less of
a hack than __GFP_NOLOCK but it still exposes very internal parts of the
allocator and I find that quite problematic for the future maintenance
of the allocator. The risk of an easy depletion of the pcp pool is also
there of course.
-- 
Michal Hocko
SUSE Labs