Message-ID: <87tux4kefm.fsf@nanos.tec.linutronix.de>
Date:   Sat, 15 Aug 2020 01:14:53 +0200
From:   Thomas Gleixner <tglx@...utronix.de>
To:     paulmck@...nel.org, Michal Hocko <mhocko@...e.com>
Cc:     Uladzislau Rezki <urezki@...il.com>,
        LKML <linux-kernel@...r.kernel.org>, RCU <rcu@...r.kernel.org>,
        linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        Matthew Wilcox <willy@...radead.org>,
        "Theodore Y . Ts'o" <tytso@....edu>,
        Joel Fernandes <joel@...lfernandes.org>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Oleksiy Avramchenko <oleksiy.avramchenko@...ymobile.com>,
        Peter Zijlstra <peterz@...radead.org>
Subject: Re: [RFC-PATCH 1/2] mm: Add __GFP_NO_LOCKS flag

Paul,

On Fri, Aug 14 2020 at 11:01, Paul E. McKenney wrote:
> On Fri, Aug 14, 2020 at 04:06:04PM +0200, Michal Hocko wrote:
>> > > > Vlastimil raised same question earlier, i answered, but let me answer again:
>> > > > 
>> > > > It is hard to achieve because the logic does not stick to certain static test
>> > > > case, i.e. it depends on how heavily kfree_rcu(single/double) are used. Based
>> > > > on that, "how heavily" - number of pages are formed, until the drain/reclaimer
>> > > > thread frees them.
>> > > 
>> > > How many pages are we talking about - ball park? 100s, 1000s?
>> > 
>> > Under normal operation, a couple of pages per CPU, which would make
>> > preallocation entirely reasonable.  Except that if someone does something
>> > that floods RCU callbacks (close(open) in a tight userspace loop, for but
>> > one example), then 2000 per CPU might not be enough, which on a 64-CPU
>> > system comes to about 500MB.  This is beyond excessive for preallocation
>> > on the systems I am familiar with.
>> > 
>> > And the flooding case is where you most want the reclamation to be
>> > efficient, and thus where you want the pages.
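
For reference, the roughly 500MB figure quoted above can be checked with
a few lines of arithmetic. This is only a sketch: the helper name
prealloc_bytes() is made up for illustration, and the 4 KiB page size is
an assumption, since the thread does not state one.

```c
#include <stdint.h>

/*
 * Bytes pinned by preallocating pages_per_cpu pages on each of nr_cpus
 * CPUs. Hypothetical helper, not a kernel function.
 */
static uint64_t prealloc_bytes(uint64_t pages_per_cpu, uint64_t nr_cpus,
			       uint64_t page_size)
{
	return pages_per_cpu * nr_cpus * page_size;
}
```

With 2000 pages per CPU, 64 CPUs and an assumed 4096-byte page this
gives 2000 * 64 * 4096 = 524288000 bytes, i.e. 500 MiB.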

As we have now established that taking the zone lock is impossible,
independent of raw/non-raw ordering and independent of RT/PREEMPT
configs, can we just take a step back and look at the problem from
scratch again?

As a matter of fact I assume^Wdeclare that removing struct rcu_head which
provides a fallback is not an option at all. I know that you want to,
but it won't work ever. Dream on, but as we agreed on recently there is
this thing called reality which ruins everything.

For normal operations a couple of pages which can be preallocated are
enough. What you are concerned about is the case where you run out of
pointer storage space.

There are two reasons why that can happen:

      1) RCU call flooding
      2) RCU not being able to run and mop up the backlog

#1 is observable by looking at the remaining storage space and the RCU
   call frequency

#2 is uninteresting because it's caused by RCU being stalled / delayed
   e.g. by a runaway of some sort or a plain RCU usage bug.
   
   Allocating more memory in that case does not solve or improve anything.

So the interesting case is #1. Which means we need to look at the
potential sources of the flooding:

    1) User space via syscalls, e.g. open/close
    2) Kernel thread
    3) Softirq
    4) Device interrupt
    5) System interrupts, deep atomic context, NMI ...

#1 The trivial fix is to force a switch to a high-prio thread or a soft
   interrupt which does the allocation

#2 Similar to #1 unless that thread loops with interrupts, softirqs or
   preemption disabled. If that's the case then running out of RCU
   storage space is the least of your worries.

#3 Similar to #2. The obvious candidates (e.g. NET) for monopolizing a
   CPU have loop limits in place already. If there is a bug which fails
   to care about the limit, why would RCU care and allocate more memory?

#4 Similar to #3. If the interrupt handler loops forever or if the
   interrupt is a runaway which prevents task/softirq processing then
   RCU free performance is the least of your worries.

#5 Clearly a bug and making RCU accommodate that is beyond silly.

So if call_rcu() detects that the remaining storage space for pointers
goes below the critical point or if it observes high frequency calls
then it simply should force a soft interrupt which does the allocation.

Allocating from softirq context, obviously without holding the raw lock
which is used inside call_rcu(), is safe on all configurations.

If call_rcu() is forced to use the fallback for a few calls until this
happens then that's not the end of the world. It is not going to be a
problem ever for the most obvious issue #1, user space madness, because
that case cannot delay the softirq processing unless there is a kernel
bug which again makes RCU free performance irrelevant.
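
The scheme described above could be modeled in user space roughly as
follows. This is only a sketch under stated assumptions, not the kernel
implementation: every name in it (free_cache, cache_queue, cache_refill,
LOW_WATERMARK, ...) is hypothetical, a boolean flag stands in for
raising a soft interrupt, struct fallback_head stands in for struct
rcu_head, calloc() stands in for the page allocator, and no locking is
shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define SLOTS_PER_PAGE	4	/* tiny on purpose, illustrative only */
#define LOW_WATERMARK	2	/* the "critical point" for remaining slots */

struct ptr_page {		/* one page worth of pointer storage */
	struct ptr_page *next;
	size_t used;
	void *slot[SLOTS_PER_PAGE];
};

struct fallback_head {		/* stand-in for struct rcu_head */
	struct fallback_head *next;
};

struct free_cache {
	struct ptr_page *cur;	/* page currently being filled */
	struct ptr_page *spare;	/* preallocated, still empty pages */
	struct ptr_page *full;	/* pages waiting for the reclaimer */
	size_t nr_spare;
	struct fallback_head *fallback;
	size_t nr_fallback;
	bool refill_requested;	/* models "softirq raised" */
};

static size_t slots_left(const struct free_cache *c)
{
	size_t n = c->nr_spare * SLOTS_PER_PAGE;

	if (c->cur)
		n += SLOTS_PER_PAGE - c->cur->used;
	return n;
}

/* Models the call_rcu() side: would run under the raw lock, never allocates. */
static void cache_queue(struct free_cache *c, void *obj,
			struct fallback_head *head)
{
	assert(head != NULL);	/* the fallback must always be available */

	if (c->cur && c->cur->used == SLOTS_PER_PAGE) {
		/* Current page is full, hand it over to the "full" list. */
		c->cur->next = c->full;
		c->full = c->cur;
		c->cur = NULL;
	}
	if (!c->cur && c->spare) {
		/* Take the next preallocated page. */
		c->cur = c->spare;
		c->spare = c->cur->next;
		c->cur->next = NULL;
		c->nr_spare--;
	}
	if (c->cur) {
		c->cur->slot[c->cur->used++] = obj;
	} else {
		/* Out of pointer storage: use the embedded head instead. */
		head->next = c->fallback;
		c->fallback = head;
		c->nr_fallback++;
	}
	if (slots_left(c) < LOW_WATERMARK)
		c->refill_requested = true;
}

/*
 * Models the soft interrupt: a real one would allocate without holding
 * the raw lock and take it only to attach the new pages.
 */
static void cache_refill(struct free_cache *c, size_t nr_pages)
{
	while (nr_pages--) {
		struct ptr_page *p = calloc(1, sizeof(*p));

		if (!p)
			break;
		p->next = c->spare;
		c->spare = p;
		c->nr_spare++;
	}
	c->refill_requested = false;
}
```

In this model a call that arrives before any page exists lands on the
fallback list and sets refill_requested. Once cache_refill() has run,
later calls use pointer slots again until the remaining space drops
below LOW_WATERMARK.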

So this will cure the problem for the most interesting case #1 and
handle all sane variants of the other possible flooding sources.

The other insane reasons do not justify any attempt to increase RCU
performance at all.

Watching the remaining storage space is good enough IMO. It clearly
covers #1 and for all others the occasional fallback won't hurt. If it
really matters for any case > #1 then doing a frequency based prediction
is a straightforward optimization.
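
One way such a frequency based prediction might look is a simple
per-window call counter. This is a sketch only: struct call_rate,
rate_note_call(), WINDOW_TICKS and FLOOD_THRESHOLD are hypothetical
names and values chosen for illustration, and "now" is an abstract
monotonic tick count rather than any particular kernel clock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WINDOW_TICKS	10	/* length of one sampling window */
#define FLOOD_THRESHOLD	100	/* calls per window treated as a flood */

struct call_rate {
	uint64_t window_start;	/* tick at which the current window began */
	unsigned int calls;	/* calls seen in the current window */
};

/*
 * Record one call at time "now". Returns true when the call rate in the
 * current window exceeds the threshold, i.e. when the allocation should
 * be requested early instead of waiting for the storage to run low.
 */
static bool rate_note_call(struct call_rate *r, uint64_t now)
{
	assert(now >= r->window_start);	/* time must not go backwards */

	if (now - r->window_start >= WINDOW_TICKS) {
		/* A new window begins: restart the count. */
		r->window_start = now;
		r->calls = 0;
	}
	return ++r->calls > FLOOD_THRESHOLD;
}
```

The result would simply be OR-ed with the remaining-space check, so the
space check stays the primary trigger and the rate check only makes the
request happen sooner.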

As usual I might be missing something, but as this is the second day
with reasonable temperatures here that would be caused by my ignorance
and not be excusable by brain usage outside of specified temperature
range.

Thanks,

        tglx