linux-kernel - Re: [PATCH 0/4] kvfree_rcu() and _LOCK_NESTING/_PREEMPT

Message-ID: <20200930155229.GA1474760@google.com>
Date:   Wed, 30 Sep 2020 11:52:29 -0400
From:   Joel Fernandes <joel@...lfernandes.org>
To:     "Uladzislau Rezki (Sony)" <urezki@...il.com>
Cc:     LKML <linux-kernel@...r.kernel.org>, RCU <rcu@...r.kernel.org>,
        linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
        "Paul E . McKenney" <paulmck@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Michal Hocko <mhocko@...e.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Thomas Gleixner <tglx@...utronix.de>,
        "Theodore Y . Ts'o" <tytso@....edu>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Oleksiy Avramchenko <oleksiy.avramchenko@...ymobile.com>
Subject: Re: [PATCH 0/4] kvfree_rcu() and _LOCK_NESTING/_PREEMPT_RT

On Fri, Sep 18, 2020 at 09:48:13PM +0200, Uladzislau Rezki (Sony) wrote:
> Hello, folk!
> 
> This is another iteration of fixing kvfree_rcu() issues related
> to CONFIG_PROVE_RAW_LOCK_NESTING and CONFIG_PREEMPT_RT configs.
> 
> The first discussion is here https://lkml.org/lkml/2020/8/9/195.
> 
> - As an outcome of it, there was a proposal from Peter: instead of
> using a special "lock-less" flag, it is better to move lock-less
> access to the pcplist into a separate function.
> 
> - Add a special worker thread that prefetches pages
> if a per-cpu page cache is depleted (which is absolutely normal).
> 
> As usual, thank you for paying attention to it and your help!

Doesn't making it a lower priority WQ exacerbate the problem Mel described?

So, something like:
1. The pcp cache is depleted by kvfree_rcu() without a refill or other
   measures to relieve memory.
2. Other GFP_ATOMIC users are now likely to hit the emergency reserves in
   the buddy allocator as the watermarks are crossed.
3. kvfree_rcu() notices the failure and queues work on a workqueue to do
   non-preemptible buddy allocations, which will refill the pcp cache in
   the process.
4. But that happens much later, because this patch (4/4) deprioritized the
   work that does the refill.
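
The sequence above can be sketched as a toy userspace model. This is not
kernel code: the struct, the function names, and all the numbers (64 reserve
pages, a 32-page refill, one atomic allocation per tick) are hypothetical and
chosen only to show how a later refill translates into more allocations
served from the reserves.

```c
#include <assert.h>

/* Toy model only; every name and constant here is an assumption. */
struct toy_zone {
	int pcp_pages;     /* pages currently in the per-cpu cache */
	int reserve_pages; /* stand-in for the emergency reserves */
};

/* An atomic allocation takes from the pcp cache first, then the reserves. */
static int toy_alloc_atomic(struct toy_zone *z)
{
	if (z->pcp_pages > 0) {
		z->pcp_pages--;
		return 1;
	}
	if (z->reserve_pages > 0) {
		z->reserve_pages--;
		return 1;
	}
	return 0;
}

/*
 * Start with a depleted pcp cache and issue one atomic allocation per tick.
 * The refill work runs at tick 'refill_delay'. Returns how many pages were
 * taken from the reserves over 'ticks' ticks.
 */
static int toy_reserve_hits(int refill_delay, int ticks)
{
	struct toy_zone z = { .pcp_pages = 0, .reserve_pages = 64 };
	int start = z.reserve_pages;

	assert(refill_delay >= 0 && ticks >= 0);

	for (int t = 0; t < ticks; t++) {
		if (t == refill_delay)
			z.pcp_pages += 32; /* the worker refills the pcp cache */
		toy_alloc_atomic(&z);
	}
	return start - z.reserve_pages;
}
```

In this model, toy_reserve_hits(0, 16) takes nothing from the reserves, while
toy_reserve_hits(8, 16) takes 8 pages from them: every tick of refill delay is
one more allocation served from the reserves.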

I'd suggest keeping it high priority, since I don't see how lowering the
priority can make things better.

Or another option is:
Why not just hit the fallback path in the caller on the first attempt, and
trigger the WQ to do the allocation? If the pool grows too big, we can have
shrinkers that free the excess memory, which will help the phone use cases.
That way no changes to the low-level allocator are needed.
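
A minimal userspace sketch of that alternative, assuming a small fixed-size
page pool. The names (toy_pool, toy_pool_get, toy_pool_refill,
toy_pool_shrink), the cap of 8 pages, and the use of malloc() as a stand-in
for a page allocation are all hypothetical; none of this is the kvfree_rcu()
implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_POOL_MAX 8 /* hypothetical cap on cached pages */

struct toy_pool {
	void *pages[TOY_POOL_MAX];
	size_t nr;          /* number of cached pages */
	bool refill_queued; /* stand-in for "work has been queued" */
};

/*
 * Caller side: never calls the allocator. On a miss it requests a refill
 * and returns NULL, so the caller takes its fallback path right away.
 */
static void *toy_pool_get(struct toy_pool *p)
{
	if (p->nr > 0)
		return p->pages[--p->nr];
	p->refill_queued = true;
	return NULL;
}

/* Worker side: runs where an ordinary allocation is allowed. */
static void toy_pool_refill(struct toy_pool *p, size_t want)
{
	assert(want <= TOY_POOL_MAX);
	while (p->nr < want) {
		void *page = malloc(4096);

		if (!page)
			break;
		p->pages[p->nr++] = page;
	}
	p->refill_queued = false;
}

/* Shrinker side: free cached pages above 'keep'; returns how many. */
static size_t toy_pool_shrink(struct toy_pool *p, size_t keep)
{
	size_t freed = 0;

	while (p->nr > keep) {
		free(p->pages[--p->nr]);
		freed++;
	}
	return freed;
}
```

The point of the split is that only the worker and the shrinker ever allocate
or free, so the caller's path needs nothing special from the low-level
allocator.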

Or did I miss something?

thanks,

 - Joel

> 
> Uladzislau Rezki (Sony) (4):
>   rcu/tree: Add a work to allocate pages from regular context
>   mm: Add __rcu_alloc_page_lockless() func.
>   rcu/tree: use __rcu_alloc_page_lockless() func.
>   rcu/tree: Use schedule_delayed_work() instead of WQ_HIGHPRI queue
> 
>  include/linux/gfp.h |  1 +
>  kernel/rcu/tree.c   | 90 ++++++++++++++++++++++++---------------------
>  mm/page_alloc.c     | 82 +++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 132 insertions(+), 41 deletions(-)
> 
> -- 
> 2.20.1
>