Message-ID: <CA+KHdyXU1apQQbVVZ_zX7o86Qp-AQnif2u8aJFULqHNtfCGcSA@mail.gmail.com>
Date:   Thu, 9 Jun 2022 15:10:57 +0200
From:   Uladzislau Rezki <urezki@...il.com>
To:     "Paul E. McKenney" <paulmck@...nel.org>
Cc:     LKML <linux-kernel@...r.kernel.org>, RCU <rcu@...r.kernel.org>,
        Frederic Weisbecker <frederic@...nel.org>,
        Neeraj Upadhyay <neeraj.iitr10@...il.com>,
        Joel Fernandes <joel@...lfernandes.org>,
        Oleksiy Avramchenko <oleksiy.avramchenko@...y.com>
Subject: Re: [PATCH 2/2] rcu/kvfree: Introduce KFREE_DRAIN_JIFFIES_[MAX/MIN] interval

On Tue, Jun 7, 2022 at 5:47 AM Paul E. McKenney <paulmck@...nel.org> wrote:
>
> On Sun, Jun 05, 2022 at 11:10:31AM +0200, Uladzislau Rezki wrote:
> > > On Thu, Jun 02, 2022 at 10:06:44AM +0200, Uladzislau Rezki (Sony) wrote:
> > > > Currently the monitor work is scheduled at a fixed interval of
> > > > HZ/20, i.e. every 50 milliseconds. The drawback of this approach is
> > > > low utilization of the page slots in some scenarios. A page can store
> > > > up to 512 records. For example, on an Android system it can look like:
> > > >
> > > > <snip>
> > > >   kworker/3:0-13872   [003] .... 11286.007048: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=1
> > > >   kworker/3:0-13872   [003] .... 11286.015638: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=2
> > > >   kworker/1:2-20434   [001] .... 11286.051230: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=1
> > > >   kworker/1:2-20434   [001] .... 11286.059322: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=2
> > > >   kworker/0:1-20052   [000] .... 11286.095295: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=2
> > > >   kworker/0:1-20052   [000] .... 11286.103418: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=1
> > > >   kworker/2:3-14372   [002] .... 11286.135155: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=2
> > > >   kworker/2:3-14372   [002] .... 11286.135198: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=1
> > > >   kworker/1:2-20434   [001] .... 11286.155377: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=5
> > > >   kworker/2:3-14372   [002] .... 11286.167181: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=5
> > > >   kworker/1:2-20434   [001] .... 11286.179202: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000008ef95e14 nr_records=1
> > > >   kworker/2:3-14372   [002] .... 11286.187398: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000c597d297 nr_records=6
> > > >   kworker/3:0-13872   [003] .... 11286.187445: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000050bf92e2 nr_records=3
> > > >   kworker/1:2-20434   [001] .... 11286.198975: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=4
> > > >   kworker/1:2-20434   [001] .... 11286.207203: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=4
> > > > <snip>
> > > >
> > > > where a page carries only a few records when memory is reclaimed. In order
> > > > to improve batching and make utilization more efficient, this patch introduces
> > > > a drain interval that can be set either to KFREE_DRAIN_JIFFIES_MAX or
> > > > KFREE_DRAIN_JIFFIES_MIN. The interval is adjusted when a flood is detected:
> > > > in that case memory reclaim occurs more often, whereas in mostly idle cases
> > > > the interval is set to its maximum timeout, which improves the utilization
> > > > of page slots.
> > > >
> > > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@...il.com>
> > >
> > > That does look like a problem well worth solving!
> > >
> > Agreed, better ideas make for a better final solution :)
> >
> > >
> > > But I am missing one thing. If we are having a callback flood, why do we
> > > need a shorter timeout?
> > >
> > To offload faster, because otherwise we run into the classic issue: a
> > low-memory condition resulting in OOM.
>
> But doesn't each callback queued during the flood give us an opportunity
> to react to the flood?  That will be way more fine-grained than any
> reasonable timer, right?  Or am I missing something?
>
We can set the timer to zero or to the current "jiffies" to initiate the
offloading if the page is full. In that sense it probably makes sense to
propagate those two attributes to user space, so the user can configure the
min/max drain interval.

Or we can deal only with a fixed interval exposed via sysfs so the user can
control it. In that case we can get rid of the MIN one and just trigger the
timer when the page is full. I think this approach is better.
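
For the "trigger immediately when the page is full" part, this is roughly the
decision I have in mind, shown as a stand-alone toy model rather than actual
kernel code (DRAIN_INTERVAL_MAX, page_is_full() and the krc_model structure
are made-up names for illustration only):

<snip>
#include <stdbool.h>
#include <stdio.h>

#define KFREE_N_RECORDS     512          /* records one page can hold          */
#define DRAIN_INTERVAL_MAX  (5 * 20)     /* the single, sysfs-tunable interval */

/* Toy model of one per-CPU batching context. */
struct krc_model {
	int nr_records;                  /* records queued on the open page    */
};

static bool page_is_full(struct krc_model *krc)
{
	return krc->nr_records >= KFREE_N_RECORDS;
}

/*
 * Pick the delay for the monitor work: drain immediately once the page
 * is full, otherwise wait for the configured interval so the page has a
 * chance to fill up.
 */
static unsigned long drain_delay(struct krc_model *krc)
{
	return page_is_full(krc) ? 0 : DRAIN_INTERVAL_MAX;
}

int main(void)
{
	struct krc_model krc = { .nr_records = 3 };

	printf("delay with 3 records: %lu jiffies\n", drain_delay(&krc));
	krc.nr_records = KFREE_N_RECORDS;
	printf("delay with full page: %lu jiffies\n", drain_delay(&krc));
	return 0;
}
<snip>

The point is that only one tunable interval remains; the full-page case
degenerates to an immediate kick of the monitor work.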

>
> I do agree that the action would often need to be indirect to avoid the
> memory-allocation-state hassles, but we already can do that, either via
> an extremely short-term hrtimer or something like irq-work.
>
> > > Wouldn't a check on the number of blocks queued be simpler, more direct,
> > > and provide faster response to the start of a callback flood?
> > >
> > I rely on krcp->count because we cannot always store the pointer in the page
> > slots. We cannot allocate a page in the caller context, so we use the page-cache
> > worker, which fills the cache in normal context. While it populates the cache,
> > pointers are temporarily queued on the linked list.
> >
> > Any thoughts?
>
> There are a great many ways to approach this.  One of them is to maintain
> a per-CPU free-running counter of kvfree_rcu() calls, and to reset this
> counter each jiffy.
>
> Or am I missing a trick here?
>
Do you mean to have a per-CPU timer that checks the per-CPU freed counter
and schedules the work when it is needed? Or have I missed your point?
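
If so, a toy user-space model of what I picture is below (FLOOD_THRESHOLD,
the tick hook and all names are assumptions for illustration, not anything
from the tree):

<snip>
#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS_MODEL    4
#define FLOOD_THRESHOLD  1000   /* kvfree_rcu() calls per jiffy treated as a flood */

/* Per-CPU free-running counter of kvfree_rcu() calls, reset each jiffy. */
static unsigned long kvfree_calls[NR_CPUS_MODEL];

/* Called on every kvfree_rcu() invocation on this CPU; returns true
 * when the caller should kick the monitor work right away. */
static bool note_kvfree_call(int cpu)
{
	return ++kvfree_calls[cpu] >= FLOOD_THRESHOLD;
}

/* Called once per jiffy (per-CPU timer or the tick) to reset the counter. */
static void jiffy_tick(int cpu)
{
	kvfree_calls[cpu] = 0;
}

int main(void)
{
	int cpu = 0;

	for (unsigned long i = 0; i < 1500; i++) {
		if (note_kvfree_call(cpu)) {
			printf("flood detected after %lu calls, scheduling drain\n",
			       kvfree_calls[cpu]);
			break;
		}
	}
	jiffy_tick(cpu);    /* counter is reset at the next jiffy */
	return 0;
}
<snip>

i.e. the counter itself detects the flood at callback-queuing time, and the
per-jiffy reset keeps it free-running.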

-- 
Uladzislau Rezki
