linux-kernel - Re: [PATCH 2/2] rcu/kvfree: Introduce KFREE_DRAIN_JIFFIES

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20220607034710.GE1790663@paulmck-ThinkPad-P17-Gen-1>
Date:   Mon, 6 Jun 2022 20:47:10 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Uladzislau Rezki <urezki@...il.com>
Cc:     LKML <linux-kernel@...r.kernel.org>, RCU <rcu@...r.kernel.org>,
        Frederic Weisbecker <frederic@...nel.org>,
        Neeraj Upadhyay <neeraj.iitr10@...il.com>,
        Joel Fernandes <joel@...lfernandes.org>,
        Oleksiy Avramchenko <oleksiy.avramchenko@...y.com>
Subject: Re: [PATCH 2/2] rcu/kvfree: Introduce KFREE_DRAIN_JIFFIES_[MAX/MIN]
 interval

On Sun, Jun 05, 2022 at 11:10:31AM +0200, Uladzislau Rezki wrote:
> > On Thu, Jun 02, 2022 at 10:06:44AM +0200, Uladzislau Rezki (Sony) wrote:
> > > Currently the monitor work is scheduled with a fixed interval that
> > > is HZ/20 or each 50 milliseconds. The drawback of such approach is
> > > a low utilization of page slot in some scenarios. The page can store
> > > up to 512 records. For example on Android system it can look like:
> > > 
> > > <snip>
> > >   kworker/3:0-13872   [003] .... 11286.007048: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=1
> > >   kworker/3:0-13872   [003] .... 11286.015638: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=2
> > >   kworker/1:2-20434   [001] .... 11286.051230: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=1
> > >   kworker/1:2-20434   [001] .... 11286.059322: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=2
> > >   kworker/0:1-20052   [000] .... 11286.095295: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=2
> > >   kworker/0:1-20052   [000] .... 11286.103418: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=1
> > >   kworker/2:3-14372   [002] .... 11286.135155: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=2
> > >   kworker/2:3-14372   [002] .... 11286.135198: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=1
> > >   kworker/1:2-20434   [001] .... 11286.155377: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=5
> > >   kworker/2:3-14372   [002] .... 11286.167181: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=5
> > >   kworker/1:2-20434   [001] .... 11286.179202: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000008ef95e14 nr_records=1
> > >   kworker/2:3-14372   [002] .... 11286.187398: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000c597d297 nr_records=6
> > >   kworker/3:0-13872   [003] .... 11286.187445: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000050bf92e2 nr_records=3
> > >   kworker/1:2-20434   [001] .... 11286.198975: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=4
> > >   kworker/1:2-20434   [001] .... 11286.207203: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=4
> > > <snip>
> > > 
> > > where a page only carries few records to reclaim a memory. In order to
> > > improve batching and make utilization more efficient the patch introduces
> > > a drain interval that can be set either to KFREE_DRAIN_JIFFIES_MAX or
> > > KFREE_DRAIN_JIFFIES_MIN. It is adjusted if a flood is detected, in this
> > > case a memory reclaim occurs more often whereas in mostly idle cases the
> > > interval is set to its maximum timeout that improves the utilization of
> > > page slots.
> > > 
> > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@...il.com>
> > 
> > That does look like a problem well worth solving!
> >
> Agree, better ideas make better final solution :)
> 
> > 
> > But I am missing one thing. If we are having a callback flood, why do we
> > need a shorter timeout?
> >
> To offload faster, because otherwise we run into classical issue, it is a low
> memory condition state resulting in OOM.

But doesn't each callback queued during the flood give us an opportunity
to react to the flood?  That will be way more fine-grained than any
reasonable timer, right?  Or am I missing something?

I do agree that the action would often need to be indirect to avoid the
memory-allocation-state hassles, but we already can do that, either via
an extremely short-term hrtimer or something like irq-work.

> > Wouldn't a check on the number of blocks queued be simpler, more direct,
> > and provide faster response to the start of a callback flood?
> >
> I rely on krcp->count because not always we can store the pointer in the page
> slots. We can not allocate a page in the caller context thus we use page-cache
> worker that fills the cache in normal context. While it populates the cache, 
> pointers temporary are queued to the linked-list.
> 
> Any thoughts?

There are a great many ways to approach this.  One of them is to maintain
a per-CPU free-running counter of kvfree_rcu() calls, and to reset this
counter each jiffy.

Or am I missing a trick here?

							Thanx, Paul