linux-kernel - Re: [PATCH] kfree_rcu() should use the new kfree_bulk() interface for freeing rcu structures

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <20171220051918.GK7829@linux.vnet.ibm.com>
Date:   Tue, 19 Dec 2017 21:19:18 -0800
From:   "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:     Matthew Wilcox <willy@...radead.org>
Cc:     Jesper Dangaard Brouer <brouer@...hat.com>, rao.shoaib@...cle.com,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH] kfree_rcu() should use the new kfree_bulk() interface
 for freeing rcu structures

On Tue, Dec 19, 2017 at 05:53:36PM -0800, Matthew Wilcox wrote:
> On Tue, Dec 19, 2017 at 04:20:51PM -0800, Paul E. McKenney wrote:
> > If we are going to make this sort of change, we should do so in a way
> > that allows the slab code to actually do the optimizations that might
> > make this sort of thing worthwhile.  After all, if the main goal was small
> > code size, the best approach is to drop kfree_bulk() and get on with life
> > in the usual fashion.
> > 
> > I would prefer to believe that something like kfree_bulk() can help,
> > and if that is the case, we should give it a chance to do things like
> > group kfree_rcu() requests by destination slab and soforth, allowing
> > batching optimizations that might provide more significant increases
> > in performance.  Furthermore, having this in slab opens the door to
> > slab taking emergency action when memory is low.
> 
> kfree_bulk does sort by destination slab; look at build_detached_freelist.

Understood, but beside the point.  I suspect that giving it larger
scope makes it more efficient, similar to disk drives in the old days.
Grouping on the stack when processing RCU callbacks limits what can
reasonably be done.  Furthermore, using the vector approach going into the
grace period is much more cache-efficient than the linked-list approach,
given that the blocks have a reasonable chance of going cache-cold during
the grace period.

And the slab-related operations should really be in the slab code in any
case rather than within RCU.

							Thanx, Paul