[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEXW_YQQii5061aNCvdCtvoBZra+5PxFWJcDCO0JWjSM3PURTQ@mail.gmail.com>
Date: Wed, 22 Jan 2025 10:04:08 -0500
From: Joel Fernandes <joel@...lfernandes.org>
To: paulmck@...nel.org
Cc: Uladzislau Rezki <urezki@...il.com>, Vlastimil Babka <vbabka@...e.cz>, linux-mm@...ck.org,
Andrew Morton <akpm@...ux-foundation.org>, RCU <rcu@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>, Christoph Lameter <cl@...ux.com>,
Pekka Enberg <penberg@...nel.org>, David Rientjes <rientjes@...gle.com>,
Joonsoo Kim <iamjoonsoo.kim@....com>, Roman Gushchin <roman.gushchin@...ux.dev>,
Hyeonggon Yoo <42.hyeyoo@...il.com>, Oleksiy Avramchenko <oleksiy.avramchenko@...y.com>
Subject: Re: [PATCH v2 0/5] Move kvfree_rcu() into SLAB (v2)
On Tue, Jan 21, 2025 at 3:32 PM Paul E. McKenney <paulmck@...nel.org> wrote:
>
> On Tue, Jan 21, 2025 at 03:14:16PM +0100, Uladzislau Rezki wrote:
> > On Tue, Jan 21, 2025 at 02:49:13PM +0100, Vlastimil Babka wrote:
> > > On 1/21/25 2:33 PM, Uladzislau Rezki wrote:
> > > > On Mon, Jan 20, 2025 at 11:06:13PM +0100, Vlastimil Babka wrote:
> > > >> On 12/16/24 17:46, Paul E. McKenney wrote:
> > > >>> On Mon, Dec 16, 2024 at 04:55:06PM +0100, Uladzislau Rezki wrote:
> > > >>>> On Mon, Dec 16, 2024 at 04:44:41PM +0100, Vlastimil Babka wrote:
> > > >>>>> On 12/16/24 16:41, Uladzislau Rezki wrote:
> > > >>>>>> On Mon, Dec 16, 2024 at 03:20:44PM +0100, Vlastimil Babka wrote:
> > > >>>>>>> On 12/16/24 12:03, Uladzislau Rezki wrote:
> > > >>>>>>>> On Sun, Dec 15, 2024 at 06:30:02PM +0100, Vlastimil Babka wrote:
> > > >>>>>>>>
> > > >>>>>>>>> Also how about a followup patch moving the rcu-tiny implementation of
> > > >>>>>>>>> kvfree_call_rcu()?
> > > >>>>>>>>>
> > > >>>>>>>> As, Paul already noted, it would make sense. Or just remove a tiny
> > > >>>>>>>> implementation.
> > > >>>>>>>
> > > >>>>>>> AFAICS tiny rcu is for !SMP systems. Do they benefit from the "full"
> > > >>>>>>> implementation with all the batching etc or would that be unnecessary overhead?
> > > >>>>>>>
> > > >>>>>> Yes, it is for a really small systems with low amount of memory. I see
> > > >>>>>> only one overhead it is about driving objects in pages. For a small
> > > >>>>>> system it can be critical because we allocate.
> > > >>>>>>
> > > >>>>>> From the other hand, for a tiny variant we can modify the normal variant
> > > >>>>>> by bypassing batching logic, thus do not consume memory(for Tiny case)
> > > >>>>>> i.e. merge it to a normal kvfree_rcu() path.
> > > >>>>>
> > > >>>>> Maybe we could change it to use CONFIG_SLUB_TINY as that has similar use
> > > >>>>> case (less memory usage on low memory system, tradeoff for worse performance).
> > > >>>>>
> > > >>>> Yep, i also was thinking about that without saying it :)
> > > >>>
> > > >>> Works for me as well!
> > > >>
> > > >> Hi, so I tried looking at this. First I just moved the code to slab as seen
> > > >> in the top-most commit here [1]. Hope the non-inlined __kvfree_call_rcu() is
> > > >> not a show-stopper here.
> > > >>
> > > >> Then I wanted to switch the #ifdefs from CONFIG_TINY_RCU to CONFIG_SLUB_TINY
> > > >> to control whether we use the full blown batching implementation or the
> > > >> simple call_rcu() implmentation, and realized it's not straightforward and
> > > >> reveals there are still some subtle dependencies of kvfree_rcu() on RCU
> > > >> internals :)
> > > >>
> > > >> Problem 1: !CONFIG_SLUB_TINY with CONFIG_TINY_RCU
> > > >>
> > > >> AFAICS the batching implementation includes kfree_rcu_scheduler_running()
> > > >> which is called from rcu_set_runtime_mode() but only on TREE_RCU. Perhaps
> > > >> there are other facilities the batching implementation needs that only
> > > >> exists in the TREE_RCU implementation
> > > >>
> > > >> Possible solution: batching implementation depends on both !CONFIG_SLUB_TINY
> > > >> and !CONFIG_TINY_RCU. I think it makes sense as both !SMP systems and small
> > > >> memory systems are fine with the simple implementation.
> > > >>
> > > >> Problem 2: CONFIG_TREE_RCU with !CONFIG_SLUB_TINY
> > > >>
> > > >> AFAICS I can't just make the simple implementation do call_rcu() on
> > > >> CONFIG_TREE_RCU, because call_rcu() no longer knows how to handle the fake
> > > >> callback (__is_kvfree_rcu_offset()) - I see how rcu_reclaim_tiny() does that
> > > >> but no such equivalent exists in TREE_RCU. Am I right?
> > > >>
> > > >> Possible solution: teach TREE_RCU callback invocation to handle
> > > >> __is_kvfree_rcu_offset() again, perhaps hide that branch behind #ifndef
> > > >> CONFIG_SLUB_TINY to avoid overhead if the batching implementation is used.
> > > >> Downside: we visibly demonstrate how kvfree_rcu() is not purely a slab thing
> > > >> but RCU has to special case it still.
> > > >>
> > > >> Possible solution 2: instead of the special offset handling, SLUB provides a
> > > >> callback function, which will determine pointer to the object from the
> > > >> pointer to a middle of it without knowing the rcu_head offset.
> > > >> Downside: this will have some overhead, but SLUB_TINY is not meant to be
> > > >> performant anyway so we might not care.
> > > >> Upside: we can remove __is_kvfree_rcu_offset() from TINY_RCU as well
> > > >>
> > > >> Thoughts?
> > > >>
> > > > For the call_rcu() and to be able to reclaim over it we need to patch the
> > > > tree.c(please note TINY already works):
> > > >
> > > > <snip>
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index b1f883fcd918..ab24229dfa73 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -2559,13 +2559,19 @@ static void rcu_do_batch(struct rcu_data *rdp)
> > > > debug_rcu_head_unqueue(rhp);
> > > >
> > > > rcu_lock_acquire(&rcu_callback_map);
> > > > - trace_rcu_invoke_callback(rcu_state.name, rhp);
> > > >
> > > > f = rhp->func;
> > > > - debug_rcu_head_callback(rhp);
> > > > - WRITE_ONCE(rhp->func, (rcu_callback_t)0L);
> > > > - f(rhp);
> > > >
> > > > + if (__is_kvfree_rcu_offset((unsigned long) f)) {
> > > > + trace_rcu_invoke_kvfree_callback("", rhp, (unsigned long) f);
> > > > + kvfree((void *) rhp - (unsigned long) f);
> > > > + } else {
> > > > + trace_rcu_invoke_callback(rcu_state.name, rhp);
> > > > + debug_rcu_head_callback(rhp);
> > > > + WRITE_ONCE(rhp->func, (rcu_callback_t)0L);
> > > > + f(rhp);
> > > > + }
> > > > rcu_lock_release(&rcu_callback_map);
> > >
> > > Right so that's the first Possible solution, but without the #ifdef. So
> > > there's an overhead of checking __is_kvfree_rcu_offset() even if the
> > > batching is done in slab and this function is never called with an offset.
> > >
> > Or fulfilling a missing functionality? TREE is broken in that sense
> > whereas a TINY handles it without any issues.
> >
> > It can be called for SLUB_TINY option, just call_rcu() instead of
> > batching layer. And yes, kvfree_rcu_barrier() switches to rcu_barrier().
>
> Would this make sense?
>
> if (IS_ENABLED(CONFIG_TINY_RCU) && __is_kvfree_rcu_offset((unsigned long) f)) {
>
> Just to be repetitive, other alternatives include:
>
> 1. Take advantage of SLOB being no longer with us.
>
> 2. Get rid of Tiny RCU's special casing of kfree_rcu(), and then
> eliminate the above "if" statement in favor of its "else" clause.
>
> 3. Make Tiny RCU implement a trivial version of kfree_rcu() that
> passes a list through RCU.
>
> I don't have strong feelings, and am happy to defer to your guys'
> decision.
If I may chime in with an opinion, I think the cleanest approach would
be to not special-case the func pointer and instead provide a callback
from the SLAB layer which does the kfree. Then get rid of
__is_kvfree_rcu_offset() and its usage from Tiny. Granted, there is
the overhead of function calling, but I highly doubt that it is going
to be a bottleneck, considering that the __is_kvfree_rcu_offset() path
is a kfree slow-path. I feel in the long run, this will also be more
maintainable.
Or is there a reason other than the theoretical function call overhead
why this may not work?
thanks,
- Joel
Powered by blists - more mailing lists