Message-ID: <20190918095811.GA25821@pc636>
Date: Wed, 18 Sep 2019 11:58:11 +0200
From: Uladzislau Rezki <urezki@...il.com>
To: "Joel Fernandes (Google)" <joel@...lfernandes.org>
Cc: linux-kernel@...r.kernel.org, kernel-team@...roid.com,
kernel-team@....com, Byungchul Park <byungchul.park@....com>,
Davidlohr Bueso <dave@...olabs.net>,
Josh Triplett <josh@...htriplett.org>,
Lai Jiangshan <jiangshanlai@...il.com>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
max.byungchul.park@...il.com,
"Paul E. McKenney" <paulmck@...ux.ibm.com>,
Rao Shoaib <rao.shoaib@...cle.com>, rcu@...r.kernel.org,
Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [PATCH v4 1/2] rcu/tree: Add basic support for kfree_rcu()
batching
> Recently a discussion about the stability and performance of a system
> involving a high rate of kfree_rcu() calls surfaced on the list [1],
> which led to another discussion about how to prepare for this situation.
>
> This patch adds basic batching support for kfree_rcu(). It is "basic"
> because we do none of the slab management, dynamic allocation, code
> moving or any of the other things that some previous attempts did
> [2]. These fancier improvements can be follow-up patches, and there are
> different ideas being discussed in that regard. This is an effort to
> start simple and build up from there. In the future, an extension to
> use kfree_bulk and possibly per-slab batching could be done to further
> improve performance due to cache locality and slab-specific bulk free
> optimizations. By using an array of pointers, the worker thread
> processing the work would need to read less data, since it would no
> longer need to deal with the larger rcu_head structures.
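
To illustrate the amortization that batching provides, here is a
minimal userspace toy model; it is my own sketch, not the patch's
actual code, and all names (fake_kfree_rcu, BATCH_MAX, etc.) are
made up for the example:

#include <stdio.h>
#include <stdlib.h>

#define BATCH_MAX 16

static void *batch[BATCH_MAX];
static int batch_len;
static int grace_periods;

/* Stand-in for waiting on one RCU grace period. */
static void fake_grace_period(void)
{
	grace_periods++;
}

static void flush_batch(void)
{
	int i;

	fake_grace_period();	/* one grace period covers the whole batch */
	for (i = 0; i < batch_len; i++)
		free(batch[i]);
	batch_len = 0;
}

/* Queue the pointer locally instead of handing it to RCU immediately. */
static void fake_kfree_rcu(void *p)
{
	batch[batch_len++] = p;
	if (batch_len == BATCH_MAX)
		flush_batch();
}

int main(void)
{
	int i;

	for (i = 0; i < 1000; i++)
		fake_kfree_rcu(malloc(32));
	if (batch_len)
		flush_batch();
	/* Without batching this would have cost 1000 grace periods. */
	printf("1000 frees, %d grace periods\n", grace_periods);
	return 0;
}

In this toy run, 1000 frees cost 63 simulated grace periods instead
of 1000, which is the kind of effect the torture numbers below
quantify.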
>
> Torture tests follow in the next patch and show around a 5x reduction
> in the number of grace periods on a 16-CPU system. More details and
> test data are in that patch.
>
> This patch has an implication for rcu_barrier(). Since kfree_rcu()
> calls can be batched, they may not have been handed to the RCU
> machinery yet; in fact, the monitor may not even have run yet to do
> the queue_rcu_work(). So there seems to be no easy way of implementing
> rcu_barrier() such that it waits for kfree_rcu()s that have already
> been made. This means that a kfree_rcu() followed by an rcu_barrier()
> does not imply that the memory will be freed once rcu_barrier()
> returns.
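
To state the caveat as a hypothetical caller-side sketch (struct
my_obj and its rcu field are made up for illustration):

#include <linux/rcupdate.h>

struct my_obj {
	struct rcu_head rcu;
	/* ... payload ... */
};

static void example(struct my_obj *obj)
{
	kfree_rcu(obj, rcu);	/* may only sit in a local batch for now */
	rcu_barrier();		/* waits for callbacks already queued with
				 * RCU, but the batch above may not have
				 * been handed over yet, so obj is not
				 * guaranteed to be freed at this point */
}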
>
> Another implication is higher active memory usage (although not
> runaway) until the kfree_rcu() flooding ends, in comparison to
> without batching. More details about this are in the second patch,
> which adds an rcuperf test.
>
> Finally, in the near future we will get rid of the kfree_rcu() special
> casing within RCU, such as in rcu_do_batch(), and switch everything to
> just batching. Currently we don't do that because the timer subsystem
> is not yet up early in boot, so we cannot schedule the kfree_rcu()
> monitor while the timer subsystem's locks are not initialized. That
> would also mean getting rid of kfree_call_rcu_nobatch() entirely.
>
Hello, Joel.

First of all, thank you for improving it. I also noticed high pressure
on the RCU machinery while performing some vmalloc tests when a
kfree_rcu() flood occurred, so I got rid of using kfree_rcu() there.

I have just a small question related to workloads and performance
evaluation. Are you aware of any specific workloads that benefit from
this, for example in the mobile area? I am asking because I am thinking
about backporting it and reusing it in our kernel.

Thank you!
--
Vlad Rezki