linux-kernel - Re: [PATCH RFC] rcu/kfree: Do not request RCU when not needed


Message-Id: <B78EFE6F-D479-4F96-89CC-E19CA4146FF8@joelfernandes.org>
Date:   Thu, 3 Nov 2022 14:43:19 -0400
From:   Joel Fernandes <joel@...lfernandes.org>
To:     paulmck@...nel.org
Cc:     Uladzislau Rezki <urezki@...il.com>, rcu@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFC] rcu/kfree: Do not request RCU when not needed



> On Nov 3, 2022, at 2:36 PM, Joel Fernandes <joel@...lfernandes.org> wrote:
> 
> 
> 
>> On Nov 3, 2022, at 1:51 PM, Paul E. McKenney <paulmck@...nel.org> wrote:
>> 
>> On Thu, Nov 03, 2022 at 01:41:43PM +0100, Uladzislau Rezki wrote:
>>>>>>> /**
>>>>>>> @@ -3066,10 +3068,12 @@ static void kfree_rcu_work(struct work_struct *work)
>>>>>>>  struct kfree_rcu_cpu_work *krwp;
>>>>>>>  int i, j;
>>>>>>> 
>>>>>>> -    krwp = container_of(to_rcu_work(work),
>>>>>>> +    krwp = container_of(work,
>>>>>>>              struct kfree_rcu_cpu_work, rcu_work);
>>>>>>>  krcp = krwp->krcp;
>>>>>>> 
>>>>>>> +    cond_synchronize_rcu(krwp->gp_snap);
>>>>>> 
>>>>>> Might this provoke OOMs in case of callback flooding?
>>>>>> 
>>>>>> An alternative might be something like this:
>>>>>> 
>>>>>>  if (!poll_state_synchronize_rcu(krwp->gp_snap)) {
>>>>>>      queue_rcu_work(system_wq, &krwp->rcu_work);
>>>>>>      return;
>>>>>>  }
>>>>>> 
>>>>>> Either way gets you a non-lazy callback in the case where a grace
>>>>>> period has not yet elapsed.
>>>>>> Or am I missing something that prevents OOMs here?
>>>>> 
>>>>> The memory consumption appears to be much lower in his testing with the onslaught of kfree, which probably makes OOM less likely.
>>>>> 
>>>>> Though, was your reasoning that if a grace period has not elapsed, we need a non-lazy callback queued so that the reclaim happens sooner?
>>>>> 
>>>>> If so, cond_synchronize_rcu() should already be conditionally queueing a non-lazy CB, since we don’t make synchronous users wait for seconds. Or did I miss something?
>>>> 
>>>> My concern is that the synchronize_rcu() will block a kworker kthread
>>>> for some time, and that in callback-flood situations this might slow
>>>> things down due to exhausting the supply of kworkers.
>>>> 
>>> This concern applies in both cases, I mean in the default configuration
>>> and with the posted patch. The reclaim work, kfree_rcu_work(), only
>>> makes progress once a GP has passed, so that rcu_work_rcufn() can queue
>>> our reclaim kworker.
>>> 
>>> As it is now:
>>> 
>>> 1. Collect pointers; when we decide to drop them, we queue the
>>>  monitor_work() worker to the system_wq.
>>> 
>>> 2. The monitor work, kfree_rcu_monitor(), tries to attach, or in
>>> other words move, the "backlog" to the "free" channels.
>>> 
>>> 3. It invokes queue_rcu_work(), which does call_rcu_flush(), whose
>>> handler in turn queues our worker. So the worker runs after a GP
>>> has passed.
>> 
>> So as it is now, we are not tying up a kworker kthread while waiting
>> for the grace period, correct?  We instead have an RCU callback queued
>> during that time, and the kworker kthread gets involved only after the
>> grace period ends.
>> 
>>> With a patch: 
>>> 
>>> Steps [1] and [2] are the same. But in the third step we:
>>> 
>>> 1. Record the GP state for the last item in the channel;
>>> 2. Directly queue the drain work without any call_rcu() helpers;
>>> 3. On entry to the reclaim worker, check whether a GP has passed;
>>> 4. If not, invoke synchronize_rcu().
>> 
>> And #4 changes that, by (sometimes) tying up a kworker kthread for the
>> full grace period.
>> 
>>> The patch eliminates extra steps by not going via the RCU-core route;
>>> instead it directly invokes the reclaim worker, which either
>>> proceeds or waits for a GP if needed.
>> 
>> I agree that the use of the polled API could be reducing delays, which
>> is a good thing.  Just being my usual greedy self and asking "Why not
>> both?", that is use queue_rcu_work() instead of synchronize_rcu() in
>> conjunction with the polled APIs so as to avoid both the grace-period
>> delay and the tying up of the kworker kthread.
>> 
>> Or am I missing something here?
> 
> Yeah I am with Paul on this, NAK on “blocking in kworker” instead of “checking for grace period + queuing either regular work or RCU work”. Note that blocking also adds a pointless and fully avoidable scheduler round trip.

As a side note, it’s notable how nicely this work has evolved over the years thanks to Vlad and all of y’all’s work. For instance, filling pages with kfree pointers and grace-period polling had not even been invented back when kfree_rcu was a simple wrapper. Soon it will actually be freeing memory faster, by avoiding waiting on RCU when not needed! And of course, this is probably all happening because we wanted RCU to be lazy in nocb mode; it is a nice side effect of that effort ;-)

 - Joel


> 
> - Joel
> 
> 
>> 
>>                           Thanx, Paul
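
The "check for a grace period, then either reclaim or requeue" approach
discussed in this thread can be illustrated with a small userspace model.
This is only a sketch under stated assumptions, not the kernel code and
not the posted patch: every toy_* name is hypothetical, the grace-period
counter is a plain integer with no wraparound handling, and the kernel
calls it stands in for (get_state_synchronize_rcu(),
poll_state_synchronize_rcu(), queue_rcu_work()) are only named in comments.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for RCU's grace-period state: number of completed GPs.
 * Assumption: a plain counter, no wraparound handling. */
unsigned long toy_gp_completed;

/* Models get_state_synchronize_rcu(): returns a cookie that is
 * satisfied once one more grace period has completed. */
unsigned long toy_get_state(void)
{
	return toy_gp_completed + 1;
}

/* Models poll_state_synchronize_rcu(): true if the grace period
 * identified by the cookie has already elapsed. Never blocks. */
bool toy_poll_state(unsigned long snap)
{
	return toy_gp_completed >= snap;
}

/* Models the end of one grace period. */
void toy_gp_end(void)
{
	toy_gp_completed++;
}

enum toy_action {
	TOY_RECLAIMED,		/* grace period had elapsed, objects freed */
	TOY_REQUEUED_AFTER_GP,	/* not elapsed, deferred instead of blocking */
};

/* Hypothetical, reduced stand-in for struct kfree_rcu_cpu_work. */
struct toy_krw {
	unsigned long gp_snap;	/* cookie recorded when the batch was formed */
	int nr_objs;		/* objects still waiting to be freed */
	int requeues;		/* how often the work was deferred */
};

/* Models the reclaim worker: it polls instead of calling a blocking
 * synchronize_rcu(), so the worker thread is never tied up. */
enum toy_action toy_kfree_rcu_work(struct toy_krw *krwp)
{
	if (!toy_poll_state(krwp->gp_snap)) {
		/* In the kernel this would be something like
		 * queue_rcu_work(system_wq, &krwp->rcu_work); return; */
		krwp->requeues++;
		return TOY_REQUEUED_AFTER_GP;
	}

	krwp->nr_objs = 0;	/* stands in for freeing the batch */
	return TOY_RECLAIMED;
}
```

In this model, a batch whose cookie was taken with toy_get_state() is
deferred on the first call to toy_kfree_rcu_work() and reclaimed on the
first call made after toy_gp_end(); at no point does the worker wait.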