linux-kernel - Re: [PATCH v7 01/11] rcu: Wake up nocb gp thread on rcu_barrier

Message-ID: <c634e41e-3c6c-0896-0873-b9d1bb317cea@joelfernandes.org>
Date:   Tue, 4 Oct 2022 18:57:59 -0400
From:   Joel Fernandes <joel@...lfernandes.org>
To:     Frederic Weisbecker <frederic@...nel.org>
Cc:     rcu@...r.kernel.org, linux-kernel@...r.kernel.org,
        rushikesh.s.kadam@...el.com, urezki@...il.com,
        neeraj.iitr10@...il.com, paulmck@...nel.org, rostedt@...dmis.org,
        youssefesmat@...gle.com, surenb@...gle.com
Subject: Re: [PATCH v7 01/11] rcu: Wake up nocb gp thread on
 rcu_barrier_entrain()

Hi Frederic,

On 10/4/2022 6:28 PM, Frederic Weisbecker wrote:
> On Tue, Oct 04, 2022 at 02:41:47AM +0000, Joel Fernandes (Google) wrote:
>> From: Frederic Weisbecker <frederic@...nel.org>
>>
>> In preparation of RCU lazy changes, wake up the RCU nocb gp thread if
> 
> It's more than just prep work for a new feature, it's a regression fix.

Oh ok, both our fixes are equivalent, but I chose yours since it's cleaner. I was
fixing Lazy CBs since I can actually trigger this issue.

>> needed after an entrain. Otherwise, the RCU barrier callback can wait in
>> the queue for several seconds before the lazy callbacks in front of it
>> are serviced.
> 
> It's not about lazy callbacks here (but you can mention the fact that
> waking nocb_gp if necessary after flushing bypass is a beneficial side
> effect for further lazy implementation).
> 
> So here is the possible bad scenario:
> 
> 1) CPU 0 is nocb, it queues a callback
> 2) CPU 0 goes idle (or userspace with nohz_full) forever
> 3) The grace period related to that callback elapses
> 4) The callback is moved to the done list (but is not invoked yet), there are no more pending for CPU 0
> 5) CPU 1 calls rcu_barrier() and entrains to CPU 0 cblist

CPU 1 can only entrain directly into CPU 0's cblist if CPU 0 is offline:

		if (!rcu_rdp_cpu_online(rdp)) {
			rcu_barrier_entrain(rdp);
			WARN_ON_ONCE(READ_ONCE(rdp->barrier_seq_snap) != gseq);
			raw_spin_unlock_irqrestore(&rcu_state.barrier_lock,
			...
			continue;
		}

Otherwise an IPI does the entraining. So I do not see how CPU 0 being idle
causes the cross-CPU entraining.

> 6) CPU 1 waits forever

But, I agree it can still wait forever, once the IPI handler does the
entraining, since nothing will do the GP thread wakeup.
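
To make the stall concrete, here is a toy user-space model of it. This is only
a sketch and not kernel code: struct toy_cpu, toy_entrain(), toy_gp_pass() and
toy_barrier_completes() are hypothetical names made up for illustration, and
the real rcu_segcblist and nocb_gp kthread logic is much more involved. It
assumes CPU 0 has one callback on its done list, nothing pending, and a
sleeping GP kthread when the barrier callback is entrained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_cpu {
	int done_cbs;          /* callbacks whose grace period has elapsed */
	int pending_cbs;       /* callbacks still waiting for a grace period */
	bool gp_kthread_awake; /* is the modelled nocb GP kthread running? */
};

/* Queue the barrier callback; optionally wake the modelled GP kthread. */
static void toy_entrain(struct toy_cpu *cpu, bool wake_gp)
{
	assert(cpu != NULL);
	cpu->pending_cbs++;
	if (wake_gp)
		cpu->gp_kthread_awake = true;
}

/* One pass of the modelled GP kthread: it only makes progress if awake. */
static bool toy_gp_pass(struct toy_cpu *cpu)
{
	assert(cpu != NULL);
	if (!cpu->gp_kthread_awake)
		return false;
	cpu->done_cbs += cpu->pending_cbs;
	cpu->pending_cbs = 0;
	cpu->gp_kthread_awake = false; /* nothing left to do, sleep again */
	return true;
}

/* Returns true if the entrained barrier callback gets processed. */
static bool toy_barrier_completes(bool wake_gp)
{
	struct toy_cpu cpu0 = {
		.done_cbs = 1,             /* step 4: callback already done */
		.pending_cbs = 0,          /* nothing else pending */
		.gp_kthread_awake = false, /* so the GP kthread is asleep */
	};

	toy_entrain(&cpu0, wake_gp);       /* step 5: barrier entrains */
	toy_gp_pass(&cpu0);
	return cpu0.pending_cbs == 0;      /* step 6: does the waiter return? */
}
```

In this model toy_barrier_completes(false) leaves the barrier callback pending
because nothing wakes the sleeping GP kthread, while
toy_barrier_completes(true) lets it advance.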

>>
>> Reported-by: Joel Fernandes (Google) <joel@...lfernandes.org>
> 
> Fixes: 5d6742b37727 ("rcu/nocb: Use rcu_segcblist for no-CBs CPUs")

So, do you mind writing a proper patch with a proper commit message and Fixes
tag then? It can be independent of this series, and please add my Reported-by
tag, thanks!

Thanks!

 - Joel