Message-ID: <CAEXW_YT6bD=sAD77yg+BXC+Q0NW0RHCg6fEYZ_jsBeXkAd_1tg@mail.gmail.com>
Date: Tue, 4 Feb 2025 20:28:29 -0500
From: Joel Fernandes <joel@...lfernandes.org>
To: paulmck@...nel.org
Cc: linux-kernel@...r.kernel.org, Frederic Weisbecker <frederic@...nel.org>,
Neeraj Upadhyay <neeraj.upadhyay@...nel.org>, Josh Triplett <josh@...htriplett.org>,
Boqun Feng <boqun.feng@...il.com>, Uladzislau Rezki <urezki@...il.com>,
Steven Rostedt <rostedt@...dmis.org>, Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Lai Jiangshan <jiangshanlai@...il.com>, Zqiang <qiang.zhang1211@...il.com>, rcu@...r.kernel.org
Subject: Re: [PATCH] rcu: Merge rcu_seq_done_exact() logic into rcu_seq_done()

On Tue, Feb 4, 2025 at 3:22 PM Paul E. McKenney <paulmck@...nel.org> wrote:
>
> On Tue, Feb 04, 2025 at 10:44:45AM -0500, Joel Fernandes wrote:
> > On Thu, Jan 30, 2025 at 12:56 AM Paul E. McKenney <paulmck@...nel.org> wrote:
> > > By "where we were initially", one might reasonably guess that you meant
> > > that the initial value and the target are one and the same. But I suspect
> > > that you are thinking of a third value, the value of the underlying
> > > grace-period sequence number at the time of the call to rcu_seq_snap().
> > > But you might be thinking of something else entirely.
> > >
> > > > Now we move to rcu_seq_done_exact(). It has the exact same behavior
> > > > except that instead of ULONG_MAX/2, the above situations are shrunk
> > > > to about 10 counts from the target. So if the target is 28, then the
> > > > initial sequence should have been at least 18 to avoid a false
> > > > positive, but again it is possible, and I certainly see in the code
> > > > that it cannot be used this way... (at least so far). So all we are
> > > > left with is the false-negative band of ~2.5 GPs.
> > >
> > > Here, I have the same questions. As you no doubt guessed.
> > >
> > > > About "what are the consequences of failing to get this right". I
> > > > believe false-positive is unacceptable unless it is maybe debug code -
> > > > that can break functionality in code, too short GPs and all that.
> > > > However false-negative is OK as long as the usecase can retry later
> > > > and can afford to wait. Did I get that much correct?
> > >
> > > Maybe?
> > >
> > > Please look at this on a per-use-case basis.
> >
> > Sure. FWIW, I started a world-editable document here. I am planning to
> > work on this a bit more before our meeting this week. If knowledgeable
> > folks like Frederic could make edits/comments, that would be welcome.
> > I basically summarized this whole thread here:
> > https://docs.google.com/document/d/1sFNuGH4U6DFCE8sunbdWMjFhJhb1aG4goOJAnS6c6gQ/edit?usp=sharing
> >
> > My thought is that, AFAICS, this patch is still valid and
> > rcu_seq_done_exact() is potentially a better replacement for
> > rcu_seq_done(). But famous last words...
>
> Let's start with the use case where the sequence number is being used
> by rcu_barrier(). What happens?

The rcu_state.barrier_sequence counter is used to keep track of a
barrier in progress; since there can be concurrent rcu_barrier() calls,
we optimize by batching multiple such calls (see the skeleton below).
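
Here is how I read that batching, as a heavily abbreviated skeleton of
rcu_barrier() in kernel/rcu/tree.c (the barrier_lock, tracing, the
per-CPU entrain loop, and the completion wait are all elided, so treat
this as a sketch rather than the real function):

	void rcu_barrier(void)
	{
		unsigned long s;

		/* Snapshot the barrier sequence we need to see completed. */
		s = rcu_seq_snap(&rcu_state.barrier_sequence);

		mutex_lock(&rcu_state.barrier_mutex);

		/* Did a concurrent rcu_barrier() already do our work? */
		if (rcu_seq_done(&rcu_state.barrier_sequence, s)) {
			mutex_unlock(&rcu_state.barrier_mutex);
			return;
		}

		rcu_seq_start(&rcu_state.barrier_sequence);
		/* ... entrain a callback on each CPU that has callbacks ... */
		/* ... wait for all entrained callbacks to be invoked ... */
		rcu_seq_end(&rcu_state.barrier_sequence);

		mutex_unlock(&rcu_state.barrier_mutex);
	}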

It also handles the case where the entrain IPI comes through while no
rcu_barrier() is currently in flight (which would be weird, but maybe
it happens somehow).

It is also used to make sure we don't entrain on a CPU again if we
already entrained it for a given rcu_barrier() invocation. This is
tracked by rdp->barrier_seq_snap, which copies the value of
rcu_state.barrier_sequence after the entraining. Although I am
currently unable to envision why the IPI handler would be called
unnecessarily (hardware bug?), since we hold the barrier_mutex
throughout the rcu_barrier() function, including while waiting on
CPUs.
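
Sketching that per-CPU check from memory (the real
rcu_barrier_entrain() predicate is more subtle, e.g. around CPUs whose
snapshot went stale, so this is a simplification, not the mainline
code):

	static void rcu_barrier_entrain(struct rcu_data *rdp)
	{
		unsigned long gseq = READ_ONCE(rcu_state.barrier_sequence);
		unsigned long lseq = READ_ONCE(rdp->barrier_seq_snap);

		/* Bail if no barrier in flight or CPU already entrained. */
		if (!rcu_seq_state(gseq) || lseq == gseq)
			return;

		/* ... enqueue the rcu_barrier callback on this CPU ... */

		/* Record that this CPU is covered for this invocation. */
		smp_store_release(&rdp->barrier_seq_snap, gseq);
	}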

Going back to my first point: for the rcu_seq_done() check in
rcu_barrier(), we are checking whether someone else did our work for
us:
	mutex_lock(&rcu_state.barrier_mutex);

	/* Did someone else do our work for us? */
	if (rcu_seq_done(&rcu_state.barrier_sequence, s))

Here I think rcu_seq_done_exact() is better: if too many rcu_barrier()
calls are made concurrently, the mutex may be contended, and if enough
time passes for the counter to wrap around, we might end up in
false-negative land, which is much larger with rcu_seq_done() than
with ..done_exact(). However, the mutex being contended for that long
does seem to be a bug to begin with. OTOH, doing an rcu_barrier()
again when we didn't need to doesn't seem like the end of the world,
especially if we already contended on the mutex for that long. But at
least this shows that rcu_seq_done_exact() does not seem to hurt here.
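
For anyone following along, here are the two predicates being
compared, as I remember them from kernel/rcu/rcu.h (please double-check
against your tree; RCU_SEQ_STATE_MASK is 3, which makes the exact
variant's guard band the ~10 counts, i.e. ~2.5 GPs, discussed above):

	/* False-negative band of up to ULONG_MAX / 2 behind the target. */
	static inline bool rcu_seq_done(unsigned long *sp, unsigned long s)
	{
		return ULONG_CMP_GE(READ_ONCE(*sp), s);
	}

	/* Same check, but the band shrinks to 3 * RCU_SEQ_STATE_MASK + 1. */
	static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s)
	{
		unsigned long cur_s = READ_ONCE(*sp);

		return ULONG_CMP_GE(cur_s, s) ||
		       ULONG_CMP_LT(cur_s, s - (3 * RCU_SEQ_STATE_MASK + 1));
	}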
Hmmm,
- Joel