linux-kernel - Re: [PATCH RFC v2 rcu] Fix get_state_synchronize_rcu

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAEXW_YQaC+TYnXZca2MBcJQgf2EWD+pBp-Dih+b647RtpJ+LNQ@mail.gmail.com>
Date: Sun, 26 Jan 2025 20:22:23 -0500
From: Joel Fernandes <joel@...lfernandes.org>
To: paulmck@...nel.org
Cc: Frederic Weisbecker <frederic@...nel.org>, rcu@...r.kernel.org, linux-kernel@...r.kernel.org, 
	kernel-team@...a.com, rostedt@...dmis.org
Subject: Re: [PATCH RFC v2 rcu] Fix get_state_synchronize_rcu_full() GP-start detection

On Sun, Jan 26, 2025 at 8:13 PM Joel Fernandes <joel@...lfernandes.org> wrote:
>
> Hi, Paul and Frederic,
>
> [...]
> > > > On Sat, Jan 25, 2025 at 12:03:58AM +0100, Frederic Weisbecker wrote:
> > > > > Le Fri, Dec 13, 2024 at 11:49:49AM -0800, Paul E. McKenney a écrit :
> > > > > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> > > > > > index 2f9c9272cd486..d2a91f705a4ab 100644
> > > > > > --- a/kernel/rcu/rcu.h
> > > > > > +++ b/kernel/rcu/rcu.h
> > > > > > @@ -162,7 +162,7 @@ static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s)
> > > > > >  {
> > > > > >         unsigned long cur_s = READ_ONCE(*sp);
> > > > > >
> > > > > > -       return ULONG_CMP_GE(cur_s, s) || ULONG_CMP_LT(cur_s, s - (2 * RCU_SEQ_STATE_MASK + 1));
> > > > > > +       return ULONG_CMP_GE(cur_s, s) || ULONG_CMP_LT(cur_s, s - (3 * RCU_SEQ_STATE_MASK + 1));
> > > > >
> [...]
> > > > > The way I understand it is that rcu_state.gp_seq might be seen started while
> > > > > root_rnp->gp_seq is not. So rcu_seq_snap() on the started rcu_state.gp_seq
> > > > > may return maximum 2 full GPs ahead of root_rnp->gp_seq. And therefore it takes below
> > > > > 2 GPs to safely deduce we wrapped around.
> > > >
> > > > Exactly!
> > > >
> > > > > Should it be ULONG_CMP_LT(cur_s, s - (2 * (RCU_SEQ_STATE_MASK + 1))) ?
> > > >
> > > > Quite possibly.  I freely admit that I allowed a bit of slop because
> > > > time was of the essence (holidays and all that) and also it does not
> > > > hurt much to lose a couple of counts out of a 2^32 cycle, to say nothing
> > > > of the common-case 2^64 cycle.  It would not hurt to be exact, but it
> > > > would be necessary to convince ourselves that we were not off by one in
> > > > the wrong direction.
> > > >
> > > > I would be happy to see a patch, as long as it was sufficiently
> > > > convincing.
> > >
> > > I'm not so much concerned about being exact but rather about making
> > > sure we still understand what we did within one year. We can leave one
> > > more grace period than what we expect out of paranoia but, the most
> > > important is that we comment about what we expect and why. Let me
> > > prepare a patch for that.
> >
> > Even better!
>
> Do we really have users who could pass such a huge delta wrapped
> around value to poll() i.e > ULONG_MAX/2 ?  For 32-bit, that would be
> 2 billion count since the get() (500 million GPs on 32-bit?). I am
> curious if such a scenario should be a WARN() because also: If more
> than ULONG_MAX/2 values are possible after get(), then a full or
> multiple ULONG_MAX wraparound is also possible. This means then both
> rcu_seq_done() and rcu_seq_done_exact() become unreliable anyway for
> such stale get() values.
>
> I do agree with both your points on the side effect of the patch to
> rcu_seq_done_exact(), but I am not fully convinced myself about
> utility of rcu_seq_done_exact() versus the rcu_seq_done() but I could
> be missing something.

I want to modify my comment on reliability. rcu_seq_done_exact() is
certainly more "reliable" than rcu_seq_done() for wraparound delta  >
ULONG_MAX/2. Still with such a huge wrap around it is not fail proof
if it lands within the "3 Grace period" window, so if it is not super
reliable and if the user should not rely on it, then I wonder if it is
better to not do it and instead warn the user they are playing with
fire. But a counter-argument might be, landing within the 3 GP window
is quite low probability...

thanks,

 - Joel



>
> Otherwise I analyzed the patch and it makes sense to me:
>
> Reviewed-by:Joel Fernandes (Google) <joel@...lfernandes.org>
>
> thanks,
>
>  - Joel