linux-kernel - Re: [RFC 1/2] srcu: Remove comment about prior read lock counts

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEXW_YRSKRkVc=+HiknQYkLrBGwWfUn3cCp3Btj5AAeqY7KKZw@mail.gmail.com>
Date:   Sun, 18 Dec 2022 16:19:07 -0500
From:   Joel Fernandes <joel@...lfernandes.org>
To:     Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc:     linux-kernel@...r.kernel.org,
        Josh Triplett <josh@...htriplett.org>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        "Paul E. McKenney" <paulmck@...nel.org>, rcu@...r.kernel.org,
        Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [RFC 1/2] srcu: Remove comment about prior read lock counts

Hi Mathieu,
Thanks a lot for sharing details of side-rcu, I will review it over
the holidays! I agree we can/should borrow from that implementation
where possible. I replied to the comments below:

On Sun, Dec 18, 2022 at 4:07 PM Mathieu Desnoyers
<mathieu.desnoyers@...icios.com> wrote:
>
> On 2022-12-18 14:13, Joel Fernandes (Google) wrote:
> > The comment says that if an updater saw lock count updates, then
> > ensure the reader does not see the new srcu_idx.
> >
> > However, there is no memory barrier between a READER reading srcu_idx
> > with respect to incrementing the lock count for that srcu_idx.
> >
> > So what is really happening is, both "B" and "C" will order the current
> > reader's unlock count update, and the _next_ readers lock count update, with
> > respect to the write to the currently active index.
> >
> > Consider first the case of the unlock count update being seen by the UPDATER:
> >
> > (for brevity, the pseudocode shortens "srcu_idx" to "idx")
> >
> > READER                            UPDATER
> >
> > rcu_read_lock() {
> >      idx = READ(idx);
> >      lock_count[idx]++;
> >
> >      smp_mb();    // B
> > }
> >                                  srcu_flip() {
> >                                      smp_mb(); //E
> >                                      idx++;
> >                                      smp_mb();
> >                                  }
> > rcu_read_unlock() {
> >      smp_mb();    // C
> >      unlock_count[idx]++;
> > }
> >
> > Consider that the updater saw the unlock count update, and due to this, we
> > expect "E" to make sure that the reader only used the old srcu_idx.
> >
> > However, say the reader used the new srcu_idx because we dropped "E".  That is
> > totally OK because both unlock and lock counts of this reader will negate each
> > other during the next scan of the srcu_idx. So we don't have to guarantee at
> > all that the reader used the old srcu_idx, that does not buy us anything
> > because if it used the new one, we would just ignore it during the next scan
> > anyway (the reader is "done").
> >
> > Now lets look at the following case:
> >
> > READER                            UPDATER
> >
> > rcu_read_lock() {
> >      idx = READ(idx);
> >      lock_count[idx]++;
> >
> >      smp_mb();    // B
> > }
> >
> > rcu_read_unlock() {
> >      smp_mb();    // C
> >      unlock_count[idx]++;
> > }
> >                                  srcu_flip() {
> >                                      smp_mb(); //E
> >                                      idx++;
> > rcu_read_lock() {
> >      idx = READ(idx);
> >      lock_count[idx]++;
> >
> >      smp_mb();    // B
> >                                      smp_mb();
> >                                  }
> > }
> >
> > Consider that the updater saw the lock count update of the second
> > rcu_read_lock(). It does not matter that we guarantee that the reader sees only
> > the old srcu_idx. This is because, a reader could totally just sample
> > srcu_idx, and stay preempted for long periods of time. So, during any scan, we
> > already have the issue of a preempted-reader randomly springing up with a copy
> > of the index which we consider the "new index". So guaranteeing that the reader
> > saw the old srcu_idx instead of the new one if we saw its lock count updates,
> > also does not buy us anything.
> >
> > Due to these reasons, drop the argument that the reader has to see a certain
> > srcu_idx since we have no control over that anyway, and guaranteeing that does not
> > buy us anything.
>
> I don't understand why this first patch only removes a comment about the
> need to order things, when in fact it's the entire memory barrier /* E
> */ that is useless.
>
> I suspect we should just remove the comment along with the barrier
> without this added step.

Yes, this is what I do in patch 2/2. I just wanted to break it down
because the change log would be too big otherwise. There were 2
reasons given for the "E" memory barrier, so I split it into 2 patches
for that reason so both patches can be discussed possibly separately.
But no issues with squashing as well and merging change logs.

> What SRCU fundamentally does is detect quiescence of all SRCU readers
> between the beginning of the grace period and its completion. In order
> to ensure forward progress, it does so in a two-phase algorithm. What
> the grace period does to detect quiescence is to observe that each of
> the periods (0/1) have no active reader at a given point in the grace
> period.

Yes agreed! By deactivating an index, it is possible to ensure new
readers don't show up on the deactivated one, and so we are bound in
scan time / forward progress. (Except for the special case where a
reader was preempted after picking up the index, and then suddenly
shows up in the scan).

> Then the fact that the period is flipped to send new-coming
> readers into a different period is just to ensure forward progress, and
> is basically irrelevant (order-wise) with the fact that the grace period
> scans both periods and validates that both periods are observed to have
> no active readers.

Yes.

> I'd favor just clarifying the SRCU model in this way, and then remove
> the useless barrier, and also implement improvements to the SRCU
> algorithm and skip the "flip" entirely when we notice (early) that no
> active readers are present in both periods. This is actually similar to
> what I implemented in side-rcu.
> Thoughts ?

Agreed! Thanks so much for sharing this, and I will take a look on my
side and study it. Interesting on that optimization to skip the flip!

Thanks,

 - Joel


>
> Thanks,
>
> Mathieu
>
> >
> > Signed-off-by: Joel Fernandes (Google) <joel@...lfernandes.org>
> > ---
> >   kernel/rcu/srcutree.c | 10 ++++------
> >   1 file changed, 4 insertions(+), 6 deletions(-)
> >
> > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> > index 1c304fec89c0..d6a4c2439ca6 100644
> > --- a/kernel/rcu/srcutree.c
> > +++ b/kernel/rcu/srcutree.c
> > @@ -983,12 +983,10 @@ static bool try_check_zero(struct srcu_struct *ssp, int idx, int trycount)
> >   static void srcu_flip(struct srcu_struct *ssp)
> >   {
> >       /*
> > -      * Ensure that if this updater saw a given reader's increment
> > -      * from __srcu_read_lock(), that reader was using an old value
> > -      * of ->srcu_idx.  Also ensure that if a given reader sees the
> > -      * new value of ->srcu_idx, this updater's earlier scans cannot
> > -      * have seen that reader's increments (which is OK, because this
> > -      * grace period need not wait on that reader).
> > +      * Ensure that if a given reader sees the new value of ->srcu_idx, this
> > +      * updater's earlier scans cannot have seen that reader's increments
> > +      * (which is OK, because this grace period need not wait on that
> > +      * reader).
> >        */
> >       smp_mb(); /* E */  /* Pairs with B and C. */
> >
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> https://www.efficios.com
>