linux-kernel - Re: [PATCH RFC] srcu: Yet more detail for srcu_readers_active_idx

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEXW_YTkeGVJZQCJ-_3ez_BpeMJ0TOYONCv_g76MDh5=M+9x_Q@mail.gmail.com>
Date:   Fri, 16 Dec 2022 22:19:58 -0500
From:   Joel Fernandes <joel@...lfernandes.org>
To:     paulmck@...nel.org
Cc:     Frederic Weisbecker <frederic@...nel.org>, boqun.feng@...il.com,
        neeraj.iitr10@...il.com, urezki@...il.com, rcu@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFC] srcu: Yet more detail for srcu_readers_active_idx_check()
 comments

Hi,
On the related subject of this function, I drew a diagram for one of
the reasons why per-CPU unlock counts have to be scanned first, for a
particular index, before the per-CPU lock counts, and not the other
way. Otherwise, a reader that got preempted after reading the index,
can suddenly get scheduled during the inactive index's scan, and cause
the total lock and unlock counts to falsely match:
https://i.imgur.com/79fDWdQ.png

Cheers,

 - Joel



On Fri, Dec 16, 2022 at 11:54 AM Joel Fernandes <joel@...lfernandes.org> wrote:
>
>
>
> > On Dec 16, 2022, at 11:51 AM, Paul E. McKenney <paulmck@...nel.org> wrote:
> >
> > On Fri, Dec 16, 2022 at 04:32:39PM +0000, Joel Fernandes wrote:
> >> On Thu, Dec 15, 2022 at 05:09:14PM -0800, Paul E. McKenney wrote:
> >> [...]
> >>>>>> 2. unlock()'s smp_mb() happened before Flip+smp_mb() , now the reader
> >>>>>> has no new smp_mb() that happens AFTER the flip happened. So it can
> >>>>>> totally sample the old idx again -- that particular reader will
> >>>>>> increment twice, but the next time, it will see the flipped one.
> >>>>>
> >>>>> I will let you transliterate both.  ;-)
> >>>>
> >>>> I think I see what you mean now :)
> >>>>
> >>>> I believe the access I am referring to is the read of idx on one side and
> >>>> the write to idx on the other. However that is incomplete and I need to
> >>>> pair that with some of other access on both sides.
> >>>>
> >>>> So perhaps this:
> >>>>
> >>>> Writer does flip + smp_mb + read unlock counts [1]
> >>>>
> >>>> Reader does:
> >>>> read idx + smp_mb() + increment lock counts [2]
> >>>>
> >>>> And subsequently reader does
> >>>> Smp_mb() + increment unlock count. [3]
> >>>>
> >>>> So [1] races with either [2] or [2]+[3].
> >>>>
> >>>> Is that fair?
> >>>
> >>> That does look much better, thank you!
> >>
> >> Perhaps a comment with an ASCII diagram will help?
> >>
> >>
> >> Case 2:
> >> Both the reader and the updater see each other's writes too late, but because
> >> of memory barriers on both sides, they will eventually see each other's write
> >> with respect to their own. This is similar to the store-buffer problem. This
> >> let's a single reader contribute a maximum (unlock minus lock) imbalance of 2.
> >>
> >> The following diagram shows the subtle worst case followed by a simplified
> >> store-buffer explanation.
> >>
> >> READER                  UPDATER
> >> -------------           ----------
> >>                           // idx is initially 0.
> >> read_lock() {
> >>  READ(idx) = 0;
> >>  lock[0]++; --------------------------------------------,
> >>                           flip() {                      |
> >>                              smp_mb();                  |
> >>  smp_mb();                                              |
> >> }                                                        |
> >>                                                         |
> >> // RSCS                                                  |
> >>                                                         |
> >> read_unlock() {                                          |
> >>  smp_mb();                                              |
> >>                              idx++;  // P               |
> >>                              smp_mb();                  |
> >>                           }                             |
> >>                                                         |
> >>                           scan_readers_idx(0) {         |
> >>                               count all unlock[0];      |
> >>                                   |                     |
> >>                                   |                     |
> >>  unlock[0]++; //X <--not-counted--`-----,               |
> >>                                         |               |
> >> }                                        V               `------,
> >>                               // Will make sure next scan      |
> >>                               // will not miss this unlock (X) |
> >>                               // if other side saw flip (P) ,--`
> >>                               // Call this MB [1]           |
> >>                               // Order write(idx) with      |
> >>                               // next scan's unlock.        |
> >>                               smp_mb();                 ,---`
> >> read_lock() {                                            |
> >>  READ(idx)=0;                                           |
> >>  lock[0]++; ----------------> count all lock[0];        |
> >>  smp_mb();         |     }                              |
> >> }     |             |                                    V
> >>      |             `---> // Incorrect contribution to lock counting
> >>      |                   // upto a maximum of 2 times.
> >>      |
> >>       `---> // Pairs with MB [1]. Makes sure that
> >>             // the next read_lock()'s' idx read (Y) is ordered
> >>             // with above write to unlock[0] (X).
> >>                            |
> >> rcu_read_unlock() {         |
> >>  smp_mb(); <---------------`
> >>  unlock[0]++;
> >> }
> >>
> >> read_lock() {
> >>  READ(idx) = 1; //Y
> >>  lock[1]++;
> >>  ...
> >> }
> >>                           scan_readers_idx(0) {
> >>                               count all unlock[0]; //Q
> >>                               ...
> >>
> >>
> >> thanks,
> >>
> >> - Joel
> >>
> >>                          }
> >>
> >> This makes it similar to the store buffer pattern. Using X, Y, P and Q
> >> annotated above, we get:
> >>
> >> READER                    UPDATER
> >> X (write)                 P (write)
> >>
> >> smp_mb();                 smp_mb();
> >>
> >> Y (read)                  Q (read)
> >
> > Given that this diagram is more than 50 lines long, it might go better in
> > a design document describing this part of RCU.  Perhaps less detail or
> > segmented, but the same general idea as this guy:
> >
> > Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
>
> Yes, this sounds like a good place to add it and perhaps we refer to it from the C source file? I can take this up to do over the holidays, if you prefer.
>
> Thanks,
>
>   - Joel
>
>
> >
> > Thoughts?
> >
> >                        Thanx, Paul