Date: Thu, 18 Jul 2013 17:24:08 -0700
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To: Frederic Weisbecker <fweisbec@...il.com>
Cc: linux-kernel@...r.kernel.org, mingo@...e.hu, laijs@...fujitsu.com,
	dipankar@...ibm.com, akpm@...ux-foundation.org,
	mathieu.desnoyers@...ymtl.ca, josh@...htriplett.org, niv@...ibm.com,
	tglx@...utronix.de, peterz@...radead.org, rostedt@...dmis.org,
	dhowells@...hat.com, edumazet@...gle.com, darren@...art.com,
	sbw@....edu
Subject: Re: [PATCH RFC nohz_full 6/7] nohz_full: Add full-system-idle state machine

On Fri, Jul 19, 2013 at 12:46:21AM +0200, Frederic Weisbecker wrote:
> On Thu, Jul 18, 2013 at 09:47:49AM -0700, Paul E. McKenney wrote:
> > On Thu, Jul 18, 2013 at 04:24:51PM +0200, Frederic Weisbecker wrote:
> > > On Wed, Jul 17, 2013 at 08:39:21PM -0700, Paul E. McKenney wrote:
> > > > On Thu, Jul 18, 2013 at 03:33:01AM +0200, Frederic Weisbecker wrote:
> > > > > So it's like:
> > > > >
> > > > >         CPU 0               CPU 1
> > > > >
> > > > >         read I              write I
> > > > >         smp_mb()            smp_mb()
> > > > >         cmpxchg S           read S
> > > > >
> > > > > I still can't find what guarantees we don't read a value in CPU 1
> > > > > that is way below what we want.
> > > >
> > > > One key point is that there is a second cycle from LONG to FULL.
> > > >
> > > > (Not saying that there is not a bug -- there might well be.  In fact,
> > > > I am starting to think that I need to do another Promela model...)
> > >
> > > Now I'm very confused :)
> >
> > To quote a Nobel Laureate who presented at an ISEF here in Portland some
> > years back, "Confusion is the most productive state of mind."  ;-)
>
> Then I must be a very productive guy!

So that is your secret!  ;-)

> > > I'm far from being a specialist on these matters, but I would really
> > > love to understand this patchset.  Is there any documentation somewhere
> > > I can read that could help, something about cycles of committed memory
> > > or something?
> >
> > Documentation/memory-barriers.txt should suffice for this.  If you want
> > more rigor, see http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf
> >
> > But memory-barrier pairing suffices here.  Here is case 2 from my
> > earlier email in more detail.  The comments with capital letters
> > mark important memory barriers, some of which are buried in atomic
> > operations.
> >
> > 1.	Some CPU coming out of idle:
> >
> >	o	rcu_sysidle_exit():
> >
> >		smp_mb__before_atomic_inc();
> >		atomic_inc(&rdtp->dynticks_idle);
> >		smp_mb__after_atomic_inc();  /* A */
> >
> >	o	rcu_sysidle_force_exit():
> >
> >		oldstate = ACCESS_ONCE(full_sysidle_state);
> >
> > 2.	RCU GP kthread:
> >
> >	o	rcu_sysidle():
> >
> >		cmpxchg(&full_sysidle_state, RCU_SYSIDLE_SHORT, RCU_SYSIDLE_LONG);
> >		/* B */
> >
> >	o	rcu_sysidle_check_cpu():
> >
> >		cur = atomic_read(&rdtp->dynticks_idle);
> >
> > Memory barrier A pairs with memory barrier B, so that if #1's load
> > from full_sysidle_state sees RCU_SYSIDLE_SHORT, we know that #1's
> > atomic_inc() must be visible to #2's atomic_read().  This will cause #2
> > to recognize that the CPU came out of idle, which will in turn cause it
> > to invoke rcu_sysidle_cancel() instead of rcu_sysidle(), resulting in
> > full_sysidle_state being set to RCU_SYSIDLE_NOT.
>
> OK, I get it for that direction.  Now imagine CPU 0 is the RCU GP
> kthread (#2) and CPU 1 is idle and stays so.
>
> CPU 0 then makes its rounds and sees that all CPUs are idle, until it
> finally sets RCU_SYSIDLE_FULL and goes to sleep.
>
> Then CPU 1 wakes up.  It really has to see a value above
> RCU_SYSIDLE_SHORT, otherwise it won't do the cmpxchg and see the
> FULL_NOTED that makes it send the IPI.
>
> What provides the guarantee that CPU 1 sees a value above
> RCU_SYSIDLE_SHORT?  Not on the cmpxchg, but when it first dereferences
> the state with ACCESS_ONCE.

The trick is that CPU 0 will have scanned, moved to RCU_SYSIDLE_SHORT,
scanned, moved to RCU_SYSIDLE_LONG, then scanned again before moving to
RCU_SYSIDLE_FULL.  Given that CPU 1 has been idle all this time, CPU 0
will have read its ->dynticks_idle counter on each scan and seen it
having an even value.  When CPU 1 comes out of idle, it will atomically
increment its ->dynticks_idle counter, which will happen after CPU 0's
read of ->dynticks_idle during its last scan.  Because of the
memory-barrier pairing above, this means that CPU 1's read from
full_sysidle_state must follow the cmpxchg() that set full_sysidle_state
to RCU_SYSIDLE_LONG (though not necessarily the two later cmpxchg()s
that set it to RCU_SYSIDLE_FULL and RCU_SYSIDLE_FULL_NOTED).  But
because RCU_SYSIDLE_LONG is greater than RCU_SYSIDLE_SHORT, CPU 1 will
take action to end the idle period.

							Thanx, Paul