Message-ID: <20130719021207.GA19491@somewhere>
Date:	Fri, 19 Jul 2013 04:12:08 +0200
From:	Frederic Weisbecker <fweisbec@...il.com>
To:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc:	linux-kernel@...r.kernel.org, mingo@...e.hu, laijs@...fujitsu.com,
	dipankar@...ibm.com, akpm@...ux-foundation.org,
	mathieu.desnoyers@...ymtl.ca, josh@...htriplett.org,
	niv@...ibm.com, tglx@...utronix.de, peterz@...radead.org,
	rostedt@...dmis.org, dhowells@...hat.com, edumazet@...gle.com,
	darren@...art.com, sbw@....edu
Subject: Re: [PATCH RFC nohz_full 6/7] nohz_full: Add full-system-idle state
 machine

On Thu, Jul 18, 2013 at 05:24:08PM -0700, Paul E. McKenney wrote:
> On Fri, Jul 19, 2013 at 12:46:21AM +0200, Frederic Weisbecker wrote:
> > On Thu, Jul 18, 2013 at 09:47:49AM -0700, Paul E. McKenney wrote:
> > > 1. Some CPU coming out of idle:
> > > 
> > > o	rcu_sysidle_exit():
> > > 
> > > 	smp_mb__before_atomic_inc();
> > > 	atomic_inc(&rdtp->dynticks_idle);
> > > 	smp_mb__after_atomic_inc(); /* A */
> > > 
> > > o	rcu_sysidle_force_exit():
> > > 
> > > 	oldstate = ACCESS_ONCE(full_sysidle_state);
> > > 
> > > 2. RCU GP kthread:
> > > 
> > > o	rcu_sysidle():
> > > 
> > > 	cmpxchg(&full_sysidle_state, RCU_SYSIDLE_SHORT, RCU_SYSIDLE_LONG);
> > > 		/* B */
> > > 
> > > o	rcu_sysidle_check_cpu():
> > > 
> > > 	cur = atomic_read(&rdtp->dynticks_idle);
> > > 
> > > Memory barrier A pairs with memory barrier B, so that if #1's load
> > > from full_sysidle_state sees RCU_SYSIDLE_SHORT, we know that #1's
> > > atomic_inc() must be visible to #2's atomic_read().  This will cause #2
> > > to recognize that the CPU came out of idle, which will in turn cause it
> > > to invoke rcu_sysidle_cancel() instead of rcu_sysidle(), resulting in
> > > full_sysidle_state being set to RCU_SYSIDLE_NOT.
> > 
> > Ok I get it for that direction.
> > Now imagine CPU 0 is the RCU GP kthread (#2) and CPU 1 is idle and stays
> > so.
> > 
> > CPU 0 then does its rounds and sees that all CPUs are idle, until it finally
> > sets RCU_SYSIDLE_SHORT_FULL and goes to sleep.
> > 
> > Then CPU 1 wakes up. It really has to see a value above RCU_SYSIDLE_SHORT
> > otherwise it won't do the cmpxchg and see the FULL_NOTED that makes it send
> > the IPI.
> > 
> > What provides the guarantee that CPU 1 sees a value above RCU_SYSIDLE_SHORT?
> > Not on the cmpxchg, but when it first reads the state with ACCESS_ONCE.
> 
> The trick is that CPU 0 will have scanned, moved to RCU_SYSIDLE_SHORT,
> scanned, moved to RCU_SYSIDLE_LONG, then scanned again before moving
> to RCU_SYSIDLE_FULL.  Given CPU 1 has been idle all this time, CPU 0
> will have read its ->dynticks_idle counter on each scan and seen it
> having an even value.  When CPU 1 comes out of idle, it will atomically
> increment its ->dynticks_idle, which will happen after CPU 0's read of
> ->dynticks_idle during its last scan.
> 
> Because of the memory-barrier pairing above, this means that CPU
> 1's read from full_sysidle_state must follow the cmpxchg() that
> set full_sysidle_state to RCU_SYSIDLE_LONG (though not necessarily
> the two later cmpxchg()s that set it to RCU_SYSIDLE_FULL and
> RCU_SYSIDLE_FULL_NOTED).  But because RCU_SYSIDLE_LONG is greater than
> RCU_SYSIDLE_SHORT, CPU 1 will take action to end the idle period.
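
The wake-up side described in the quoted text can be sketched in userspace C11.
This is only an illustration, not the code from the patch: the function name
sysidle_force_exit_sketch(), the retry loop, the return value standing in for
"send the IPI", and the numeric ordering of the states are all assumptions.
Only the state names and the "above RCU_SYSIDLE_SHORT" test come from the thread.

```c
#include <assert.h>
#include <stdatomic.h>

/* State names come from the thread; the numeric ordering is an assumption. */
enum sysidle_state {
	RCU_SYSIDLE_NOT,
	RCU_SYSIDLE_SHORT,
	RCU_SYSIDLE_LONG,
	RCU_SYSIDLE_FULL,
	RCU_SYSIDLE_FULL_NOTED,
};

static _Atomic int full_sysidle_state = RCU_SYSIDLE_NOT;

/*
 * Hypothetical wake-up path. Returns 1 when the caller would have to send
 * the IPI (the state it cleared was FULL_NOTED), 0 otherwise.
 */
int sysidle_force_exit_sketch(void)
{
	/* First plain read of the state, standing in for ACCESS_ONCE(). */
	int oldstate = atomic_load(&full_sysidle_state);

	/* At or below SHORT there is nothing to undo, so skip the cmpxchg. */
	while (oldstate > RCU_SYSIDLE_SHORT) {
		int expected = oldstate;

		if (atomic_compare_exchange_strong(&full_sysidle_state,
						   &expected, RCU_SYSIDLE_NOT))
			return oldstate == RCU_SYSIDLE_FULL_NOTED;
		/* Lost a race with another update: retry with the new value. */
		oldstate = expected;
	}
	return 0;
}
```

If the first read returned RCU_SYSIDLE_SHORT or lower, the loop body never runs,
which is why the value seen by that first read matters.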

Let's summarize the last sequence. The following happens, ordered by time:

        CPU 0                          CPU 1

     cmpxchg(&full_sysidle_state,
             RCU_SYSIDLE_SHORT,
             RCU_SYSIDLE_LONG);

     smp_mb() //cmpxchg

     atomic_read(rdtp(1)->dynticks_idle)

     //CPU 0 goes to sleep
                                       //CPU 1 wakes up
                                       atomic_inc(rdtp(1)->dynticks_idle)

                                       smp_mb()

                                       ACCESS_ONCE(full_sysidle_state)


Are you suggesting that because CPU 1 executes its atomic_inc() _after_ (in terms
of absolute time) CPU 0's atomic_read(), the ordering established on both sides guarantees
that the value read by CPU 1 is either the one from the cmpxchg() that precedes the atomic_read(),
or the FULL or FULL_NOTED values that are set later?

If so, that's a big lesson for me.
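
One way to check the question above is to enumerate every schedule of the four
accesses in the timeline. The sketch below is a simplified model, not kernel
code: it assumes the full barriers make each CPU's two accesses take effect in
program order against a single shared memory (sequential consistency), and the
names run_schedule(), count_forbidden() and struct outcome are hypothetical.
It counts the schedules in which CPU 0 misses the increment and CPU 1 still
reads RCU_SYSIDLE_SHORT.

```c
#include <assert.h>

/* State names come from the thread; only their relative order matters here. */
enum { RCU_SYSIDLE_NOT, RCU_SYSIDLE_SHORT, RCU_SYSIDLE_LONG };

struct outcome {
	int cpu0_saw_inc;	/* CPU 0's atomic_read() saw CPU 1's atomic_inc() */
	int cpu1_saw_long;	/* CPU 1's load saw RCU_SYSIDLE_LONG */
};

/*
 * Replay one schedule of the four accesses. Bit i of 'order' picks the CPU
 * that runs step i (0 = CPU 0, 1 = CPU 1). Returns -1 unless each CPU gets
 * exactly two steps, 0 otherwise.
 */
int run_schedule(unsigned int order, struct outcome *out)
{
	int state = RCU_SYSIDLE_SHORT;
	int dynticks_idle = 0;		/* even: CPU 1 is idle */
	int pc0 = 0, pc1 = 0;
	int read0 = 0, read1 = RCU_SYSIDLE_NOT;

	assert(out);
	for (int step = 0; step < 4; step++) {
		if ((order >> step) & 1) {
			if (pc1 == 2)
				return -1;
			if (pc1++ == 0)
				dynticks_idle++;	/* atomic_inc() */
			else
				read1 = state;		/* ACCESS_ONCE() */
		} else {
			if (pc0 == 2)
				return -1;
			if (pc0++ == 0)
				state = RCU_SYSIDLE_LONG;	/* cmpxchg() SHORT -> LONG */
			else
				read0 = dynticks_idle;	/* atomic_read() */
		}
	}
	out->cpu0_saw_inc = read0 & 1;
	out->cpu1_saw_long = (read1 == RCU_SYSIDLE_LONG);
	return 0;
}

/* Count valid schedules in which neither CPU observes the other's update. */
int count_forbidden(void)
{
	int forbidden = 0;

	for (unsigned int order = 0; order < 16; order++) {
		struct outcome out;

		if (run_schedule(order, &out))
			continue;	/* not two steps per CPU */
		if (!out.cpu0_saw_inc && !out.cpu1_saw_long)
			forbidden++;
	}
	return forbidden;
}
```

Under that model, a schedule in which CPU 0's read misses the increment has
both of CPU 0's accesses before CPU 1's atomic_inc(), so CPU 1's later load
cannot return RCU_SYSIDLE_SHORT. The model says nothing about what happens
if either barrier is weakened or removed.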