[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1085262213.37823.1459174529625.JavaMail.zimbra@efficios.com>
Date:	Mon, 28 Mar 2016 14:15:29 +0000 (UTC)
From:	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	"Chatre, Reinette" <reinette.chatre@...el.com>,
	Jacob Pan <jacob.jun.pan@...ux.intel.com>,
	Josh Triplett <josh@...htriplett.org>,
	Ross Green <rgkernel@...il.com>,
	John Stultz <john.stultz@...aro.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	lkml <linux-kernel@...r.kernel.org>,
	Ingo Molnar <mingo@...nel.org>,
	Lai Jiangshan <jiangshanlai@...il.com>, dipankar@...ibm.com,
	Andrew Morton <akpm@...ux-foundation.org>,
	rostedt <rostedt@...dmis.org>,
	David Howells <dhowells@...hat.com>,
	Eric Dumazet <edumazet@...gle.com>,
	Darren Hart <dvhart@...ux.intel.com>,
	Frédéric Weisbecker <fweisbec@...il.com>,
	Oleg Nesterov <oleg@...hat.com>,
	pranith kumar <bobby.prani@...il.com>
Subject: Re: rcu_preempt self-detected stall on CPU from 4.5-rc3, since 3.17
----- On Mar 28, 2016, at 2:13 AM, Peter Zijlstra peterz@...radead.org wrote:
> On Mon, Mar 28, 2016 at 02:23:45AM +0000, Mathieu Desnoyers wrote:
> 
>> >> But, you need hotplug for this to happen, right?
>> > 
>> > My understanding is that this seems to be detection of failures to be
>> > awakened for a long time on idle CPUs. It therefore seems to be more
>> > idle-related than cpu hotplug-related. I'm not saying that there is
>> > no issue with hotplug, just that the investigation so far seems to
>> > target mostly idle systems, AFAIK without stressing hotplug.
> 
> Paul has stated that without hotplug he cannot trigger this.
> 
>> > set_nr_if_polling() returns true if the ti->flags read has the
>> > _TIF_NEED_RESCHED bit set, which will skip the IPI.
> 
> POLLING_NR, as per your later comment
> 
>> > But it seems weird. The side that calls set_nr_if_polling()
>> > does the following:
>> > 1) llist_add(&p->wake_entry, &cpu_rq(cpu)->wake_list)
>> > 2) set_nr_if_polling(rq->idle)
>> > 3) (don't do smp_send_reschedule(cpu) since set_nr_if_polling() returned
>> >   true)
>> > 
>> > The idle loop does:
>> > 1) __current_set_polling()
>> > 2) __current_clr_polling()
>> > 3) smp_mb__after_atomic()
>> > 4) sched_ttwu_pending()
>> > 5) schedule_preempt_disabled()
>> >   -> This will clear the TIF_NEED_RESCHED flag
>> > 
>> > While the idle loop is in sched_ttwu_pending(), after
>> > it has done the llist_del_all() (thus has grabbed all the
>> > list entries), TIF_NEED_RESCHED is still set.
> 
>> > If both list_all and
> 
> llist_add() ?
Yes, indeed.
> 
>> > set_nr_if_polling() are called right after the llist_del_all(), we
>> > will end up in a situation where we have an entry in the list, but
>> > there won't be any reschedule sent on the idle CPU until something
>> > else awakens it. On a _very_ idle CPU, this could take some time.
> 
> Can't happen, as per clearing of POLLING_NR before doing llist_del_all()
> and the latter being a full memory barrier.
> 
>> > set_nr_and_not_polling() don't seem to have the same issue, because
>> > it does not return true if TIF_NEED_RESCHED is observed as being
>> > already set: it really just depends on the state of the TIF_POLLING_NRFLAG
>> > bit.
>> > 
>> > Am I missing something important ?
>> 
>> Well, it seems that the test for _TIF_POLLING_NRFLAG in set_nr_if_polling()
>> just before the test for _TIF_NEED_RESCHED should take care of it: while in
>> sched_ttwu_pending within the idle loop, the TIF_POLLING_NRFLAG should be
>> cleared, thus causing set_nr_if_polling to return false.
> 
> Right, clue in the name: Set NEED_RESCHED _IF_ POLLING_NR (is set).
> 
>> I'm slightly concerned about the lack of smp_mb__after_atomic()
>> between the TIF_NEED_RESCHED flag being cleared within schedule_preempt_disabled
>> and the TIF_POLLING_NRFLAG being set in the following loop. Indeed, clear_bit()
>> does not have a compiler barrier,
> 
> Urgh, it really should, as all atomic ops. set_bit() very much has a
> memory clobber in, see below.
Yes, I'd be more comfortable with the memory clobber in the clear_bit
too, but theoretically it *should* not matter, because we have a clobber
in set_bit, and clear_bit has a +m memory operand.
> 
>> nor processor-level memory barriers
>> (of course, the processor memory barrier should not really matter on
>> x86-64 due to lock prefix).
> 
> Right.
> 
>> Moreover, TIF_NEED_RESCHED is bit 3 on x86-64,
>> whereas TIF_POLLING_NRFLAG is bit 21. Those are in two different bytes of
>> the thread flags, and thus set/cleared as different addresses by clear_bit()
>> acting on an immediate "nr" argument.
>> 
>> If we have any state where TIF_POLLING_NRFLAG is set before TIF_NEED_RESCHED
>> is cleared within the idle thread, we could end up missing a needed resched IPI.
> 
> Yes, that would be bad. No objection to adding smp_mb__before_atomic()
> before the initial __current_set_polling(). Although that's not going to
> make a difference for x86_64 as you already noted.
Yep.
> 
>> Another question: why are set_nr_if_polling and set_nr_and_not_polling two
>> different implementations ?
> 
> Because they're fundamentally two different things. The one
> conditionally sets NEED_RESCHED, the other unconditionally sets it.
Got it, makes sense.
Thanks!
Mathieu
> 
>> Could they be combined ?
> 
> Can, yes, will not be pretty nor clear code though.
> 
> 
> ---
> arch/x86/include/asm/bitops.h | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
> index 7766d1cf096e..5345784d5e41 100644
> --- a/arch/x86/include/asm/bitops.h
> +++ b/arch/x86/include/asm/bitops.h
> @@ -112,11 +112,13 @@ clear_bit(long nr, volatile unsigned long *addr)
> 	if (IS_IMMEDIATE(nr)) {
> 		asm volatile(LOCK_PREFIX "andb %1,%0"
> 			: CONST_MASK_ADDR(nr, addr)
> -			: "iq" ((u8)~CONST_MASK(nr)));
> +			: "iq" ((u8)~CONST_MASK(nr))
> +			: "memory");
> 	} else {
> 		asm volatile(LOCK_PREFIX "btr %1,%0"
> 			: BITOP_ADDR(addr)
> -			: "Ir" (nr));
> +			: "Ir" (nr)
> +			: "memory");
> 	}
>  }
-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
Powered by blists - more mailing lists
 
