linux-kernel - Re: too many timer retries happen when do local timer swtich with broadcast timer

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20130222144829.GG12140@e102568-lin.cambridge.arm.com>
Date:	Fri, 22 Feb 2013 14:48:30 +0000
From:	Lorenzo Pieralisi <lorenzo.pieralisi@....com>
To:	Thomas Gleixner <tglx@...utronix.de>
Cc:	Santosh Shilimkar <santosh.shilimkar@...com>,
	Jason Liu <liu.h.jason@...il.com>,
	LKML <linux-kernel@...r.kernel.org>,
	"linux-arm-kernel@...ts.infradead.org" 
	<linux-arm-kernel@...ts.infradead.org>
Subject: Re: too many timer retries happen when do local timer swtich with
 broadcast timer

On Fri, Feb 22, 2013 at 12:07:30PM +0000, Thomas Gleixner wrote:
> On Fri, 22 Feb 2013, Santosh Shilimkar wrote:
> 
> > On Friday 22 February 2013 04:01 PM, Lorenzo Pieralisi wrote:
> > > On Fri, Feb 22, 2013 at 10:24:00AM +0000, Thomas Gleixner wrote:
> > > > On Fri, 22 Feb 2013, Santosh Shilimkar wrote:
> > > > > BTW, Lorenzo off-list mentioned to me about warning in boot-up
> > > > > which I missed while testing your patch. It will take bit more
> > > > > time for me to look into it and hence thought of reporting it.
> > > > > 
> > > > > [    2.186126] ------------[ cut here ]------------
> > > > > [    2.190979] WARNING: at kernel/time/tick-broadcast.c:501
> > > > > tick_broadcast_oneshot_control+0x1c0/0x21c()
> > > > 
> > > > Which one is that? tick_broadcast_pending or tick_force_broadcast_mask ?
> > > 
> > > It is the tick_force_broadcast_mask and I think that's because on all
> > > systems we are testing, the broadcast timer IRQ is a thundering herd,
> > > all CPUs get out of idle at once and try to get out of broadcast mode
> > > at more or less the same time.
> > > 
> > So the issue comes ups only when the idle state used where CPU wakeup
> > more or less at same time as Lorenzo mentioned. I have two platforms
> > where I could test the patch and see the issue only with one platform.
> > 
> > Yesterday I didn't notice the warning because it wasn't seen on that
> > platform :-) OMAP4 idle entry and exit in deep state is staggered
> > between CPUs and hence the warning isn't seen. On OMAP5 though,
> > there is an additional C-state where idle entry/exit for CPU
> > isn't staggered and I see the issue in that case.
> > 
> > Actually the broad-cast code doesn't expect such a behavior
> > from CPUs since only the broad-cast affine CPU should wake
> > up and rest of the CPU should be woken up by the broad-cast
> > IPIs.
> 
> That's what I feared. We might have the same issue on x86, depending
> on the cpu model.
> 
> But thinking more about it. It's actually not a real problem, just
> pointless burning of cpu cycles.
> 
> So on the CPU which gets woken along with the target CPU of the
> broadcast the following happens:
> 
>   deep_idle()
> 			<-- spurious wakeup
>   broadcast_exit()
>     set forced bit
>   
>   enable interrupts
>     
> 			<-- Nothing happens
> 
>   disable interrupts
> 
>   broadcast_enter()
> 			<-- Here we observe the forced bit is set
>   deep_idle()
> 
> Now after that the target CPU of the broadcast runs the broadcast
> handler and finds the other CPU in both the broadcast and the forced
> mask, sends the IPI and stuff gets back to normal.
> 
> So it's not actually harmful, just more evidence for the theory, that
> hardware designers have access to very special drug supplies.
> 
> Now we could make use of that and avoid going deep idle just to come
> back right away via the IPI. Unfortunately the notification thingy has
> no return value, but we can fix that.
> 
> To confirm that theory, could you please try the hack below and add
> some instrumentation (trace_printk)?

Applied, and it looks like that's exactly why the warning triggers, at least
on the platform I am testing on which is a dual-cluster ARM testchip.

There is a still time window though where the CPU (the IPI target) can get
back to idle (tick_broadcast_pending still not set) before the CPU target of
the broadcast has a chance to run tick_handle_oneshot_broadcast (and set
tick_broadcast_pending), or am I missing something ?
It is a corner case, granted. Best thing would be to check pending IRQs in the
idle driver back-end (or have always-on local timers :-)).

Thanks,
Lorenzo

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/