linux-kernel - Re: Possible regression due to "tick: broadcast: Prevent livelock from event handler"


Message-ID: <alpine.DEB.2.11.1507031041570.3916@nanos>
Date:	Fri, 3 Jul 2015 11:23:12 +0200 (CEST)
From:	Thomas Gleixner <tglx@...utronix.de>
To:	Geert Uytterhoeven <geert@...ux-m68k.org>
cc:	Simon Horman <horms@...ge.net.au>,
	Kevin Hilman <khilman@...nel.org>,
	Tyler Baker <tyler.baker@...aro.org>,
	Borislav Petkov <bp@...en8.de>,
	Geert Uytterhoeven <geert+renesas@...der.be>,
	Magnus Damm <magnus.damm@...il.com>,
	Linux-sh list <linux-sh@...r.kernel.org>,
	"linux-arm-kernel@...ts.infradead.org" 
	<linux-arm-kernel@...ts.infradead.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: Possible regression due to "tick: broadcast: Prevent livelock
 from event handler"

On Fri, 3 Jul 2015, Geert Uytterhoeven wrote:
> Hi Simon,
> 
> On Fri, Jul 3, 2015 at 4:40 AM, Simon Horman <horms@...ge.net.au> wrote:
> > I have observed what appears to be a regression while testing next-20150702
> > which seems to be caused by 2951d5c031a3 ("tick: broadcast: Prevent
> > livelock from event handler").
> >
> > The problem manifests on the emev2/kzm9d board as per the boot log below.
> >
> > The problem manifests when booting using the shmobile_defconfig,
> > which uses multiplatform and enables all devices using DT.
> >
> > The problem does not appear to always manifest but anecdotally it
> > seems to manifest more often of late (yes, I know that is vague).
> 
> > hctosys: unable to open rtc device (rtc0)
> >
> > The boot hangs here.
> > The next line should be:
> >
> > smsc911x 20000000.ethernet eth0: SMSC911x/921x identified at 0xc8880000, IRQ: 33
> 
> As you can reproduce it, can you please try enabling lockdep debugging?

Just looking at the em_sti driver. It calls clk_prepare/unprepare from
interrupt-disabled regions ...

But I don't think that's the problem at hand. The above commit moves
the call to the event handler on the local CPU out of the broadcast
lock region to prevent a livelock. The only real change is the
timing.

Before:

	bc_handler()
	  lock(bc_lock);
	  call_local_handler();
	  send_ipis();
	  reprogram_bc_device();
	  unlock(bc_lock);

After:

	bc_handler()
	  lock(bc_lock);
	  send_ipis();
	  reprogram_bc_device();
	  unlock(bc_lock);
	  call_local_handler();

As this runs in hard interrupt context with interrupts disabled, I
really cannot figure out how that makes a difference.
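
To make the ordering change concrete, here is a minimal userspace sketch,
not kernel code. Every name in it (struct trace, bc_handler_before,
bc_handler_after and the helpers) is a hypothetical stand-in for a step in
the pseudocode above. It only records the order of the steps and whether
the local handler ran while the modeled lock was held; it says nothing
about the real timing on hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Userspace model only: none of these names are kernel APIs. */
#define MAX_STEPS 8

struct trace {
	const char *step[MAX_STEPS];	/* steps in the order they ran */
	size_t n;			/* number of recorded steps */
	bool lock_held;			/* models bc_lock being held */
	bool handler_ran_under_lock;	/* set if local handler saw the lock held */
};

static void record(struct trace *t, const char *name)
{
	assert(t->n < MAX_STEPS);
	t->step[t->n++] = name;
}

static void lock_bc(struct trace *t)
{
	t->lock_held = true;
	record(t, "lock");
}

static void unlock_bc(struct trace *t)
{
	t->lock_held = false;
	record(t, "unlock");
}

static void send_ipis(struct trace *t)
{
	record(t, "send_ipis");
}

static void reprogram_bc_device(struct trace *t)
{
	record(t, "reprogram");
}

static void call_local_handler(struct trace *t)
{
	if (t->lock_held)
		t->handler_ran_under_lock = true;
	record(t, "local_handler");
}

/* Ordering before the commit: local handler runs inside the lock region. */
void bc_handler_before(struct trace *t)
{
	lock_bc(t);
	call_local_handler(t);
	send_ipis(t);
	reprogram_bc_device(t);
	unlock_bc(t);
}

/* Ordering after the commit: local handler runs after the lock is dropped. */
void bc_handler_after(struct trace *t)
{
	lock_bc(t);
	send_ipis(t);
	reprogram_bc_device(t);
	unlock_bc(t);
	call_local_handler(t);
}
```

Calling bc_handler_before() and bc_handler_after() on two zero-initialized
struct trace values shows the same five steps in both cases; only the
position of the local handler differs, and only the "before" variant runs
it with the modeled lock held.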

Can you add some debugging to figure out whether the broadcast timer
interrupt still fires?

Thanks,

	tglx
