linux-kernel - Re: Bug#700333: Stack trace


Message-ID: <alpine.LFD.2.02.1304221409420.21884@ionos>
Date:	Mon, 22 Apr 2013 14:15:34 +0200 (CEST)
From:	Thomas Gleixner <tglx@...utronix.de>
To:	Borislav Petkov <bp@...en8.de>
cc:	vitalif@...rcmc.ru, Ben Hutchings <ben@...adent.org.uk>,
	"Venkatesh Pallipadi (Venki)" <venki@...gle.com>,
	700333@...s.debian.org, LKML <linux-kernel@...r.kernel.org>,
	Clemens Ladisch <clemens@...isch.de>
Subject: Re: Bug#700333: Stack trace

On Sun, 21 Apr 2013, Borislav Petkov wrote:

> + tglx.
> 
> On Sun, Apr 21, 2013 at 01:38:33AM +0400, vitalif@...rcmc.ru wrote:
> > >>Stack trace picture is here:
> > >>http://vmx.yourcmc.ru/var/pics/IMG_20130306_141045.jpg
> > >
> > >Vitaliy reported that his system crashes when suspending to disk.
> > >This was a regression from 3.2 to 3.7, and remains in 3.8.  Some
> > >details of this system are in the bug log at
> > ><http://bugs.debian.org/700333>.
> > >
> > >The photo shows a BUG in hrtimer_interrupt() after making the
> > >hibernation image and while resuming the non-boot CPUs.  The HPET
> > >interrupt handler was called immediately after it was registered
> > >for CPU 2 (?), before the corresponding clock_event_device was
> > >registered.
> > >
> > >Seems like an obvious race condition, but then shouldn't the HPET
> > >have been stopped while the CPU was previously offlined?  And it's
> > >strange that this system apparently hits the race quite reliably.
> > 
> > Anyone?

So what happens is that the HPET seems to have an interrupt pending,
and it fires immediately when the handler is installed. The core code
does not clear hpet->event_handler, so the interrupt handler calls
into hrtimer_interrupt(), where it hits the BUG and dies.

With the patch below, the box should survive and we should see a

"Spurious HPET timer interrupt on HPET timer..." entry in dmesg.

That's a first workaround to confirm my theory. I'll look into the
HPET code to see how we can avoid that altogether.

Thanks,

	tglx

diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index b1600a6..0f0ce6e 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -323,6 +323,7 @@ static void tick_shutdown(unsigned int *cpup)
 		 */
 		dev->mode = CLOCK_EVT_MODE_UNUSED;
 		clockevents_exchange_device(dev, NULL);
+		dev->event_handler = NULL;
 		td->evtdev = NULL;
 	}
 	raw_spin_unlock_irqrestore(&tick_device_lock, flags);
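
For readers following the thread, the effect of the one-line change can
be modelled in user space. The sketch below is not kernel code: every
sim_* name is hypothetical, and it only assumes what the mail states,
namely that the interrupt handler reports a spurious interrupt instead
of calling into the hrtimer code once the event_handler pointer has
been cleared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, heavily simplified stand-in for a clock event device. */
struct sim_clock_event_device {
	const char *name;
	void (*event_handler)(struct sim_clock_event_device *dev);
	int events_handled;
};

enum sim_irq_result { SIM_IRQ_SPURIOUS, SIM_IRQ_DELIVERED };

/*
 * Stand-in for hrtimer_interrupt(). In the report, this is the call
 * that hits the BUG when it runs for a CPU that is not set up yet.
 */
static void sim_hrtimer_interrupt(struct sim_clock_event_device *dev)
{
	dev->events_handled++;
}

/* Models the shutdown path with the fix applied: drop the stale handler. */
static void sim_tick_shutdown(struct sim_clock_event_device *dev)
{
	assert(dev != NULL);
	dev->event_handler = NULL;
}

/*
 * Models the timer interrupt handler: with no handler installed, the
 * interrupt is logged as spurious and ignored; otherwise it is passed on.
 */
static enum sim_irq_result sim_timer_irq(struct sim_clock_event_device *dev)
{
	assert(dev != NULL);
	if (!dev->event_handler) {
		printf("Spurious timer interrupt on %s\n", dev->name);
		return SIM_IRQ_SPURIOUS;
	}
	dev->event_handler(dev);
	return SIM_IRQ_DELIVERED;
}
```

With a device whose event_handler is sim_hrtimer_interrupt,
sim_timer_irq() forwards the event. After sim_tick_shutdown() the same
call only logs the spurious interrupt and leaves events_handled
unchanged, which is the behaviour the workaround is meant to confirm.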

