linux-kernel - Re: [PATCH] rtc: fix deadlock

Message-ID: <18608.16826.607428.203994@harpo.it.uu.se>
Date:	Sat, 23 Aug 2008 18:58:34 +0200
From:	Mikael Pettersson <mikpe@...uu.se>
To:	Ingo Molnar <mingo@...e.hu>
Cc:	Mikael Pettersson <mikpe@...uu.se>, linux-kernel@...r.kernel.org,
	hpa@...or.com, mingo@...hat.com, tglx@...utronix.de,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH] rtc: fix deadlock

Ingo Molnar writes:
 > 
 > * Mikael Pettersson <mikpe@...uu.se> wrote:
 > 
 > > Since 2.6.27-rc1 my Core2Duo has been getting sporadic oopses
 > > from hpet_rtc_interrupt, usually during shutdown or reboot,
 > > but occasionally also early in init. Today I finally managed
 > > to capture one via a serial cable:
 > > 
 > > INIT: version 2.86 booting
 > > 		Welcome to Fedora Core
 > > 		Press 'I' to enter interactive startup.
 > > BUG: NMI Watchdog detected LOCKUP on CPU0, ip c0117092, registers:
 > > Modules linked in: ehci_hcd uhci_hcd usbcore
 > > 
 > > Pid: 311, comm: nash-hotplug Not tainted (2.6.27-rc4 #1)
 > > EIP: 0060:[<c0117092>] EFLAGS: 00000097 CPU: 0
 > > EIP is at hpet_rtc_interrupt+0x2d2/0x310
 > > EAX: 00000000 EBX: 00000002 ECX: 00000046 EDX: 00000002
 > > ESI: 000000a6 EDI: ffff8e25 EBP: 00000008 ESP: f7bd7f28
 > >  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
 > > Process nash-hotplug (pid: 311, ti=f7bd6000 task=f7b70460 task.ti=f7bd6000)
 > > Stack: f7bd7f6c c0139cc0 00000000 c035ba04 00000000 00000000 00000000 00000000 
 > >        00000000 00000000 00000000 00000000 00000000 f7b845a0 00000000 00000000 
 > >        00000008 c01478a8 c035bf80 f7b845a0 c035bfb0 00000008 c0148f71 00000400 
 > > Call Trace:
 > >  [<c0139cc0>] hrtimer_run_pending+0x20/0x90
 > >  [<c01478a8>] handle_IRQ_event+0x28/0x50
 > >  [<c0148f71>] handle_edge_irq+0xa1/0x120
 > >  [<c010615b>] do_IRQ+0x3b/0x70
 > >  [<c0113225>] smp_apic_timer_interrupt+0x55/0x80
 > >  [<c0103c4f>] common_interrupt+0x23/0x28
 > >  [<c02c0000>] unix_release_sock+0xc0/0x220
 > >  =======================
 > > Code: 89 44 24 18 0f b6 c2 e8 5d 74 0c 00 8b 0d d8 9c 3b c0 89 44 24 1c 8b 44 24 0c 48 89 44 24 20 e9 84 fd ff ff 90 8d 74 26 00 f3 90 <a1> 80 ba 35 c0 29 f8 83 f8 01 76 f2 e9 e1 fe ff ff 90 8d 74 26 
 > > 
 > > This points to the following loop in hpet_rtc_interrupt:
 > > 
 > > 0xc0117090 <hpet_rtc_interrupt+720>:    pause  
 > > 0xc0117092 <hpet_rtc_interrupt+722>:    mov    0xc035ba80,%eax
 > > 0xc0117097 <hpet_rtc_interrupt+727>:    sub    %edi,%eax
 > > 0xc0117099 <hpet_rtc_interrupt+729>:    cmp    $0x1,%eax
 > > 0xc011709c <hpet_rtc_interrupt+732>:    jbe    0xc0117090 <hpet_rtc_interrupt+720>
 > > 
 > > Note: 0xc035ba80 == &jiffies
 > > 
 > > This loop originates from asm-generic/rtc.h:get_rtc_time()
 > > 
 > > 		while (jiffies - uip_watchdog < 2*HZ/100) {
 > > 			barrier();
 > > 			cpu_relax();
 > > 		}
 > > 
 > > Note: HZ == CONFIG_HZ == 100
 > > 
 > > The bug may not originate from the 2.6.27-rc series as I only recently 
 > > enabled HPET in this machine's kernels (not due to HPET problems, it 
 > > inherited its .config way back from an older machine w/o HPET).
 > 
 > argh, that loop in asm-generic/rtc.h:get_rtc_time looks extremely 
 > fragile, we'll lock up if it's ever called with hardirqs off!
 > 
 > Does the patch below do the trick?

Thanks for the patch, I'll give it a try ASAP.

The sporadic nature of the bug means that it will probably take a
couple of days of testing and dozens of reboots w/o problems before
I can say with confidence that the problem has been fixed.
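
To make the failure mode concrete, here is a minimal userspace sketch of
the old wait loop. It is only an illustration under stated assumptions,
not kernel code: `jiffies` is an ordinary variable, `irqs_enabled`,
`maybe_tick()` and `SPIN_CAP` are hypothetical stand-ins for the hardirq
state, the timer interrupt, and a cap that lets the demo terminate. With
HZ == 100 the threshold 2*HZ/100 is 2, which matches the `cmp $0x1` /
`jbe` pair in the disassembly (loop while the difference is <= 1).

```c
#include <assert.h>
#include <stdbool.h>

#define HZ 100                  /* matches CONFIG_HZ in the report */
#define SPIN_CAP 1000000UL      /* hypothetical cap so the demo terminates */

volatile unsigned long jiffies; /* stand-in for the kernel tick counter */
bool irqs_enabled;              /* simulated hardirq state (assumption) */

/* Simulated timer tick: only delivered while "interrupts" are enabled. */
void maybe_tick(unsigned long spins)
{
	if (irqs_enabled && spins % 1000 == 0)
		jiffies++;
}

/*
 * Same exit condition as the old get_rtc_time() wait loop.
 * Returns true if the wait completed, false if the spin cap was hit.
 * The real loop has no cap, so the "false" case spins forever.
 */
bool wait_like_old_get_rtc_time(void)
{
	unsigned long uip_watchdog = jiffies;
	unsigned long spins = 0;

	/* With HZ < 50 the integer threshold would be 0: no wait at all. */
	assert(2 * HZ / 100 > 0);

	while (jiffies - uip_watchdog < 2 * HZ / 100) {
		if (++spins > SPIN_CAP)
			return false;
		maybe_tick(spins);
	}
	return true;
}
```

Calling wait_like_old_get_rtc_time() with irqs_enabled set to true
returns true after two simulated ticks. With irqs_enabled set to false
the counter never moves and only the artificial cap ends the loop, which
is the situation the NMI watchdog caught in hpet_rtc_interrupt.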

/Mikael

 > 
 > 	Ingo
 > 
 > ----------------->
 > From 2273cc870b52a7ed09eb225142a6db97299e4f39 Mon Sep 17 00:00:00 2001
 > From: Ingo Molnar <mingo@...e.hu>
 > Date: Sat, 23 Aug 2008 17:59:07 +0200
 > Subject: [PATCH] rtc: fix deadlock
 > 
 > if get_rtc_time() is _ever_ called with IRQs off, we deadlock badly
 > in it, waiting for jiffies to increment.
 > 
 > So make the code more robust by doing an explicit mdelay(20).
 > 
 > This solves a very hard to reproduce/debug hard lockup reported
 > by Mikael Pettersson.
 > 
 > Reported-by: Mikael Pettersson <mikpe@...uu.se>
 > Signed-off-by: Ingo Molnar <mingo@...e.hu>
 > ---
 >  include/asm-generic/rtc.h |   12 ++++--------
 >  1 files changed, 4 insertions(+), 8 deletions(-)
 > 
 > diff --git a/include/asm-generic/rtc.h b/include/asm-generic/rtc.h
 > index be4af00..71ef3f0 100644
 > --- a/include/asm-generic/rtc.h
 > +++ b/include/asm-generic/rtc.h
 > @@ -15,6 +15,7 @@
 >  #include <linux/mc146818rtc.h>
 >  #include <linux/rtc.h>
 >  #include <linux/bcd.h>
 > +#include <linux/delay.h>
 >  
 >  #define RTC_PIE 0x40		/* periodic interrupt enable */
 >  #define RTC_AIE 0x20		/* alarm interrupt enable */
 > @@ -43,7 +44,6 @@ static inline unsigned char rtc_is_updating(void)
 >  
 >  static inline unsigned int get_rtc_time(struct rtc_time *time)
 >  {
 > -	unsigned long uip_watchdog = jiffies;
 >  	unsigned char ctrl;
 >  	unsigned long flags;
 >  
 > @@ -53,19 +53,15 @@ static inline unsigned int get_rtc_time(struct rtc_time *time)
 >  
 >  	/*
 >  	 * read RTC once any update in progress is done. The update
 > -	 * can take just over 2ms. We wait 10 to 20ms. There is no need to
 > +	 * can take just over 2ms. We wait 20ms. There is no need to
 >  	 * to poll-wait (up to 1s - eeccch) for the falling edge of RTC_UIP.
 >  	 * If you need to know *exactly* when a second has started, enable
 >  	 * periodic update complete interrupts, (via ioctl) and then 
 >  	 * immediately read /dev/rtc which will block until you get the IRQ.
 >  	 * Once the read clears, read the RTC time (again via ioctl). Easy.
 >  	 */
 > -
 > -	if (rtc_is_updating() != 0)
 > -		while (jiffies - uip_watchdog < 2*HZ/100) {
 > -			barrier();
 > -			cpu_relax();
 > -		}
 > +	if (rtc_is_updating())
 > +		mdelay(20);
 >  
 >  	/*
 >  	 * Only the values that we read from the RTC are set. We leave
 > 
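
For comparison, the property the patch relies on is that the delay does
not depend on the tick counter advancing. The kernel's mdelay() is a
busy-wait delay that does not need the timer interrupt. The following
userspace analogue is only a sketch of that idea, assuming a POSIX
system with clock_gettime(); `monotonic_ns()` and `busy_wait_ms()` are
hypothetical helper names, not kernel APIs.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Current CLOCK_MONOTONIC reading in nanoseconds. */
uint64_t monotonic_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/*
 * Hypothetical helper: spin until at least `ms` milliseconds have
 * elapsed on the monotonic clock. Returns the elapsed nanoseconds.
 * The exit condition reads a clock source directly instead of waiting
 * for a counter that something else has to increment.
 */
uint64_t busy_wait_ms(unsigned int ms)
{
	uint64_t start = monotonic_ns();
	uint64_t now;

	do {
		now = monotonic_ns();
	} while (now - start < (uint64_t)ms * 1000000ULL);

	return now - start;
}
```

busy_wait_ms(20) mirrors the mdelay(20) in the patch: it always
terminates after roughly 20ms, at the cost of burning the CPU for the
whole interval.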