lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Sat, 3 Jan 2009 02:23:58 +0000
From:	"Duane Griffin" <duaneg@...da.com>
To:	Chris Adams <cmadams@...aay.net>
Cc:	Linas Vepstas <linasvepstas@...il.com>,
	linux-kernel@...r.kernel.org
Subject: Re: Bug: Status/Summary of slashdot leap-second crash on new years
	2008-2009

On Fri, Jan 02, 2009 at 06:21:14PM -0600, Chris Adams wrote:
> Once upon a time, Linas Vepstas <linasvepstas@...il.com> said:
> > Below follows a summary of the reported crashes. I'm ignoring the
> > zillions of "mine didn't crash" reports, or the "you're a paranoid
> > conspiracy theorist, its random chance" reports.
> 
> I have reproduced this and got a stack trace (this is with Fedora 8 and
> kernel kernel-2.6.26.6-49.fc8.x86_64):
> 
> #0  ktime_get_ts (ts=0xffffffff8158bb30) at include/asm/processor.h:691
> #1  0xffffffff8104c09a in ktime_get () at kernel/hrtimer.c:59
> #2  0xffffffff8102a39a in hrtick_start_fair (rq=0xffff810009013880, 
>     p=<value optimized out>) at kernel/sched.c:1064
> #3  0xffffffff8102decc in enqueue_task_fair (rq=0xffff810009013880, 
>     p=0xffff81003fb02d40, wakeup=1) at kernel/sched_fair.c:863
> #4  0xffffffff81029a08 in enqueue_task (rq=0xffffffff8158bb30, 
>     p=0xffff81003b8ac418, wakeup=-994836480) at kernel/sched.c:1550
> #5  0xffffffff81029a39 in activate_task (rq=0xffff810009013880, 
>     p=0xffff81003b8ac418, wakeup=20045) at kernel/sched.c:1614
> #6  0xffffffff8102be38 in try_to_wake_up (p=0xffff81003fb02d40, 
>     state=<value optimized out>, sync=0) at kernel/sched.c:2173
> #7  0xffffffff8102be9c in default_wake_function (curr=<value optimized out>, 
>     mode=998949912, sync=20045, key=0x4c4b40000) at kernel/sched.c:4366
> #8  0xffffffff810492ed in autoremove_wake_function (wait=0xffffffff8158bb30, 
>     mode=998949912, sync=20045, key=0x4c4b40000) at kernel/wait.c:132
> #9  0xffffffff810296a2 in __wake_up_common (q=0xffffffff813d3180, mode=1, 
>     nr_exclusive=1, sync=0, key=0x0) at kernel/sched.c:4387
> #10 0xffffffff8102b97b in __wake_up (q=0xffffffff813d3180, mode=1, 
>     nr_exclusive=1, key=0x0) at kernel/sched.c:4406
> #11 0xffffffff8103692f in wake_up_klogd () at kernel/printk.c:1005
> #12 0xffffffff81036abb in release_console_sem () at kernel/printk.c:1051
> #13 0xffffffff81036fd1 in vprintk (fmt=<value optimized out>, 
>     args=<value optimized out>) at kernel/printk.c:789
> #14 0xffffffff81037081 in printk (
>     fmt=0xffffffff8158bb30 "yj$\201????\2008\001\t") at kernel/printk.c:613
> #15 0xffffffff8104ec16 in ntp_leap_second (timer=<value optimized out>)
>     at kernel/time/ntp.c:143
> #16 0xffffffff8104b7a6 in run_hrtimer_pending (cpu_base=0xffff81000900f740)
>     at kernel/hrtimer.c:1204
> #17 0xffffffff8104b86a in run_hrtimer_softirq (h=<value optimized out>)
>     at kernel/hrtimer.c:1355
> #18 0xffffffff8103b31f in __do_softirq () at kernel/softirq.c:234
> #19 0xffffffff8100d52c in call_softirq () at include/asm/current_64.h:10
> #20 0xffffffff8100ed5e in do_softirq () at arch/x86/kernel/irq_64.c:262
> #21 0xffffffff8103b280 in irq_exit () at kernel/softirq.c:310
> #22 0xffffffff8101b0fe in smp_apic_timer_interrupt (regs=<value optimized out>)
>     at arch/x86/kernel/apic_64.c:514
> #23 0xffffffff8100cf52 in apic_timer_interrupt ()
>     at include/asm/current_64.h:10
> #24 0xffff81003b9d5a90 in ?? ()
> #25 0x0000000000000000 in ?? ()
> 
> 
> Basically (to my untrained eye), the leap second code is called from the
> timer interrupt handler, which holds xtime_lock.  The leap second code
> does a printk to notify about the leap second.  The printk code tries to
> wake up klogd (I assume to prioritize kernel messages), and (under some
> conditions), the scheduler attempts to get the current time, which tries
> to get xtime_lock => deadlock.
> 
> I can only reproduce this if the system is busy.  If the system is
> otherwise idle at the timer interrupt, I guess the scheduler doesn't try
> to get the time.  I can run a "find / | xargs cat > /dev/nul" in one
> window and then trigger the leap second in another, and the system dies
> most of the time.
> 
> I'm looking at the source for the RHEL 4 kernel 2.6.9-67.0.7.EL (which I
> had crash on a system), and the scheduler is enough different that I am
> not finding the path to the deadlock right off.
> 
> In any case, the quick-n-dirty fix would be to not try to printk while
> holding xtime_lock (I think the NTP code is the only thing that does).
> However, it would be nice to still get the leap second notification, so
> some other fix would be better I guess.

How about just moving the printk out of the lock? I.e. something like
this:

diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index f5f793d..ad3e2b7 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -140,8 +140,6 @@ static enum hrtimer_restart ntp_leap_second(struct hrtimer *timer)
 		xtime.tv_sec--;
 		wall_to_monotonic.tv_sec++;
 		time_state = TIME_OOP;
-		printk(KERN_NOTICE "Clock: "
-		       "inserting leap second 23:59:60 UTC\n");
 		hrtimer_add_expires_ns(&leap_timer, NSEC_PER_SEC);
 		res = HRTIMER_RESTART;
 		break;
@@ -166,6 +164,10 @@ static enum hrtimer_restart ntp_leap_second(struct hrtimer *timer)
 
 	write_sequnlock(&xtime_lock);
 
+	if (res == HRTIMER_RESTART)
+		printk(KERN_NOTICE "Clock: "
+		       "inserting leap second 23:59:60 UTC\n");
+
 	return res;
 }
 

> -- 
> Chris Adams <cmadams@...aay.net>
> Systems and Network Administrator - HiWAAY Internet Services
> I don't speak for anybody but myself - that's enough trouble.

Cheers,
Duane.

-- 
"I never could learn to drink that blood and call it wine" - Bob Dylan
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ