lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Thu, 7 Feb 2008 01:51:10 +0100 From: Ingo Molnar <mingo@...e.hu> To: Andrew Morton <akpm@...ux-foundation.org> Cc: a.p.zijlstra@...llo.nl, linux-kernel@...r.kernel.org, Gautham R Shenoy <ego@...ibm.com> Subject: Re: softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks * Andrew Morton <akpm@...ux-foundation.org> wrote: > Nope. > > But I tested it on mainline, and mainline exhibits the > never-powers-off symptom, whereas > ed50d6cbc394cd0966469d3e249353c9dd1d38b9 demonstrates the > powers-off-after-20-seconds symptom. > > So we _may_ be dealing with two bugs here, and your patch might have > fixed the first, but that success is obscured by the second. I guess > I need to prepare a tree which has > ed50d6cbc394cd0966469d3e249353c9dd1d38b9 at its tip. (Wonders how to > do that). the way i do it in bisection is to do: mkdir patches git-log -1 -p ed50d6cbc394cd0966469d3 > patches/fix.patch echo fix.patch > patches/series and then before testing a bisection point, i do a 'quilt push'. Before telling git-bisect about the quality of that bisection point (good/bad) i pop it off via 'quilt pop'. this way the 'required fix' can be kept during the bisection, to find the secondary bug. > btw, mainline (plus this patch, not that it changed anything) prints > > <stopping disk stuff> > Disabling non-boot CPUs > CPU 1 is now offline > > and that's it. This machine has eight cpus. Might be a hint? what should be the proper message? my suspects, besides there being something wrong in the hung-tasks code of the softlockup watchdog, would be the cpu-hotplug commits, or some arch/x86 commit. (although we didnt really have anything specifically touching the the reboot path) does a stupid patch like the one below tell you more about what the other CPUs are doing during this hang? [32-bit only patch] Ingo --- arch/i386/kernel/nmi.c | 8 ++++++++ 1 file changed, 8 insertions(+) Index: linux/arch/i386/kernel/nmi.c =================================================================== --- linux.orig/arch/x86/kernel/nmi_64.c +++ linux/arch/x86/kernel/nmi_64.c @@ -331,6 +331,14 @@ __kprobes int nmi_watchdog_tick(struct p int touched = 0; int cpu = smp_processor_id(); int rc=0; + static int count[NR_CPUS]; + + if (!count[cpu]) { + count[cpu] = nmi_hz; + printk("CPU#%d, tick\n", cpu); + show_regs(regs); + } + count[cpu]--; /* check for other users first */ if (notify_die(DIE_NMI, "nmi", regs, reason, 2, SIGINT) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@...r.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists