[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20080207005110.GA1457@elte.hu>
Date: Thu, 7 Feb 2008 01:51:10 +0100
From: Ingo Molnar <mingo@...e.hu>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: a.p.zijlstra@...llo.nl, linux-kernel@...r.kernel.org,
Gautham R Shenoy <ego@...ibm.com>
Subject: Re: softlockup: automatically detect hung TASK_UNINTERRUPTIBLE
tasks
* Andrew Morton <akpm@...ux-foundation.org> wrote:
> Nope.
>
> But I tested it on mainline, and mainline exhibits the
> never-powers-off symptom, whereas
> ed50d6cbc394cd0966469d3e249353c9dd1d38b9 demonstrates the
> powers-off-after-20-seconds symptom.
>
> So we _may_ be dealing with two bugs here, and your patch might have
> fixed the first, but that success is obscured by the second. I guess
> I need to prepare a tree which has
> ed50d6cbc394cd0966469d3e249353c9dd1d38b9 at its tip. (Wonders how to
> do that).
the way i do it in bisection is to do:
mkdir patches
git-log -1 -p ed50d6cbc394cd0966469d3 > patches/fix.patch
echo fix.patch > patches/series
and then before testing a bisection point, i do a 'quilt push'. Before
telling git-bisect about the quality of that bisection point (good/bad)
i pop it off via 'quilt pop'.
this way the 'required fix' can be kept during the bisection, to find
the secondary bug.
> btw, mainline (plus this patch, not that it changed anything) prints
>
> <stopping disk stuff>
> Disabling non-boot CPUs
> CPU 1 is now offline
>
> and that's it. This machine has eight cpus. Might be a hint?
what should be the proper message?
my suspects, besides there being something wrong in the hung-tasks code
of the softlockup watchdog, would be the cpu-hotplug commits, or some
arch/x86 commit. (although we didnt really have anything specifically
touching the the reboot path)
does a stupid patch like the one below tell you more about what the
other CPUs are doing during this hang? [32-bit only patch]
Ingo
---
arch/i386/kernel/nmi.c | 8 ++++++++
1 file changed, 8 insertions(+)
Index: linux/arch/i386/kernel/nmi.c
===================================================================
--- linux.orig/arch/x86/kernel/nmi_64.c
+++ linux/arch/x86/kernel/nmi_64.c
@@ -331,6 +331,14 @@ __kprobes int nmi_watchdog_tick(struct p
int touched = 0;
int cpu = smp_processor_id();
int rc=0;
+ static int count[NR_CPUS];
+
+ if (!count[cpu]) {
+ count[cpu] = nmi_hz;
+ printk("CPU#%d, tick\n", cpu);
+ show_regs(regs);
+ }
+ count[cpu]--;
/* check for other users first */
if (notify_die(DIE_NMI, "nmi", regs, reason, 2, SIGINT)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists