Message-ID: <20110605110132.GB23463@elte.hu>
Date: Sun, 5 Jun 2011 13:01:32 +0200
From: Ingo Molnar <mingo@...e.hu>
To: Arne Jansen <lists@...-jansens.de>
Cc: Peter Zijlstra <peterz@...radead.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
mingo@...hat.com, hpa@...or.com, linux-kernel@...r.kernel.org,
efault@....de, npiggin@...nel.dk, akpm@...ux-foundation.org,
frank.rowand@...sony.com, tglx@...utronix.de,
linux-tip-commits@...r.kernel.org
Subject: Re: [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock()
* Arne Jansen <lists@...-jansens.de> wrote:
> On 05.06.2011 11:55, Ingo Molnar wrote:
> >
> >* Arne Jansen<lists@...-jansens.de> wrote:
> >
> >>>( Arne, please also double check on a working bootup that the NMI
> >>> watchdog is actually ticking, by checking the NMI counts in
> >>> /proc/interrupts go up slowly but surely on all CPUs. )
> >>
> >>It does, but _very_ slowly. Some CPUs do not count up for tens of
> >>minutes if the machine is idle. If I generate some load like 'make
> >>tags', the counters go up quite quickly.
> >>After 4 minutes and one 'make cscope' it looks like this:
> >>NMI: 8 13 43 5 2 3 22 1 Non-maskable interrupts
> >>
> >>But I never see a single tick on console or in dmesg, even when I
> >>replace the early_printk with a printk.
> >
> >hm, that might be because the NMI watchdog ticks on unhalted
> >cycles, which do not advance while a CPU is idle.
> >
> >That's not a problem (the kernel cannot lock up while there are no
> >cycles ticking) but nevertheless could you work this around please
> >by starting 8 infinite shell loops:
> >
> > for ((i=0; i<8; i++)); do while : ; do : ; done& done
> >
> >?
> >
> >This will saturate all cores and make sure the NMI watchdog is
> >ticking everywhere.
> >
> >Hopefully this won't make the bug go away :-)
> >
>
> OK, now we get going. I get the ticks, the bug is still there, and
> all CPUs still tick after the lockup. I also added an early_printk
> inside the lockup-if, and it reports hard lockups. At first for only
> one or 2 CPUs, and after some time all CPUs are locked up.
Very good!

If you add a dump_stack(), do you get a stack trace, or do the NMI
watchdog ticks stop?

If the ticks stop, this suggests a lockup within the printk code. If
you get a stack dump, then we'll have good debug data.
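
To tell whether the NMI ticks have stopped on a given CPU, something along these lines might help. It is only a rough sketch: it reads the NMI line of /proc/interrupts twice and lists the CPUs whose count did not advance. The helper names nmi_counts and stalled_cpus are made up for illustration, and it assumes userspace is still responsive enough to run a shell.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any kernel tooling.

# Print the per-CPU NMI counts from an interrupts file
# (defaults to /proc/interrupts), space-separated on one line.
nmi_counts() {
    awk '$1 == "NMI:" {
        out = ""
        # Collect numeric fields only; stop at the trailing description.
        for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++)
            out = out (i > 2 ? " " : "") $i
        print out
    }' "${1:-/proc/interrupts}"
}

# Given two snapshots from nmi_counts, print every CPU whose
# NMI count did not increase between them.
stalled_cpus() {
    before=$1
    after=$2
    set -- $after
    cpu=0
    for b in $before; do
        [ $# -gt 0 ] || break
        a=$1
        shift
        [ "$a" -gt "$b" ] || printf 'CPU%d\n' "$cpu"
        cpu=$((cpu + 1))
    done
}

# Usage sketch (run with the busy loops active so every CPU should tick):
#   before=$(nmi_counts); sleep 10; after=$(nmi_counts)
#   stalled_cpus "$before" "$after"
```

An empty result would mean every CPU is still taking NMIs. The 10 second interval is an arbitrary choice and may need to be longer than the watchdog period on your setup.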
Thanks,
Ingo