Message-ID: <20081121155333.GB6775@linux.vnet.ibm.com>
Date: Fri, 21 Nov 2008 07:53:33 -0800
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To: Folkert van Heusden <folkert@...heusden.com>
Cc: Lai Jiangshan <laijs@...fujitsu.com>, linux-kernel@...r.kernel.org
Subject: Re: [2.6.28-rc5] RCU detected CPU 0 stall (t=4294893165/750 jiffies)

On Fri, Nov 21, 2008 at 04:34:26PM +0100, Folkert van Heusden wrote:
> > > I'm afraid there's no script for that: it happens during boot.
> >
> > This is a HZ=250 machine, correct? If so, please try the following
> > patch (already in -tip), which helps suppress boot-time false positives.
>
> That's correct, 250Hz.
>
> > diff --git a/include/linux/rcuclassic.h b/include/linux/rcuclassic.h
> > index 5f89b62..301dda8 100644
> > --- a/include/linux/rcuclassic.h
> > +++ b/include/linux/rcuclassic.h
> > @@ -41,7 +41,7 @@
> > #include <linux/seqlock.h>
> >
> > #ifdef CONFIG_RCU_CPU_STALL_DETECTOR
> > -#define RCU_SECONDS_TILL_STALL_CHECK ( 3 * HZ) /* for rcp->jiffies_stall */
> > +#define RCU_SECONDS_TILL_STALL_CHECK (10 * HZ) /* for rcp->jiffies_stall */
>
> Isn't it better to let the define depend on the value of CONFIG_HZ?
> E.g.
>
> Signed-off-by: Folkert van Heusden <folkert@...heusden.com>
>
> diff --git a/include/linux/rcuclassic.h b/include/linux/rcuclassic.h
> index 5f89b62..301dda8 100644
> --- a/include/linux/rcuclassic.h
> +++ b/include/linux/rcuclassic.h
> @@ -41,7 +41,7 @@
> #include <linux/seqlock.h>
>
> #ifdef CONFIG_RCU_CPU_STALL_DETECTOR
> -#define RCU_SECONDS_TILL_STALL_CHECK ( 3 * HZ) /* for rcp->jiffies_stall */
> +#define RCU_SECONDS_TILL_STALL_CHECK ( (CONFIG_HZ / 100) * 3 * HZ) /* for rcp->jiffies_stall */
> #define RCU_SECONDS_TILL_STALL_RECHECK (30 * HZ) /* for rcp->jiffies_stall */
> #endif /* #ifdef CONFIG_RCU_CPU_STALL_DETECTOR */

The stalls occur when CPUs spin in the kernel with preemption (or irqs
or whatever) disabled. So while I suppose that there is some
possibility that such a spin might be a function of HZ, I have never
seen this happen.
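
As a rough illustration of what scaling the timeout by CONFIG_HZ would do, here is a small user-space sketch (not kernel code). The function names and the hz parameter are made up for this example; hz stands in for the kernel's HZ, and the two expressions are copied from the patches quoted above.

```c
/*
 * User-space sketch, not kernel code. hz stands in for the kernel's HZ;
 * both function names are hypothetical.
 */

/* Original definition: 3 * HZ jiffies, which is 3 seconds at any HZ. */
unsigned long stall_check_fixed(unsigned long hz)
{
	return 3 * hz;
}

/*
 * Proposed definition: (CONFIG_HZ / 100) * 3 * HZ jiffies. The integer
 * division truncates, so HZ=250 yields a factor of 2, not 2.5.
 */
unsigned long stall_check_scaled(unsigned long hz)
{
	return (hz / 100) * 3 * hz;
}
```

With hz = 250 the fixed form gives 750 jiffies (3 seconds), while the scaled form gives 1500 jiffies (6 seconds); with hz = 1000 the scaled form gives 30000 jiffies (30 seconds). The scaled form therefore makes the wall-clock timeout grow with HZ, even though a spin's duration in seconds does not normally depend on HZ.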

The reason I asked for your HZ value was to confirm that the stall
detection timeout was 3 seconds (750 jiffies at HZ=250). If you had been
running an HZ=75 system (admittedly unlikely), 750 jiffies would have
meant a 10-second stall, and the patch would not help. In that case, the
right thing to do would have been to work out why the system was spinning
for 10 seconds during boot -- tough to get a 5-second boot when the system
spins for 10 seconds coming up, right? ;-)
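
The arithmetic behind the 3-second and 10-second figures can be sketched as follows. This is a user-space illustration, not kernel code; the function name is hypothetical and hz stands in for the kernel's HZ.

```c
/*
 * User-space sketch, not kernel code: convert the jiffies count from a
 * stall report into whole seconds. hz stands in for the kernel's HZ and
 * must be nonzero; the function name is hypothetical.
 */
unsigned long stall_jiffies_to_secs(unsigned long jiffies, unsigned long hz)
{
	return jiffies / hz;
}
```

stall_jiffies_to_secs(750, 250) is 3, which matches the 3 * HZ threshold. stall_jiffies_to_secs(750, 75) would be 10, which already reaches the 10 * HZ threshold that the patch introduces.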

Thanx, Paul

> Folkert van Heusden
>
> --
> MultiTail is an easy-to-use tool for watching logfiles and the output
> of commands. Filtering, coloring, merging, and a lot more.
> http://www.vanheusden.com/multitail/
> ----------------------------------------------------------------------
> Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com