Message-Id: <20170803024009.GM3730@linux.vnet.ibm.com>
Date: Wed, 2 Aug 2017 19:40:09 -0700
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: Daniel Lezcano <daniel.lezcano@...aro.org>, john.stultz@...aro.org,
linux-kernel@...r.kernel.org, Pratyush Anand <panand@...hat.com>
Subject: Re: RCU stall when using function_graph
On Wed, Aug 02, 2017 at 09:07:44AM -0400, Steven Rostedt wrote:
> On Wed, 2 Aug 2017 14:42:39 +0200
> Daniel Lezcano <daniel.lezcano@...aro.org> wrote:
>
> > On Tue, Aug 01, 2017 at 08:12:14PM -0400, Steven Rostedt wrote:
> > > On Wed, 2 Aug 2017 00:15:44 +0200
> > > Daniel Lezcano <daniel.lezcano@...aro.org> wrote:
> > >
> > > > On 02/08/2017 00:04, Paul E. McKenney wrote:
> > > > >> Hi Paul,
> > > > >>
> > > > >> I have been trying to set the function_graph tracer for ftrace and each time I
> > > > >> get a CPU stall.
> > > > >>
> > > > >> How to reproduce:
> > > > >> -----------------
> > > > >>
> > > > >> echo function_graph > /sys/kernel/debug/tracing/current_tracer
> > > > >>
> > > > >> This error appears with v4.13-rc3 and v4.12-rc6.
> > >
> > > Can you bisect this? It may be due to this commit:
> > >
> > > 0598e4f08 ("ftrace: Add use of synchronize_rcu_tasks() with dynamic trampolines")
> >
> > Hi Steve,
> >
> > I git bisected but each time the issue occurred. I went through the different
> > versions down to v4.4, where the board was not fully supported, and it ended up
> > having the same issue.
> >
> > Finally, I had the intuition it could be related to the wall time (there is no
> > RTC clock with battery on the board and the wall time is Jan 1st, 1970).
> >
> > Setting the time with ntpdate solved the problem.
> >
> > Even if it is rare for the time not to be set, is it normal to get an
> > RCU CPU stall?
> >
> >
>
> BTW, function_graph tracer is the most invasive of the tracers. It's 4x
> slower than function tracer. I'm wondering if the tracer isn't the
> cause, but just slows things down enough to cause some other race
> condition that triggers the bug.
Easy to check! Use the rcupdate.rcu_cpu_stall_timeout kernel boot
parameter to increase this timeout by a factor of four. Mainline
default is 21 seconds, but many distros set it to 60 seconds.
You can always check sysfs to find the value for your system, or
CONFIG_RCU_CPU_STALL_TIMEOUT in your .config file.
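
As a rough sketch of that check, the helper below reads the current timeout
and prints a boot parameter scaled by a factor of four. The sysfs path
/sys/module/rcupdate/parameters/rcu_cpu_stall_timeout and the function name
suggest_stall_timeout are assumptions made for illustration, so please
verify the path on your own kernel before relying on it.

```shell
#!/bin/sh
# Assumed sysfs location of the rcupdate module parameter; verify on your kernel.
RCU_STALL_PARAM=/sys/module/rcupdate/parameters/rcu_cpu_stall_timeout

# Hypothetical helper: print a boot parameter that scales the current timeout.
#   $1: file holding the current timeout in seconds (defaults to the sysfs path)
#   $2: scale factor (defaults to 4, matching the function_graph slowdown)
suggest_stall_timeout() {
    param_file=${1:-$RCU_STALL_PARAM}
    factor=${2:-4}
    if [ ! -r "$param_file" ]; then
        echo "cannot read $param_file" >&2
        return 1
    fi
    current=$(cat "$param_file")
    echo "rcupdate.rcu_cpu_stall_timeout=$((current * factor))"
}
```

Calling suggest_stall_timeout with no arguments uses the assumed sysfs path;
the printed string can then be appended to the kernel command line for the
next boot.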
Thanx, Paul