Message-Id: <20170806170220.GQ3730@linux.vnet.ibm.com>
Date:   Sun, 6 Aug 2017 10:02:20 -0700
From:   "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:     김동현 <austinkernel.kim@...il.com>
Cc:     Daniel Lezcano <daniel.lezcano@...aro.org>, john.stultz@...aro.org,
        Steven Rostedt <rostedt@...dmis.org>,
        linux-kernel@...r.kernel.org, Pratyush Anand <panand@...hat.com>
Subject: Re: RCU stall when using function_graph

On Sat, Aug 05, 2017 at 02:24:21PM +0900, 김동현 wrote:
> Dear All
> 
> For me, the crash disappears after configuring function_graph as below:
> "echo 0 > d/tracing/tracing_on"
> "sleep 1"
> 
> "echo function_graph > d/tracing/current_tracer"
> "sleep 1"
> 
> "echo smp_call_function_single > d/tracing/set_ftrace_filter"
> adb shell "sleep 1"
> 
> "echo 1 > d/tracing/tracing_on"
> adb shell "sleep 1"
> 
> Right after function_graph is enabled, so many logs are traced on each IRQ
> transaction that it often eventually causes a stall.

That would do it!

Hmmm...

Steven, would it be helpful if RCU were to inform tracing (say) halfway
through the RCU CPU stall interval, allowing the tracer to do something
like cond_resched_rcu_qs()?  I can imagine all sorts of reasons why this
wouldn't work, for example, if all the tracing was with irqs disabled
or some such, but figured I should ask.

Does Guillermo's approach work for others?

							Thanx, Paul
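
The sequence quoted at the top of this message (stop tracing, select
function_graph, narrow set_ftrace_filter to one function, restart tracing)
can be sketched as a small shell function. This is only an illustration:
the name setup_filtered_graph, its arguments, and the example path
/sys/kernel/debug/tracing are assumptions, since the original commands
refer to the tracing directory only as d/tracing.

```shell
#!/bin/sh
# Sketch: limit function_graph to one function before turning tracing on,
# so the tracer does not log every call made during IRQ handling.
#
# setup_filtered_graph is a hypothetical helper, not part of any tool.
#   $1  tracing directory (assumed: /sys/kernel/debug/tracing on the target)
#   $2  function name to write into set_ftrace_filter
#   $3  delay in seconds between steps (default 1, as in the quoted commands)
setup_filtered_graph() {
    sfg_dir="$1"
    sfg_func="$2"
    sfg_delay="${3:-1}"

    # Stop tracing while the tracer and filter are being changed.
    echo 0 > "$sfg_dir/tracing_on" || return 1
    sleep "$sfg_delay"

    # Select the function_graph tracer.
    echo function_graph > "$sfg_dir/current_tracer" || return 1
    sleep "$sfg_delay"

    # Restrict tracing to the single function of interest.
    echo "$sfg_func" > "$sfg_dir/set_ftrace_filter" || return 1
    sleep "$sfg_delay"

    # Re-enable tracing with the narrowed filter in place.
    echo 1 > "$sfg_dir/tracing_on" || return 1
}
```

On a target it might be invoked as root with
setup_filtered_graph /sys/kernel/debug/tracing smp_call_function_single.
Taking the directory as a parameter keeps the sketch usable when tracefs is
mounted somewhere else.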

> BR,
> Guillermo Austin Kim
> 
> On Aug 3, 2017, at 11:38 PM, "Daniel Lezcano" <daniel.lezcano@...aro.org> wrote:
> 
> On Thu, Aug 03, 2017 at 05:44:21AM -0700, Paul E. McKenney wrote:
> 
> [ ... ]
> 
> > > > BTW, function_graph tracer is the most invasive of the tracers. It's 4x
> > > > slower than function tracer. I'm wondering if the tracer isn't the
> > > > cause, but just slows things down enough to cause some other race
> > > > condition that triggers the bug.
> > >
> > > Yes, that could be true.
> > >
> > > I tried the following scenario:
> > >
> > >  - cpufreq governor => userspace + max_freq (1.2GHz)
> > >    - function_graph set ==> OK
> > >
> > >  - cpufreq governor => userspace + min_freq (200MHz)
> > >    - function_graph set ==> RCU stall
> > >
> > > Besides that, I realized the board is constantly processing SOF interrupts
> > > every 124us, so that adds more overhead.
> > >
> > > After removing the USB support, and thus the associated processing for
> > > the SOF interrupts, I no longer see the RCU stall.
> >
> > Looks like Steve called this one!  ;-)
> 
> Yep :)
> 
> > > Is it expected behavior for the system to hang after an RCU stall is
> > > raised?
> >
> > No, but if NMI stack traces are enabled and there are any NMI problems,
> > bad things can happen.  In addition, the bulk of output can cause problems
> > if you have a slow console connection.
> 
> Ok, thanks.
> 
>   -- Daniel
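
Daniel's comparison above (userspace governor pinned at max_freq versus
min_freq) could be reproduced roughly as follows. This is a sketch only:
set_fixed_freq is a hypothetical helper, and it assumes the standard
cpufreq sysfs files (cpuinfo_min_freq, cpuinfo_max_freq, scaling_governor,
scaling_setspeed) and that the userspace governor is available. The thread
does not say how the frequency was actually set.

```shell
#!/bin/sh
# Sketch: pin one CPU's frequency at its hardware minimum or maximum
# using the userspace cpufreq governor.
#
# set_fixed_freq is a hypothetical helper, not part of any tool.
#   $1  cpufreq directory (assumed: /sys/devices/system/cpu/cpu0/cpufreq)
#   $2  "min" or "max"
# Prints the selected frequency in kHz on success.
set_fixed_freq() {
    sff_dir="$1"
    sff_which="$2"

    case "$sff_which" in
        min|max) ;;
        *) echo "usage: set_fixed_freq DIR min|max" >&2; return 2 ;;
    esac

    # Read the hardware limit (in kHz) for the requested end of the range.
    sff_khz=$(cat "$sff_dir/cpuinfo_${sff_which}_freq") || return 1

    # Hand frequency control to userspace, then request that frequency.
    echo userspace > "$sff_dir/scaling_governor" || return 1
    echo "$sff_khz" > "$sff_dir/scaling_setspeed" || return 1

    echo "$sff_khz"
}
```

Running it with "min" before enabling function_graph would correspond to
the configuration in which the RCU stall was reported, and "max" to the
one that ran without a stall.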
