linux-kernel - Re: RCU stall when using function_graph

Date:   Sun, 6 Aug 2017 10:02:20 -0700
From:   "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:     김동현 <austinkernel.kim@...il.com>
Cc:     Daniel Lezcano <daniel.lezcano@...aro.org>, john.stultz@...aro.org,
        Steven Rostedt <rostedt@...dmis.org>,
        linux-kernel@...r.kernel.org, Pratyush Anand <panand@...hat.com>
Subject: Re: RCU stall when using function_graph

On Sat, Aug 05, 2017 at 02:24:21PM +0900, 김동현 wrote:
> Dear All
> 
> As for me, after configuring function_graph as below, the crash disappears.
> "echo 0 > d/tracing/tracing_on"
> "sleep 1"
> 
> "echo function_graph > d/tracing/current_tracer"
> "sleep 1"
> 
> "echo smp_call_function_single > d/tracing/set_ftrace_filter"
> adb shell "sleep 1"
> 
> "echo 1 > d/tracing/tracing_on"
> adb shell "sleep 1"
> 
> Right after function_graph is enabled, too many entries are traced during
> IRQ handling, which often eventually causes a stall.
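The sequence reported above can be written out as a small script. This is a sketch under assumptions: the helper name trace_one_function and the SETTLE variable are hypothetical, and the tracing directory is passed in rather than hard-coded. The report's d/tracing is assumed to be the tracing directory under the debugfs mount, which on many systems is /sys/kernel/debug/tracing.

```shell
#!/bin/sh
# Sketch of the reported sequence: stop recording, select function_graph,
# restrict it to one function, then resume recording.
# Assumption: the first argument is the tracing directory (for example
# /sys/kernel/debug/tracing); the report refers to it as d/tracing.

# Seconds to pause between steps (the report used 1). Hypothetical knob.
SETTLE="${SETTLE:-1}"

# Hypothetical helper: trace_one_function <tracing-dir> <function>
trace_one_function() {
    dir="$1"
    func="$2"

    # 1. Disable recording while the tracer is reconfigured.
    echo 0 > "$dir/tracing_on" || return 1
    sleep "$SETTLE"

    # 2. Select the function_graph tracer.
    echo function_graph > "$dir/current_tracer" || return 1
    sleep "$SETTLE"

    # 3. Limit tracing to a single function instead of all kernel functions.
    echo "$func" > "$dir/set_ftrace_filter" || return 1
    sleep "$SETTLE"

    # 4. Re-enable recording.
    echo 1 > "$dir/tracing_on" || return 1
    sleep "$SETTLE"
}
```

For example, as root: trace_one_function /sys/kernel/debug/tracing smp_call_function_single. The step order deliberately mirrors the report, including writing the filter only after the tracer has been selected.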

That would do it!

Hmmm...

Steven, would it be helpful if RCU were to inform tracing (say) halfway
through the RCU CPU stall interval, allowing the tracer to do something
like cond_resched_rcu_qs()?  I can imagine all sorts of reasons why this
wouldn't work, for example, if all the tracing were done with irqs disabled
or some such, but figured I should ask.
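To put "halfway through the RCU CPU stall interval" in concrete terms, the stall timeout itself can be read from userspace. This is a sketch that assumes the timeout (in seconds) is exposed as the rcupdate module parameter rcu_cpu_stall_timeout under /sys/module; the helper rcu_stall_halfway and the RCU_STALL_PARAM override are hypothetical names. It only reports the numbers and does not implement the proposed RCU-to-tracer notification, which would have to live in the kernel.

```shell
#!/bin/sh
# Sketch: print the RCU CPU stall timeout and its halfway point.
# Assumption: the timeout in seconds is readable from this module parameter
# file; RCU_STALL_PARAM (hypothetical) allows pointing elsewhere.
RCU_STALL_PARAM="${RCU_STALL_PARAM:-/sys/module/rcupdate/parameters/rcu_cpu_stall_timeout}"

# Prints "<timeout> <halfway>" in whole seconds; the halfway value rounds down.
rcu_stall_halfway() {
    timeout=$(cat "$RCU_STALL_PARAM") || return 1
    echo "$timeout $((timeout / 2))"
}
```

If the parameter file contains 21, rcu_stall_halfway prints "21 10".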

Does Guillermo's approach work for others?

							Thanx, Paul

> BR,
> Guillermo Austin Kim
> 
> On Aug 3, 2017, at 11:38 PM, "Daniel Lezcano" <daniel.lezcano@...aro.org> wrote:
> 
> On Thu, Aug 03, 2017 at 05:44:21AM -0700, Paul E. McKenney wrote:
> 
> [ ... ]
> 
> > > > BTW, function_graph tracer is the most invasive of the tracers. It's 4x
> > > > slower than function tracer. I'm wondering if the tracer isn't the
> > > > cause, but just slows things down enough to cause some other race
> > > > condition that triggers the bug.
> > >
> > > Yes, that could be true.
> > >
> > > I tried the following scenario:
> > >
> > >  - cpufreq governor => userspace + max_freq (1.2GHz)
> > >    - function_graph set ==> OK
> > >
> > >  - cpufreq governor => userspace + min_freq (200MHz)
> > >    - function_graph set ==> RCU stall
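The two frequency settings compared above can be reproduced roughly as follows. This sketch assumes the standard cpufreq sysfs interface for cpu0 (other CPUs may belong to separate policies), with frequencies in kHz, so 1.2GHz is 1200000 and 200MHz is 200000. The helper pin_cpu_freq and the CPUFREQ override are hypothetical names.

```shell
#!/bin/sh
# Sketch: pin a CPU at its maximum or minimum frequency using the userspace
# cpufreq governor, before enabling function_graph.
# Assumption: standard cpufreq sysfs layout; values are in kHz.
CPUFREQ="${CPUFREQ:-/sys/devices/system/cpu/cpu0/cpufreq}"

# Hypothetical helper: pin_cpu_freq max|min
pin_cpu_freq() {
    case "$1" in
        max) freq=$(cat "$CPUFREQ/cpuinfo_max_freq") || return 1 ;;
        min) freq=$(cat "$CPUFREQ/cpuinfo_min_freq") || return 1 ;;
        *) echo "usage: pin_cpu_freq max|min" >&2; return 2 ;;
    esac
    echo userspace > "$CPUFREQ/scaling_governor" || return 1
    # With the userspace governor, scaling_setspeed selects the frequency.
    echo "$freq" > "$CPUFREQ/scaling_setspeed" || return 1
}
```

Running pin_cpu_freq min (or max) as root and then enabling function_graph corresponds to the two cases listed.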
> > >
> > > Besides that, I realized the board is constantly processing USB SOF
> > > (Start-of-Frame) interrupts every 124us, so that adds more overhead.
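The overhead argument can be made concrete with the numbers quoted in this thread. This is rough arithmetic only: a clock of F MHz runs F cycles per microsecond, and the sketch ignores memory stalls and all other interrupt load. At 200MHz, each 124us SOF period leaves one sixth of the cycles available at 1.2GHz, so a fixed per-interrupt tracing cost (function_graph being roughly 4x slower than the function tracer, as quoted) takes a six-times-larger share. The function names are hypothetical.

```shell
#!/bin/sh
# Back-of-the-envelope check: SOF interrupts every 124us, CPU at 200MHz
# versus 1.2GHz. Integer arithmetic, so results are rounded down.
SOF_PERIOD_US=124

# Interrupts per second at one SOF every SOF_PERIOD_US microseconds.
sof_per_second() { echo $((1000000 / SOF_PERIOD_US)); }

# Cycle budget between two SOFs; $1 is the clock in MHz (cycles per us).
cycles_between_sof() { echo $(($1 * SOF_PERIOD_US)); }

echo "SOF interrupts per second: $(sof_per_second)"
echo "cycles between SOFs at 200MHz: $(cycles_between_sof 200)"
echo "cycles between SOFs at 1200MHz: $(cycles_between_sof 1200)"
```

This prints 8064 interrupts per second, 24800 cycles per SOF period at 200MHz, and 148800 at 1200MHz.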
> > >
> > > After removing USB support, and thus the associated processing of the
> > > SOF interrupts, I no longer see the RCU stall.
> >
> > Looks like Steve called this one!  ;-)
> 
> Yep :)
> 
> > > Is it expected behavior for the system to hang after an RCU stall
> > > is reported?
> >
> > No, but if NMI stack traces are enabled and there are any NMI problems,
> > bad things can happen.  In addition, the bulk of output can cause problems
> > if you have a slow console connection.
> 
> Ok, thanks.
> 
>   -- Daniel