Message-ID: <03ff85d7-ccee-6aa1-8652-1b416571bfbb@linaro.org>
Date: Fri, 11 Aug 2017 11:38:09 +0200
From: Daniel Lezcano <daniel.lezcano@...aro.org>
To: paulmck@...ux.vnet.ibm.com
Cc: Pratyush Anand <panand@...hat.com>,
κΉλν <austinkernel.kim@...il.com>,
john.stultz@...aro.org, Steven Rostedt <rostedt@...dmis.org>,
linux-kernel@...r.kernel.org
Subject: Re: RCU stall when using function_graph
On 10/08/2017 23:39, Paul E. McKenney wrote:
> On Thu, Aug 10, 2017 at 11:45:09AM +0200, Daniel Lezcano wrote:
[ ... ]
>> Nothing comes to mind, but it may be worth mentioning that the slowness
>> of the CPU is the aggravating factor. In particular, I was able to
>> reproduce the issue by setting the CPU to its minimum frequency. With the
>> ondemand governor, the frequency can be high (hence enough CPU power) at
>> the moment we enable function_graph because another CPU is loaded (and
>> both CPUs share the same clock line). The system became stuck at the
>> moment the other CPU went idle at the lowest frequency. That introduced
>> randomness into the issue and made it hard to figure out why the RCU
>> stall was happening.
>
> Adding this, then?
Yes, sure.
Thanks Paul.
-- Daniel
> ------------------------------------------------------------------------
>
> commit f7d9ce95064f76be583c775fac32076fa59f1617
> Author: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
> Date: Thu Aug 10 14:33:17 2017 -0700
>
> documentation: Slow systems can stall RCU grace periods
>
> If a fast system has a worst-case grace-period duration of (say) ten
> seconds, then running the same workload on a system ten times as slow
> will get you an RCU CPU stall warning given default stall-warning
> timeout settings. This commit therefore adds this possibility to
> stallwarn.txt.
>
> Reported-by: Daniel Lezcano <daniel.lezcano@...aro.org>
> Signed-off-by: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
>
> diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt
> index 21b8913acbdf..238acbd94917 100644
> --- a/Documentation/RCU/stallwarn.txt
> +++ b/Documentation/RCU/stallwarn.txt
> @@ -70,6 +70,12 @@ o A periodic interrupt whose handler takes longer than the time
> considerably longer than normal, which can in turn result in
> RCU CPU stall warnings.
>
> +o Testing a workload on a fast system, tuning the stall-warning
> + timeout down to just barely avoid RCU CPU stall warnings, and then
> + running the same workload with the same stall-warning timeout on a
> + slow system. Note that thermal throttling and on-demand governors
> + can cause a single system to be sometimes fast and sometimes slow!
> +
> o A hardware or software issue shuts off the scheduler-clock
> interrupt on a CPU that is not in dyntick-idle mode. This
> problem really has happened, and seems to be most likely to
>
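
The arithmetic in the commit message above can be sketched as follows. This
is only an illustrative model, not kernel code: the function names are
hypothetical, and the 21-second value assumes the usual default of
CONFIG_RCU_CPU_STALL_TIMEOUT, which may differ on a given kernel build.

```python
# Illustrative model only; names are hypothetical and nothing here is kernel API.
# Assumption: 21 s is the usual default for CONFIG_RCU_CPU_STALL_TIMEOUT.
DEFAULT_STALL_TIMEOUT_S = 21


def scaled_grace_period(worst_case_s, slowdown):
    """Worst-case grace-period duration once the CPU runs `slowdown` times slower."""
    return worst_case_s * slowdown


def would_warn(worst_case_s, slowdown, timeout_s=DEFAULT_STALL_TIMEOUT_S):
    """True if the scaled grace period exceeds the stall-warning timeout."""
    return scaled_grace_period(worst_case_s, slowdown) > timeout_s


if __name__ == "__main__":
    # The commit message's example: ten seconds worst case on the fast system.
    for slowdown in (1, 2, 10):
        duration = scaled_grace_period(10, slowdown)
        print(slowdown, duration, would_warn(10, slowdown))
```

With a ten-second worst case, the model stays under the assumed timeout at
twice the slowdown (20 s) and exceeds it at ten times the slowdown (100 s),
which is the situation a CPU pinned at its minimum frequency can create.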
--
<http://www.linaro.org/> Linaro.org | Open source software for ARM SoCs
Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog