linux-kernel - Re: RCU stall when using function

Message-ID: <03ff85d7-ccee-6aa1-8652-1b416571bfbb@linaro.org>
Date:   Fri, 11 Aug 2017 11:38:09 +0200
From:   Daniel Lezcano <daniel.lezcano@...aro.org>
To:     paulmck@...ux.vnet.ibm.com
Cc:     Pratyush Anand <panand@...hat.com>,
        김동현 <austinkernel.kim@...il.com>,
        john.stultz@...aro.org, Steven Rostedt <rostedt@...dmis.org>,
        linux-kernel@...r.kernel.org
Subject: Re: RCU stall when using function_graph

On 10/08/2017 23:39, Paul E. McKenney wrote:
> On Thu, Aug 10, 2017 at 11:45:09AM +0200, Daniel Lezcano wrote:

[ ... ]

>> Nothing comes to mind, but it may be worth mentioning that the slowness
>> of the CPU is the aggravating factor. In particular, I was able to
>> reproduce the issue by setting the CPU to its minimum frequency. With the
>> ondemand governor, the frequency can be high (hence enough CPU power) at
>> the moment we enable function_graph because another CPU is loaded (and
>> both CPUs share the same clock line). The system became stuck at the
>> moment the other CPU went idle at the lowest frequency. That introduced
>> randomness into the issue and made it hard to figure out why the RCU
>> stall was happening.
> 
> Adding this, then?

Yes, sure.

Thanks Paul.

  -- Daniel

> ------------------------------------------------------------------------
> 
> commit f7d9ce95064f76be583c775fac32076fa59f1617
> Author: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
> Date:   Thu Aug 10 14:33:17 2017 -0700
> 
>     documentation: Slow systems can stall RCU grace periods
>     
>     If a fast system has a worst-case grace-period duration of (say) ten
>     seconds, then running the same workload on a system ten times as slow
>     will get you an RCU CPU stall warning given default stall-warning
>     timeout settings.  This commit therefore adds this possibility to
>     stallwarn.txt.
>     
>     Reported-by: Daniel Lezcano <daniel.lezcano@...aro.org>
>     Signed-off-by: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
> 
> diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt
> index 21b8913acbdf..238acbd94917 100644
> --- a/Documentation/RCU/stallwarn.txt
> +++ b/Documentation/RCU/stallwarn.txt
> @@ -70,6 +70,12 @@ o	A periodic interrupt whose handler takes longer than the time
>  	considerably longer than normal, which can in turn result in
>  	RCU CPU stall warnings.
>  
> +o	Testing a workload on a fast system, tuning the stall-warning
> +	timeout down to just barely avoid RCU CPU stall warnings, and then
> +	running the same workload with the same stall-warning timeout on a
> +	slow system.  Note that thermal throttling and on-demand governors
> +	can cause a single system to be sometimes fast and sometimes slow!
> +
>  o	A hardware or software issue shuts off the scheduler-clock
>  	interrupt on a CPU that is not in dyntick-idle mode.  This
>  	problem really has happened, and seems to be most likely to
> 
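To make the arithmetic in the commit message concrete, here is a minimal
sketch of the "ten times as slow" case. It is not taken from the kernel.
The function names are hypothetical, the 21-second value is assumed to be
the default stall-warning timeout (CONFIG_RCU_CPU_STALL_TIMEOUT), and the
model assumes a CPU-bound workload whose duration scales inversely with
clock frequency, which real workloads will only approximate.

```python
# Assumed default RCU CPU stall-warning timeout, in seconds
# (CONFIG_RCU_CPU_STALL_TIMEOUT); distributions may configure another value.
DEFAULT_STALL_TIMEOUT_S = 21.0


def scaled_grace_period(worst_case_s, fast_khz, slow_khz):
    """Estimate the worst-case grace-period duration at a lower clock.

    Assumption: the workload is CPU-bound, so its duration scales
    inversely with CPU frequency.
    """
    if fast_khz <= 0 or slow_khz <= 0:
        raise ValueError("frequencies must be positive")
    return worst_case_s * fast_khz / slow_khz


def would_stall(worst_case_s, fast_khz, slow_khz,
                timeout_s=DEFAULT_STALL_TIMEOUT_S):
    """Return True if the estimated duration exceeds the stall timeout."""
    return scaled_grace_period(worst_case_s, fast_khz, slow_khz) > timeout_s


if __name__ == "__main__":
    # Hypothetical frequencies: measured at 2 GHz, then run at lower clocks.
    for slow_khz in (2_000_000, 1_000_000, 200_000):
        estimate = scaled_grace_period(10.0, 2_000_000, slow_khz)
        verdict = "stall" if would_stall(10.0, 2_000_000, slow_khz) else "ok"
        print(f"{slow_khz} kHz: {estimate:.1f} s -> {verdict}")
```

Under these assumptions, a ten-second worst case measured at full speed
stays under the timeout at half speed but far exceeds it at one tenth of
the frequency, which matches the behaviour seen when the ondemand governor
drops the shared clock to its minimum.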

