Date:   Tue, 1 Aug 2017 15:37:01 +0800
From:   qiaozhou <qiaozhou@...micro.com>
To:     Vikram Mulukutla <markivx@...eaurora.org>,
        Will Deacon <will.deacon@....com>
CC:     Thomas Gleixner <tglx@...utronix.de>,
        John Stultz <john.stultz@...aro.org>, <sboyd@...eaurora.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Wang Wilbur <wilburwang@...micro.com>,
        Marc Zyngier <marc.zyngier@....com>,
        Peter Zijlstra <peterz@...radead.org>,
        <linux-kernel-owner@...r.kernel.org>, <sudeep.holla@....com>,
        Zhou Qiao <qiaozhou@...micro.com>
Subject: Re: [Question]: try to fix contention between expire_timers and
 try_to_del_timer_sync



On 2017-07-31 19:20, qiaozhou wrote:
> 
> 
> On 2017-07-29 03:09, Vikram Mulukutla wrote:
>> On 2017-07-28 02:28, Will Deacon wrote:
>>> On Thu, Jul 27, 2017 at 06:10:34PM -0700, Vikram Mulukutla wrote:
>>
>> <snip>
>>>
>>> Does bodging cpu_relax to back-off to wfe after a while help? The event
>>> stream will wake it up if nothing else does. Nasty patch below, but 
>>> I'd be
>>> interested to know whether or not it helps.
>>>
>>> Will
>>>
> The patch also helps a lot on my platform, though it does cause a 
> deadlock (related to udelay) in the uart driver during early boot. I'm 
> not sure whether that is a uart driver issue, so I just worked around it for now.
> 
> Platform: 4 a53(832MHz) + 4 a73(1.8GHz)
> Test condition #1:
>      a. core2: a53, while loop (spin_lock, spin_unlock)
>      b. core7: a73, while loop (spin_lock, spin_unlock, cpu_relax)
> 
> Test result: number of lock acquisitions (a53, a73) and maximum lock 
> acquisition time (a53) over 20 seconds
> 
> Without cpu_relax bodging patch:
> ===============================================================
> |a53 locked times | a73 locked times | a53 max locked time(us)|
> ==================|==================|========================|
>                 182|          38371616|               1,951,954|
>                 202|          38427652|               2,261,319|
>                 210|          38477427|              15,309,597|
>                 207|          38494479|               6,656,453|
>                 220|          38422283|               2,064,155|
> ===============================================================
> 
> With cpu_relax bodging patch:
> ===============================================================
> |a53 locked times | a73 locked times | a53 max locked time(us)|
> ==================|==================|========================|
>             1849898|          37799379|                 131,255|
>             1574172|          38557653|                  38,410|
>             1924777|          37831725|                  42,999|
>             1477665|          38723741|                  52,087|
>             1865793|          38007741|                 783,965|
> ===============================================================
> 
> Also add some workload to the whole system to check the result.
> Test condition #2: based on #1
>      c. core6: a73, 1.8GHz, run "while(1);" loop
> 
> With cpu_relax bodging patch:
> ===============================================================
> |a53 locked times | a73 locked times | a53 max locked time(us)|
> ==================|==================|========================|
>                  20|          42563981|               2,317,070|
>                  10|          42652793|               4,210,944|
>                   9|          42651075|               5,691,834|
>                  28|          42652591|               4,539,555|
>                  10|          42652801|               5,850,639|
> ===============================================================
> 
> Also hotplug out other cores.
> Test condition #3: based on #1
>      d. hotplug out core1/3/4/5/6, keep core0 for scheduling
> 
> With cpu_relax bodging patch:
> ===============================================================
> |a53 locked times | a73 locked times | a53 max locked time(us)|
> ==================|==================|========================|
>                 447|          42652450|                 309,549|
>                 515|          42650382|                 337,661|
>                 415|          42646669|                 628,525|
>                 431|          42651137|                 365,862|
>                 464|          42648916|                 379,934|
> ===============================================================
> 
> The last two tests are the actual cases where the hard lockup is 
> triggered on my platform. The data I gathered shows that the a53 
> needs a much longer time to acquire the lock.
> 
> All tests were done on Android, black screen with USB cable attached. The 
> data is not as clean as Vikram's. This might be related to CPU 
> topology, core count, CCI frequency, etc. (I'll do another test with 
> both a53 and a73 running at 1.2GHz, to check whether it's the core 
> frequency that leads to the major difference.)
> 
I tested the contention with the a53 and a73 cores running at the same frequency.
Platform: 4 a53(1248MHz) + 4 a73(1248MHz)
Test condition #4:
      a. core2: a53, while loop (spin_lock, spin_unlock)
      b. core7: a73, while loop (spin_lock, spin_unlock)
===============================================================
|a53 locked times | a73 locked times | a53 max locked time(us)|
==================|==================|========================|
           12945632|          13021576|                      14|
           12934181|          13059230|                      16|
           12987186|          13059016|                      49|
           12958583|          13038884|                      24|
           14637546|          14672522|                      14|
===============================================================

The lock acquisition counts are almost the same, and the maximum time to 
acquire the lock on the a53 also drops sharply. On my platform, core 
frequency seems to be the key factor.
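
For anyone who wants to reproduce the bookkeeping behind these tables, here is a minimal userspace sketch. It is not the kernel test code used above. It assumes a plain C11 test-and-set lock in place of the kernel spinlock, and the names lock_stats, now_us and contend_once are hypothetical. Each contending thread would call contend_once() in a loop for the test duration.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical per-core statistics, mirroring the table columns. */
struct lock_stats {
    uint64_t acquisitions;   /* "locked times" column */
    uint64_t max_wait_us;    /* "max locked time(us)" column */
};

/* Assumption: a test-and-set flag stands in for the kernel spinlock. */
static atomic_flag test_lock = ATOMIC_FLAG_INIT;

/* Current time in microseconds (C11 wall clock, enough for a sketch). */
static uint64_t now_us(void)
{
    struct timespec ts;

    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/* One iteration of the test loop: lock, record the wait, unlock. */
static void contend_once(struct lock_stats *s)
{
    uint64_t start = now_us();
    uint64_t end, waited;

    /* Spin until the flag is free; a cpu_relax()-style pause would go here. */
    while (atomic_flag_test_and_set_explicit(&test_lock, memory_order_acquire))
        ;

    end = now_us();
    waited = end > start ? end - start : 0;   /* guard against clock steps */

    s->acquisitions++;
    if (waited > s->max_wait_us)
        s->max_wait_us = waited;

    atomic_flag_clear_explicit(&test_lock, memory_order_release);
}
```

To mimic the setup above, one thread pinned to a little core and one pinned to a big core would each keep their own struct lock_stats and print both fields when the run ends.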

>> This does seem to help. Here's some data after 5 runs with and without 
>> the patch.
>>
>> time = max time taken to acquire lock
>> counter = number of times lock acquired
>>
>> cpu0: little cpu @ 300MHz, cpu4: Big cpu @2.0GHz
>> Without the cpu_relax() bodging patch:
>> =====================================================
>> cpu0 time | cpu0 counter | cpu4 time | cpu4 counter |
>> ==========|==============|===========|==============|
>>    117893us|       2349144|        2us|       6748236|
>>    571260us|       2125651|        2us|       7643264|
>>     19780us|       2392770|        2us|       5987203|
>>     19948us|       2395413|        2us|       5977286|
>>     19822us|       2429619|        2us|       5768252|
>>     19888us|       2444940|        2us|       5675657|
>> =====================================================
>>
>> cpu0: little cpu @ 300MHz, cpu4: Big cpu @2.0GHz
>> With the cpu_relax() bodging patch:
>> =====================================================
>> cpu0 time | cpu0 counter | cpu4 time | cpu4 counter |
>> ==========|==============|===========|==============|
>>         3us|       2737438|        2us|       6907147|
>>         2us|       2742478|        2us|       6902241|
>>       132us|       2745636|        2us|       6876485|
>>         3us|       2744554|        2us|       6898048|
>>         3us|       2741391|        2us|       6882901|
>> =====================================================
>>
>> The patch also seems to have helped with fairness in general
>> allowing more work to be done if the CPU frequencies are more
>> closely matched (I don't know if this translates to real world
>> performance - probably not). The counter values are higher
>> with the patch.
>>
>> time = max time taken to acquire lock
>> counter = number of times lock acquired
>>
>> cpu0: little cpu @ 1.5GHz, cpu4: Big cpu @2.0GHz
>> Without the cpu_relax() bodging patch:
>> =====================================================
>> cpu0 time | cpu0 counter | cpu4 time | cpu4 counter |
>> ==========|==============|===========|==============|
>>         2us|       5240654|        1us|       5339009|
>>         2us|       5287797|       97us|       5327073|
>>         2us|       5237634|        1us|       5334694|
>>         2us|       5236676|       88us|       5333582|
>>        84us|       5285880|       84us|       5329489|
>> =====================================================
>>
>> cpu0: little cpu @ 1.5GHz, cpu4: Big cpu @2.0GHz
>> With the cpu_relax() bodging patch:
>> =====================================================
>> cpu0 time | cpu0 counter | cpu4 time | cpu4 counter |
>> ==========|==============|===========|==============|
>>       140us|      10449121|        1us|      11154596|
>>         1us|      10757081|        1us|      11479395|
>>        83us|      10237109|        1us|      10902557|
>>         2us|       9871101|        1us|      10514313|
>>         2us|       9758763|        1us|      10391849|
>> =====================================================
>>
I also applied Vikram's patch and ran a test.

cpu2: a53, 832MHz, cpu7: a73, 1.75GHz
Without cpu_relax bodging patch
=====================================================
cpu2 time | cpu2 counter | cpu7 time | cpu7 counter |
==========|==============|===========|==============|
      16505|          5243|          2|      12487322|
      16494|          5619|          1|      12013291|
      16498|          5276|          2|      11706824|
      16494|          7123|          1|      12532355|
      16470|          7208|          2|      11784617|
=====================================================

cpu2: a53, 832MHz, cpu7: a73, 1.75GHz
With cpu_relax bodging patch:
=====================================================
cpu2 time | cpu2 counter | cpu7 time | cpu7 counter |
==========|==============|===========|==============|
       3991|        140714|          1|      11430528|
       4018|        144371|          1|      11430528|
       4034|        143250|          1|      11427011|
       4330|        147345|          1|      11423583|
       4752|        138273|          1|      11433241|
=====================================================

The patch brings some improvement, but the results are not as good as 
Vikram's data. The big core still has a much higher chance of acquiring the lock.
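
One way to put a number on that imbalance is the share of all acquisitions that went to the little core. The helper below is only a sketch; lock_share is a hypothetical name, not something from the test code.

```c
#include <stdint.h>

/* Fraction (0.0 to 1.0) of all acquisitions that went to the first core. */
static double lock_share(uint64_t own_count, uint64_t other_count)
{
    uint64_t total = own_count + other_count;

    /* Avoid dividing by zero when neither core took the lock. */
    return total ? (double)own_count / (double)total : 0.0;
}
```

Taking the first row of each table above, lock_share(5243, 12487322) is roughly 0.04% without the patch and lock_share(140714, 11430528) is roughly 1.2% with it. The first patched row of Vikram's 300MHz data, lock_share(2737438, 6907147), is about 28%.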
>>
>> Thanks,
>> Vikram
>>
