Date:   Mon, 31 Jul 2017 19:20:12 +0800
From:   qiaozhou <qiaozhou@...micro.com>
To:     Vikram Mulukutla <markivx@...eaurora.org>,
        Will Deacon <will.deacon@....com>
CC:     Thomas Gleixner <tglx@...utronix.de>,
        John Stultz <john.stultz@...aro.org>, <sboyd@...eaurora.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Wang Wilbur <wilburwang@...micro.com>,
        Marc Zyngier <marc.zyngier@....com>,
        Peter Zijlstra <peterz@...radead.org>,
        <linux-kernel-owner@...r.kernel.org>, <sudeep.holla@....com>
Subject: Re: [Question]: try to fix contention between expire_timers and
 try_to_del_timer_sync



On 2017-07-29 03:09, Vikram Mulukutla wrote:
> On 2017-07-28 02:28, Will Deacon wrote:
>> On Thu, Jul 27, 2017 at 06:10:34PM -0700, Vikram Mulukutla wrote:
> 
> <snip>
> 
>>>
>>> I think we should have this discussion now - I brought this up 
>>> earlier [1]
>>> and I promised a test case that I completely forgot about - but here it
>>> is (attached). Essentially a Big CPU in an acquire-check-release loop
>>> will have an unfair advantage over a little CPU concurrently attempting
>>> to acquire the same lock, in spite of the ticket implementation. If 
>>> the Big
>>> CPU needs the little CPU to make forward progress : livelock.
>>>
> 
> <snip>
> 
>>>
>>> One solution was to use udelay(1) in such loops instead of 
>>> cpu_relax(), but
>>> that's not very 'relaxing'. I'm not sure if there's something we 
>>> could do
>>> within the ticket spin-lock implementation to deal with this.
>>
>> Does bodging cpu_relax to back-off to wfe after a while help? The event
>> stream will wake it up if nothing else does. Nasty patch below, but 
>> I'd be
>> interested to know whether or not it helps.
>>
>> Will
>>
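
(For context, the back-off idea is roughly the sketch below -- a
simplified illustration, not the actual patch from this thread; the
helper name and the 1000-spin threshold are made up.)

#include <linux/compiler.h>

static inline void relax_with_wfe_backoff(unsigned int *spins)
{
	if (++(*spins) < 1000) {
		asm volatile("yield" ::: "memory");	/* plain cpu_relax() */
	} else {
		/*
		 * WFE is woken by SEV, an interrupt, or the architected
		 * event stream (CNTKCTL_EL1), so it never sleeps forever.
		 */
		asm volatile("wfe" ::: "memory");
	}
}

/* Example: an acquire-check-release style wait loop using the helper. */
static void wait_for_flag(int *flag)
{
	unsigned int spins = 0;

	while (!READ_ONCE(*flag))
		relax_with_wfe_backoff(&spins);
}
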
The patch also helps a lot on my platform. (Though it does cause a
deadlock, related to udelay(), in the UART driver during early boot; I'm
not sure whether it's a UART driver issue. I've just worked around it
for now.)

Platform: 4x a53 (832MHz) + 4x a73 (1.8GHz)
Test condition #1:
     a. core2: a53, while loop (spin_lock, spin_unlock)
     b. core7: a73, while loop (spin_lock, spin_unlock, cpu_relax)

Test result: recorded the number of lock acquisitions (a53, a73) and the
max lock acquisition time (a53), over 20 seconds.
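
(Roughly, the test loop on each core looks like the sketch below -- a
simplified illustration, not the exact test code; the struct and
function names are made up.)

#include <linux/kthread.h>
#include <linux/spinlock.h>
#include <linux/ktime.h>

static DEFINE_SPINLOCK(test_lock);

struct lock_stats {
	unsigned long acquisitions;	/* number of times the lock was taken */
	s64 max_wait_ns;		/* longest wait to take the lock, in ns */
};

static int lock_hammer(void *data)
{
	struct lock_stats *stats = data;

	while (!kthread_should_stop()) {
		ktime_t start = ktime_get();
		s64 wait_ns;

		spin_lock(&test_lock);
		wait_ns = ktime_to_ns(ktime_sub(ktime_get(), start));
		if (wait_ns > stats->max_wait_ns)
			stats->max_wait_ns = wait_ns;
		stats->acquisitions++;
		spin_unlock(&test_lock);

		cpu_relax();	/* only in the a73 loop (condition #1.b) */
	}
	return 0;
}

/*
 * Each thread is created with kthread_create(), pinned to its core with
 * kthread_bind(), started with wake_up_process(), and stopped with
 * kthread_stop() after 20 seconds.
 */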

Without cpu_relax bodging patch:
===============================================================
|a53 lock count   | a73 lock count   |a53 max acquire time(us)|
==================|==================|========================|
                182|          38371616|               1,951,954|
                202|          38427652|               2,261,319|
                210|          38477427|              15,309,597|
                207|          38494479|               6,656,453|
                220|          38422283|               2,064,155|
===============================================================

With cpu_relax bodging patch:
===============================================================
|a53 lock count   | a73 lock count   |a53 max acquire time(us)|
==================|==================|========================|
            1849898|          37799379|                 131,255|
            1574172|          38557653|                  38,410|
            1924777|          37831725|                  42,999|
            1477665|          38723741|                  52,087|
            1865793|          38007741|                 783,965|
===============================================================

I also added some workload to the whole system and checked the result.
Test condition #2: based on #1, plus:
     c. core6: a73, 1.8GHz, running a "while(1);" loop

With cpu_relax bodging patch:
===============================================================
|a53 lock count   | a73 lock count   |a53 max acquire time(us)|
==================|==================|========================|
                 20|          42563981|               2,317,070|
                 10|          42652793|               4,210,944|
                  9|          42651075|               5,691,834|
                 28|          42652591|               4,539,555|
                 10|          42652801|               5,850,639|
===============================================================

I also tried hotplugging out the other cores.
Test condition #3: based on #1, plus:
     d. hotplug out core1/3/4/5/6, keeping core0 for scheduling

With cpu_relax bodging patch:
===============================================================
|a53 lock count   | a73 lock count   |a53 max acquire time(us)|
==================|==================|========================|
                447|          42652450|                 309,549|
                515|          42650382|                 337,661|
                415|          42646669|                 628,525|
                431|          42651137|                 365,862|
                464|          42648916|                 379,934|
===============================================================

The last two tests correspond to the actual cases in which the hard
lockup is triggered on my platform. So I gathered some data, and it
shows that the a53 needs a much longer time to acquire the lock.

All tests were done on Android, with the screen off and a USB cable
attached. The data is not as pretty as Vikram's; that might be related
to CPU topology, core count, CCI frequency, etc. (I'll run another test
with both the a53 and a73 at 1.2GHz, to check whether it's the core
frequency that accounts for the major difference.)

> This does seem to help. Here's some data after 5 runs with and without 
> the patch.
> 
> time = max time taken to acquire lock
> counter = number of times lock acquired
> 
> cpu0: little cpu @ 300MHz, cpu4: Big cpu @2.0GHz
> Without the cpu_relax() bodging patch:
> =====================================================
> cpu0 time | cpu0 counter | cpu4 time | cpu4 counter |
> ==========|==============|===========|==============|
>    117893us|       2349144|        2us|       6748236|
>    571260us|       2125651|        2us|       7643264|
>     19780us|       2392770|        2us|       5987203|
>     19948us|       2395413|        2us|       5977286|
>     19822us|       2429619|        2us|       5768252|
>     19888us|       2444940|        2us|       5675657|
> =====================================================
> 
> cpu0: little cpu @ 300MHz, cpu4: Big cpu @2.0GHz
> With the cpu_relax() bodging patch:
> =====================================================
> cpu0 time | cpu0 counter | cpu4 time | cpu4 counter |
> ==========|==============|===========|==============|
>         3us|       2737438|        2us|       6907147|
>         2us|       2742478|        2us|       6902241|
>       132us|       2745636|        2us|       6876485|
>         3us|       2744554|        2us|       6898048|
>         3us|       2741391|        2us|       6882901|
> =====================================================
>
> The patch also seems to have helped with fairness in general
> allowing more work to be done if the CPU frequencies are more
> closely matched (I don't know if this translates to real world
> performance - probably not). The counter values are higher
> with the patch.
> 
> time = max time taken to acquire lock
> counter = number of times lock acquired
> 
> cpu0: little cpu @ 1.5GHz, cpu4: Big cpu @2.0GHz
> Without the cpu_relax() bodging patch:
> =====================================================
> cpu0 time | cpu0 counter | cpu4 time | cpu4 counter |
> ==========|==============|===========|==============|
>         2us|       5240654|        1us|       5339009|
>         2us|       5287797|       97us|       5327073|
>         2us|       5237634|        1us|       5334694|
>         2us|       5236676|       88us|       5333582|
>        84us|       5285880|       84us|       5329489|
> =====================================================
> 
> cpu0: little cpu @ 1.5GHz, cpu4: Big cpu @2.0GHz
> With the cpu_relax() bodging patch:
> =====================================================
> cpu0 time | cpu0 counter | cpu4 time | cpu4 counter |
> ==========|==============|===========|==============|
>       140us|      10449121|        1us|      11154596|
>         1us|      10757081|        1us|      11479395|
>        83us|      10237109|        1us|      10902557|
>         2us|       9871101|        1us|      10514313|
>         2us|       9758763|        1us|      10391849|
> =====================================================
> 
> 
> Thanks,
> Vikram
> 
