[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c2450453-0a42-0c68-db9b-3d097dc62c34@asrmicro.com>
Date: Tue, 1 Aug 2017 15:37:01 +0800
From: qiaozhou <qiaozhou@...micro.com>
To: Vikram Mulukutla <markivx@...eaurora.org>,
Will Deacon <will.deacon@....com>
CC: Thomas Gleixner <tglx@...utronix.de>,
John Stultz <john.stultz@...aro.org>, <sboyd@...eaurora.org>,
LKML <linux-kernel@...r.kernel.org>,
Wang Wilbur <wilburwang@...micro.com>,
Marc Zyngier <marc.zyngier@....com>,
Peter Zijlstra <peterz@...radead.org>,
<linux-kernel-owner@...r.kernel.org>, <sudeep.holla@....com>,
Zhou Qiao <qiaozhou@...micro.com>
Subject: Re: [Question]: try to fix contention between expire_timers and
try_to_del_timer_sync
On 2017年07月31日 19:20, qiaozhou wrote:
>
>
> On 2017年07月29日 03:09, Vikram Mulukutla wrote:
>> On 2017-07-28 02:28, Will Deacon wrote:
>>> On Thu, Jul 27, 2017 at 06:10:34PM -0700, Vikram Mulukutla wrote:
>>
>> <snip>
>>>
>>> Does bodging cpu_relax to back-off to wfe after a while help? The event
>>> stream will wake it up if nothing else does. Nasty patch below, but
>>> I'd be
>>> interested to know whether or not it helps.
>>>
>>> Will
>>>
> The patch also helps a lot on my platform. (Though it does cause
> deadlock(related with udelay) in uart driver in early boot, and not sure
> it's uart driver issue. Just workaround it firstly)
>
> Platform: 4 a53(832MHz) + 4 a73(1.8GHz)
> Test condition #1:
> a. core2: a53, while loop (spinlock, spin_unlock)
> b. core7: a73, while loop (spinlock, spin_unlock, cpu_relax)
>
> Test result: recording the lock acquire times(a53, a73), max lock
> acquired time(a53), in 20 seconds
>
> Without cpu_relax bodging patch:
> ===============================================================
> |a53 locked times | a73 locked times | a53 max locked time(us)|
> ==================|==================|========================|
> 182| 38371616| 1,951,954|
> 202| 38427652| 2,261,319|
> 210| 38477427| 15,309,597|
> 207| 38494479| 6,656,453|
> 220| 38422283| 2,064,155|
> ===============================================================
>
> With cpu_relax bodging patch:
> ===============================================================
> |a53 locked times | a73 locked times | a53 max locked time(us)|
> ==================|==================|========================|
> 1849898| 37799379| 131,255|
> 1574172| 38557653| 38,410|
> 1924777| 37831725| 42,999|
> 1477665| 38723741| 52,087|
> 1865793| 38007741| 783,965|
> ===============================================================
>
> Also add some workload to the whole system to check the result.
> Test condition #2: based on #1
> c. core6: a73, 1.8GHz, run "while(1);" loop
>
> With cpu_relax bodging patch:
> ===============================================================
> |a53 locked times | a73 locked times | a53 max locked time(us)|
> ==================|==================|========================|
> 20| 42563981| 2,317,070|
> 10| 42652793| 4,210,944|
> 9| 42651075| 5,691,834|
> 28| 42652591| 4,539,555|
> 10| 42652801| 5,850,639|
> ===============================================================
>
> Also hotplug out other cores.
> Test condition #3: based on #1
> d. hotplug out core1/3/4/5/6, keep core0 for scheduling
>
> With cpu_relax bodging patch:
> ===============================================================
> |a53 locked times | a73 locked times | a53 max locked time(us)|
> ==================|==================|========================|
> 447| 42652450| 309,549|
> 515| 42650382| 337,661|
> 415| 42646669| 628,525|
> 431| 42651137| 365,862|
> 464| 42648916| 379,934|
> ===============================================================
>
> The last two tests are the actual cases where the hard-lockup is
> triggered on my platform. So I gathered some data, and it shows that a53
> needs much longer time to acquire the lock.
>
> All tests are done in android, black screen with USB cable attached. The
> data is not so pretty as Vikram's. It might be related with cpu
> topology, core numbers, CCI frequency etc. (I'll do another test with
> both a53 and a73 running at 1.2GHz, to check whether it's the core
> frequency which leads to the major difference.)
>
Test the contention with the same frequency between a53 and a73 cores.
Platform: 4 a53(1248MHz) + 4 a73(1248MHz)
Test condition #4:
a. core2: a53, while loop (spinlock, spin_unlock)
b. core7: a73, while loop (spinlock, spin_unlock)
===============================================================
|a53 locked times | a73 locked times | a53 max locked time(us)|
==================|==================|========================|
12945632| 13021576| 14|
12934181| 13059230| 16|
12987186| 13059016| 49|
12958583| 13038884| 24|
14637546| 14672522| 14|
===============================================================
The locked times are almost the same, and the max time of acquiring the
lock on a53 also drops. On my platform, core frequency seems to be the
key factor.
>> This does seem to help. Here's some data after 5 runs with and without
>> the patch.
>>
>> time = max time taken to acquire lock
>> counter = number of times lock acquired
>>
>> cpu0: little cpu @ 300MHz, cpu4: Big cpu @2.0GHz
>> Without the cpu_relax() bodging patch:
>> =====================================================
>> cpu0 time | cpu0 counter | cpu4 time | cpu4 counter |
>> ==========|==============|===========|==============|
>> 117893us| 2349144| 2us| 6748236|
>> 571260us| 2125651| 2us| 7643264|
>> 19780us| 2392770| 2us| 5987203|
>> 19948us| 2395413| 2us| 5977286|
>> 19822us| 2429619| 2us| 5768252|
>> 19888us| 2444940| 2us| 5675657|
>> =====================================================
>>
>> cpu0: little cpu @ 300MHz, cpu4: Big cpu @2.0GHz
>> With the cpu_relax() bodging patch:
>> =====================================================
>> cpu0 time | cpu0 counter | cpu4 time | cpu4 counter |
>> ==========|==============|===========|==============|
>> 3us| 2737438| 2us| 6907147|
>> 2us| 2742478| 2us| 6902241|
>> 132us| 2745636| 2us| 6876485|
>> 3us| 2744554| 2us| 6898048|
>> 3us| 2741391| 2us| 6882901|
>> ==================================================== >
>> The patch also seems to have helped with fairness in general
>> allowing more work to be done if the CPU frequencies are more
>> closely matched (I don't know if this translates to real world
>> performance - probably not). The counter values are higher
>> with the patch.
>>
>> time = max time taken to acquire lock
>> counter = number of times lock acquired
>>
>> cpu0: little cpu @ 1.5GHz, cpu4: Big cpu @2.0GHz
>> Without the cpu_relax() bodging patch:
>> =====================================================
>> cpu0 time | cpu0 counter | cpu4 time | cpu4 counter |
>> ==========|==============|===========|==============|
>> 2us| 5240654| 1us| 5339009|
>> 2us| 5287797| 97us| 5327073|
>> 2us| 5237634| 1us| 5334694|
>> 2us| 5236676| 88us| 5333582|
>> 84us| 5285880| 84us| 5329489|
>> =====================================================
>>
>> cpu0: little cpu @ 1.5GHz, cpu4: Big cpu @2.0GHz
>> With the cpu_relax() bodging patch:
>> =====================================================
>> cpu0 time | cpu0 counter | cpu4 time | cpu4 counter |
>> ==========|==============|===========|==============|
>> 140us| 10449121| 1us| 11154596|
>> 1us| 10757081| 1us| 11479395|
>> 83us| 10237109| 1us| 10902557|
>> 2us| 9871101| 1us| 10514313|
>> 2us| 9758763| 1us| 10391849|
>> =====================================================
>>Also apply Vikram's patch and have a test.
cpu2: a53, 832MHz, cpu7: a73, 1.75Hz
Without cpu_relax bodging patch
=====================================================
cpu2 time | cpu2 counter | cpu7 time | cpu7 counter |
==========|==============|===========|==============|
16505| 5243| 2| 12487322|
16494| 5619| 1| 12013291|
16498| 5276| 2| 11706824|
16494| 7123| 1| 12532355|
16470| 7208| 2| 11784617|
=====================================================
cpu2: a53, 832MHz, cpu7: a73, 1.75Hz
With cpu_relax bodging patch:
=====================================================
cpu2 time | cpu2 counter | cpu7 time | cpu7 counter |
==========|==============|===========|==============|
3991| 140714| 1| 11430528|
4018| 144371| 1| 11430528|
4034| 143250| 1| 11427011|
4330| 147345| 1| 11423583|
4752| 138273| 1| 11433241|
=====================================================
It has some improvements, but not so good as Vikram's data. The big core
still has much more chance to acquire lock.
>>
>> Thanks,
>> Vikram
>>
Powered by blists - more mailing lists