Date:   Mon, 28 Aug 2017 16:12:01 -0700
From:   Vikram Mulukutla <markivx@...eaurora.org>
To:     Will Deacon <will.deacon@....com>
Cc:     qiaozhou <qiaozhou@...micro.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        John Stultz <john.stultz@...aro.org>, sboyd@...eaurora.org,
        LKML <linux-kernel@...r.kernel.org>,
        Wang Wilbur <wilburwang@...micro.com>,
        Marc Zyngier <marc.zyngier@....com>,
        Peter Zijlstra <peterz@...radead.org>,
        linux-kernel-owner@...r.kernel.org, sudeep.holla@....com
Subject: Re: [Question]: try to fix contention between expire_timers and
 try_to_del_timer_sync

Hi Will,

On 2017-08-25 12:48, Vikram Mulukutla wrote:
> Hi Will,
> 
> On 2017-08-15 11:40, Will Deacon wrote:
>> Hi Vikram,
>> 
>> On Thu, Aug 03, 2017 at 04:25:12PM -0700, Vikram Mulukutla wrote:
>>> On 2017-07-31 06:13, Will Deacon wrote:
>>> >On Fri, Jul 28, 2017 at 12:09:38PM -0700, Vikram Mulukutla wrote:
>>> >>On 2017-07-28 02:28, Will Deacon wrote:
>>> >>>On Thu, Jul 27, 2017 at 06:10:34PM -0700, Vikram Mulukutla wrote:
>>> 
>>> >>>
>>> >>This does seem to help. Here's some data after 5 runs with and without
>>> >>the patch.
>>> >
>>> >Blimey, that does seem to make a difference. Shame it's so ugly! Would you
>>> >be able to experiment with other values for CPU_RELAX_WFE_THRESHOLD? I had
>>> >it set to 10000 in the diff I posted, but that might be higher than optimal.
>>> >It would be interesting to see if it correlates with num_possible_cpus()
>>> >for the highly contended case.
>>> >
>>> >Will
>>> 
>>> Sorry for the late response - I should hopefully have some more data with
>>> different thresholds before the week is finished or on Monday.
>> 
>> Did you get anywhere with the threshold heuristic?
>> 
>> Will
> 
> Here's some data from experiments that I finally got to today. I decided
> to recompile for every value of the threshold. Was doing a binary search
> of sorts and then started reducing by orders of magnitude. There are pairs
> of rows here:
> 

Well, here's something interesting. I tried a different platform and found that
the workaround doesn't help much at all, similar to Qiao's observation on his
big.LITTLE chipset. Could it be something to do with the WFE implementation or
the event stream?

I modified your patch to use a __delay(1) in place of the WFEs and this was
the result (still with the 10k threshold). The worst-case lock time for cpu0
drastically improves. Given that cpu0 re-enables interrupts between each lock
attempt in my test case, I think the lock count matters less here.
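
A rough user-space model of the change described above, for readers who want
to see the shape of the logic. This is a sketch, not the actual diff: only
CPU_RELAX_WFE_THRESHOLD and its value of 10000 come from this thread. The
plain static counter (standing in for a per-CPU variable), the reset after
each backoff, and the names cpu_relax_model() and backoff_delay() are
assumptions made for illustration. In the kernel, the body of backoff_delay()
would be __delay(1), or wfe() in the original workaround.

```c
#include <assert.h>

/* Threshold value quoted in the thread; everything else is illustrative. */
#define CPU_RELAX_WFE_THRESHOLD 10000

static unsigned long relax_loops;   /* stand-in for a per-CPU spin counter */
static unsigned long backoff_calls; /* how many times the spinner backed off */

/* Hypothetical stand-in for __delay(1) (or wfe() in the original patch). */
static void backoff_delay(void)
{
	backoff_calls++;
}

/* Spin cheaply most of the time; back off once per THRESHOLD calls. */
static void cpu_relax_model(void)
{
	/* The counter is reset on every backoff, so it never reaches the limit. */
	assert(relax_loops < CPU_RELAX_WFE_THRESHOLD);

	if (++relax_loops >= CPU_RELAX_WFE_THRESHOLD) {
		backoff_delay();
		relax_loops = 0;
	}
}
```

With a threshold of 10000, a waiter that calls cpu_relax_model() 25000 times
backs off twice, which gives the other CPU two windows in which to take the
lock.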

cpu_relax() patch with WFEs (original workaround):
(pairs of rows: first row is with c0 at 300MHz, second
with c0 at 1.9GHz. Both rows have cpu4 at 2.3GHz. Max time
is in microseconds.)
------------------------------------------------------|
c0 max time| c0 lock count| c4 max time| c4 lock count|
------------------------------------------------------|
     999843|            25|           2|      12988498| -> c0/cpu0 at 300MHz
          0|       8421132|           1|       9152979| -> c0/cpu0 at 1.9GHz
------------------------------------------------------|
     999860|           160|           2|      12963487|
          1|       8418492|           1|       9158001|
------------------------------------------------------|
     999381|           734|           2|      12988636|
          1|       8387562|           1|       9128056|
------------------------------------------------------|
     989800|           750|           3|      12996473|
          1|       8389091|           1|       9112444|
------------------------------------------------------|

cpu_relax() patch with __delay(1):
(pairs of rows: first row is with c0 at 300MHz, second
with c0 at 1.9GHz. Both rows have cpu4 at 2.3GHz. Max time
is in microseconds.)
------------------------------------------------------|
c0 max time| c0 lock count| c4 max time| c4 lock count|
------------------------------------------------------|
       7703|          1532|           2|      13035203| -> c0/cpu0 at 300MHz
          1|       8511686|           1|       8550411| -> c0/cpu0 at 1.9GHz
------------------------------------------------------|
       7801|          1561|           2|      13040188|
          1|       8553985|           1|       8609853|
------------------------------------------------------|
       3953|          1576|           2|      13049991|
          1|       8576370|           1|       8611533|
------------------------------------------------------|
       3953|          1557|           2|      13030553|
          1|       8509020|           1|       8543883|
------------------------------------------------------|
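
To put a number on the improvement, the c0 max times from the 300MHz rows of
the two tables above can be averaged and compared. The snippet below is only
a convenience for that arithmetic. The values are copied from the tables; the
array and function names are made up for this example and do not come from
the test code used in this thread.

```c
#include <stddef.h>

/* c0 max lock time in microseconds, c0 at 300MHz, copied from the tables. */
static const unsigned long wfe_max_us[]   = { 999843, 999860, 999381, 989800 };
static const unsigned long delay_max_us[] = { 7703, 7801, 3953, 3953 };

/* Arithmetic mean of n samples. */
static double mean_us(const unsigned long *v, size_t n)
{
	double sum = 0.0;

	for (size_t i = 0; i < n; i++)
		sum += (double)v[i];
	return sum / (double)n;
}

/* Ratio of mean worst-case time: WFE workaround vs. __delay(1) variant. */
static double c0_max_time_ratio(void)
{
	return mean_us(wfe_max_us, 4) / mean_us(delay_max_us, 4);
}
```

On these four runs the mean worst-case time for cpu0 at 300MHz is 997221 us
with the WFE workaround against 5852.5 us with __delay(1), a reduction of
roughly 170x.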

I should also note that my earlier kernel was 4.9-stable based
and the one above was on a 4.4-stable based kernel.

Thanks,
Vikram

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
