Message-ID: <e1cc02c5e7dfd4d6bec937b6dc97bfc7@codeaurora.org>
Date: Thu, 27 Jul 2017 18:10:34 -0700
From: Vikram Mulukutla <markivx@...eaurora.org>
To: qiaozhou <qiaozhou@...micro.com>
Cc: Thomas Gleixner <tglx@...utronix.de>,
John Stultz <john.stultz@...aro.org>, sboyd@...eaurora.org,
LKML <linux-kernel@...r.kernel.org>,
Wang Wilbur <wilburwang@...micro.com>,
Marc Zyngier <marc.zyngier@....com>,
Will Deacon <will.deacon@....com>,
Peter Zijlstra <peterz@...radead.org>,
linux-kernel-owner@...r.kernel.org, sudeep.holla@....com
Subject: Re: [Question]: try to fix contention between expire_timers and
try_to_del_timer_sync
cc: Sudeep Holla
On 2017-07-26 18:29, qiaozhou wrote:
> On 2017-07-26 22:16, Thomas Gleixner wrote:
>> On Wed, 26 Jul 2017, qiaozhou wrote:
>>
>> Cc'ed ARM folks.
>>
<snip>
>>
>> For that particular timer case we can clear base->running_timer w/o
>> the lock held (see patch below), but this kind of
>>
>> lock -> test -> unlock -> retry
>>
>> loops are all over the place in the kernel, so this is going to hurt
>> you sooner than later in some other place.
> It's true. This is the way spinlocks are normally and widely used in
> the kernel. I'll also ask the ARM experts whether we can do something
> to avoid or reduce the chance of such an issue. ARMv8.1 has a single
> instruction (ldadda) to replace the ldaxr/stxr loop. Hope it can
> improve things and reduce the chance.
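
To illustrate what qiaozhou is referring to: the two sequences look
roughly like this (a sketch with illustrative function names, not the
kernel's actual atomics implementation):

#include <linux/types.h>

/* Pre-ARMv8.1 LL/SC: the stxr can fail and force a software retry,
 * and a faster CPU can keep winning the exclusive monitor. */
static inline u32 fetch_add_llsc(u32 *p, u32 val)
{
	u32 old, new, fail;

	asm volatile(
	"1:	ldaxr	%w0, [%3]\n"
	"	add	%w1, %w0, %w4\n"
	"	stxr	%w2, %w1, [%3]\n"
	"	cbnz	%w2, 1b\n"
	: "=&r" (old), "=&r" (new), "=&r" (fail)
	: "r" (p), "r" (val)
	: "memory");
	return old;
}

/* ARMv8.1 LSE: a single instruction with no software retry loop,
 * which leaves the fairness arbitration to the hardware. */
static inline u32 fetch_add_lse(u32 *p, u32 val)
{
	u32 old;

	asm volatile(
	"	ldadda	%w1, %w0, [%2]\n"
	: "=&r" (old)
	: "r" (val), "r" (p)
	: "memory");
	return old;
}
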
I think we should have this discussion now - I brought this up earlier
[1] and I promised a test case that I completely forgot about - but here
it is (attached). Essentially a Big CPU in an acquire-check-release loop
will have an unfair advantage over a little CPU concurrently attempting
to acquire the same lock, in spite of the ticket implementation. If the
Big CPU needs the little CPU to make forward progress: livelock.
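
The construct looks roughly like this (a minimal sketch with made-up
names, not the actual timer code):

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(shared_lock);
static bool other_cpu_done;

/* Runs on the Big CPU, possibly with interrupts disabled: */
static void retry_until_done(void)
{
	unsigned long flags;

	for (;;) {
		spin_lock_irqsave(&shared_lock, flags);
		if (other_cpu_done) {
			spin_unlock_irqrestore(&shared_lock, flags);
			break;
		}
		spin_unlock_irqrestore(&shared_lock, flags);
		/*
		 * The Big CPU arrives back at spin_lock_irqsave()
		 * almost immediately and keeps winning the lock, so
		 * the little CPU that needs it to set other_cpu_done
		 * may never get in: livelock.
		 */
		cpu_relax();
	}
}
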
We've run into the same loop construct in other spots in the kernel, and
the reason that a real symptom is so rare is that the retry loop on the
'Big' CPU needs to be interrupted just once by, say, an IRQ/FIQ, and the
livelock is broken. If the entire retry loop is within an
interrupt-disabled critical section, then the odds of livelocking are
much higher.

An example of the problem on a previous kernel is here [2]. Changes to
the workqueue code since then may have fixed this particular instance.

One solution was to use udelay(1) in such loops instead of cpu_relax(),
but that's not very 'relaxing'. I'm not sure if there's something we
could do within the ticket spin-lock implementation to deal with this.
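
In the sketch above, that workaround amounts to replacing the
cpu_relax() with a real back-off (udelay() is in <linux/delay.h>):

		spin_unlock_irqrestore(&shared_lock, flags);
		/*
		 * Give the little CPU a window in which the Big CPU is
		 * genuinely not contending for the lock.
		 */
		udelay(1);
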
Note that I ran my test on a 4.9 kernel, so it didn't include any
spinlock implementation changes since then. The test schedules two
threads, one on a big CPU and one on a little CPU. The big CPU thread
does the lock/unlock/retry loop for a full 1 second with interrupts
disabled, while the little CPU attempts to acquire the same lock but
enables interrupts after every successful lock+unlock.

With unfairness, the little CPU may take up to 1 second (or at the very
least several milliseconds) just to acquire the lock once. This varies
depending on the IPC difference and the frequencies of the big and
little ARM64 CPUs:

Big CPU frequency | Little CPU frequency | Max time for little CPU to acquire lock
------------------+----------------------+----------------------------------------
2GHz              | 1.5GHz               | 133 microseconds
2GHz              | 300MHz               | 734 milliseconds
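
For reference, the core of the test is roughly the following (a
simplified sketch of the attached patch; the names and the
thread-pinning plumbing via kthread_bind() are illustrative):

#include <linux/kthread.h>
#include <linux/ktime.h>
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(test_lock);

/* Bound to the big CPU: */
static int big_thread_fn(void *unused)
{
	unsigned long flags;
	u64 end = ktime_get_ns() + NSEC_PER_SEC;

	local_irq_save(flags);
	while (ktime_get_ns() < end) {
		spin_lock(&test_lock);
		spin_unlock(&test_lock);
		/* No IRQ window here; we retry with interrupts off. */
	}
	local_irq_restore(flags);
	return 0;
}

/* Bound to the little CPU: */
static int little_thread_fn(void *unused)
{
	unsigned long flags;
	u64 t, max_wait = 0;
	int i;

	for (i = 0; i < 100; i++) {
		local_irq_save(flags);
		t = ktime_get_ns();
		spin_lock(&test_lock);		/* measure acquisition time */
		max_wait = max(max_wait, ktime_get_ns() - t);
		spin_unlock(&test_lock);
		local_irq_restore(flags);	/* IRQ window between tries */
	}
	pr_info("little CPU max lock wait: %llu ns\n", max_wait);
	return 0;
}
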
Thanks,
Vikram
[1] - https://lkml.org/lkml/2016/11/17/934
[2] - https://goo.gl/uneFjt
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
Attachment: "0001-measure-spinlock-fairness-across-differently-capable.patch" (text/x-diff, 7285 bytes)