linux-kernel - Re: [RFC][PATCH 3/3] locking/qspinlock: Optimize for x86

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <396f7363-0f28-0fdb-7a48-ae16b0c8e82f@redhat.com>
Date:   Wed, 26 Sep 2018 12:30:36 -0400
From:   Waiman Long <longman@...hat.com>
To:     Peter Zijlstra <peterz@...radead.org>, will.deacon@....com,
        mingo@...nel.org
Cc:     linux-kernel@...r.kernel.org, andrea.parri@...rulasolutions.com,
        tglx@...utronix.de
Subject: Re: [RFC][PATCH 3/3] locking/qspinlock: Optimize for x86

On 09/26/2018 07:01 AM, Peter Zijlstra wrote:
> On x86 we cannot do fetch_or with a single instruction and end up
> using a cmpxchg loop, this reduces determinism. Replace the fetch_or
> with a very tricky composite xchg8 + load.
>
> The basic idea is that we use xchg8 to test-and-set the pending bit
> (when it is a byte) and then a load to fetch the whole word. Using
> two instructions of course opens a window we previously did not have.
> In particular the ordering between pending and tail is of interrest,
> because that is where the split happens.
>
> The claim is that if we order them, it all works out just fine. There
> are two specific cases where the pending,tail state changes:
>
>  - when the 3rd lock(er) comes in and finds pending set, it'll queue
>    and set tail; since we set tail while pending is set, the ordering
>    is split is not important (and not fundamentally different form
>    fetch_or). [*]

The split can cause some changes in behavior. The 3rd locker observes
the pending bit and set tail. The split load of the 2nd locker may make
it observe the tail and backout of the pending loop. As a result, the
2nd locker will acquire the lock after the third locker in this case.
That won't happen with the original code.

I am not saying this is a problem. It is just something we should take
note on.

Cheers,
Longman