Message-ID: <5d88110f-cf0e-c72e-7acc-518b736e715e@colorfullife.com>
Date:   Mon, 5 Sep 2016 20:57:19 +0200
From:   Manfred Spraul <manfred@...orfullife.com>
To:     Peter Zijlstra <peterz@...radead.org>,
        Will Deacon <will.deacon@....com>, benh@...nel.crashing.org,
        paulmck@...ux.vnet.ibm.com
Cc:     Ingo Molnar <mingo@...e.hu>, Boqun Feng <boqun.feng@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        LKML <linux-kernel@...r.kernel.org>, 1vier1@....de,
        Davidlohr Bueso <dave@...olabs.net>,
        Pablo Neira Ayuso <pablo@...filter.org>,
        netfilter-devel@...r.kernel.org
Subject: Re: [PATCH 8/7] net/netfilter/nf_conntrack_core: Remove another
 memory barrier

Hi Peter,

On 09/02/2016 09:22 PM, Peter Zijlstra wrote:
> On Fri, Sep 02, 2016 at 08:35:55AM +0200, Manfred Spraul wrote:
>> On 09/01/2016 06:41 PM, Peter Zijlstra wrote:
>>> On Thu, Sep 01, 2016 at 04:30:39PM +0100, Will Deacon wrote:
>>>> On Thu, Sep 01, 2016 at 05:27:52PM +0200, Manfred Spraul wrote:
>>>>> Since spin_unlock_wait() is defined as equivalent to spin_lock();
>>>>> spin_unlock(), the memory barrier before spin_unlock_wait() is
>>>>> also not required.
>>> Note that ACQUIRE+RELEASE isn't a barrier.
>>>
>>> Both are semi-permeable and things can cross in the middle, like:
>>>
>>>
>>> 	x = 1;
>>> 	LOCK
>>> 	UNLOCK
>>> 	r = y;
>>>
>>> can (validly) get re-ordered like:
>>>
>>> 	LOCK
>>> 	r = y;
>>> 	x = 1;
>>> 	UNLOCK
>>>
>>> So if you want things ordered, as I think you do, I think the smp_mb()
>>> is still needed.
>> CPU1:
>> x=1; /* without WRITE_ONCE */
>> LOCK(l);
>> UNLOCK(l);
>> <do_semop>
>> smp_store_release(x,0)
>>
>>
>> CPU2:
>> LOCK(l)
>> if (smp_load_acquire(x)==1) goto slow_path
>> <do_semop>
>> UNLOCK(l)
>>
>> Ordering is enforced because both CPUs access the same lock.
>>
>> x=1 can't be reordered past the UNLOCK(l), so I don't see that further
>> guarantees are necessary.
>>
>> Correct?
> Correct, sadly implementations do not comply :/ In fact, even x86 is
> broken here.
>
> I spoke to Will earlier today and he suggests either making
> spin_unlock_wait() stronger to avoid any and all such surprises or just
> getting rid of the thing.
I've tried the trivial solution:
Replace spin_unlock_wait() with spin_lock(); spin_unlock().
With sem-scalebench, I get around a factor-of-2 slowdown with an array of
16 semaphores and a factor-of-13 slowdown with an array of 256 semaphores :-(
[with LOCKDEP+DEBUG_SPINLOCK].
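
To make the change concrete: the trivial solution turns the per-semaphore
wait into roughly the following (only a sketch of the loop in ipc/sem.c,
written from memory; field and variable names may differ from the actual
code):

	int i;
	struct sem *sem;

	for (i = 0; i < sma->sem_nsems; i++) {
		sem = sma->sem_base + i;
		spin_lock(&sem->lock);	/* replaces spin_unlock_wait(&sem->lock) */
		spin_unlock(&sem->lock);
	}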

Is anyone around with a ppc or arm system? How slow is the loop of
spin_unlock_wait() calls?
A single CPU is sufficient.
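
The loop in question is roughly this (again just a sketch, assuming the
current style of the per-semaphore loop in ipc/sem.c; the real code may
differ slightly):

	int i;
	struct sem *sem;

	for (i = 0; i < sma->sem_nsems; i++) {
		sem = sma->sem_base + i;
		spin_unlock_wait(&sem->lock);
	}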

Question 1: How large is the difference between:
#./sem-scalebench -t 10 -c 1 -p 1 -o 4 -f -d 1
#./sem-scalebench -t 10 -c 1 -p 1 -o 4 -f -d 256
https://github.com/manfred-colorfu/ipcscale

For x86, the difference is only ~30%.

Question 2:
Is it faster if the attached patch is applied? (relative to mmots)

--
     Manfred

View attachment "0001-ipc-sem.c-Avoid-spin_unlock_wait.patch" of type "text/x-patch" (3559 bytes)
