netdev - Re: [RFC] use smp_load_acquire()/smp_store

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Date:	Wed, 29 Oct 2014 14:13:51 -0700
From:	Alexander Duyck <alexander.h.duyck@...hat.com>
To:	Eric Dumazet <eric.dumazet@...il.com>,
	Jeff Kirsher <jeffrey.t.kirsher@...el.com>
CC:	netdev <netdev@...r.kernel.org>
Subject: Re: [RFC] use smp_load_acquire()/smp_store_release()


On 10/29/2014 12:57 PM, Eric Dumazet wrote:
> On Wed, 2014-10-29 at 12:27 -0700, Jeff Kirsher wrote:
>> On Wed, 2014-10-29 at 09:16 -0700, Alexander Duyck wrote:
>>> On 10/29/2014 07:49 AM, Eric Dumazet wrote:
>>>> Hi Alexander
>>>>
>>>> The memory barriers added in commit
>>>> b37c0fbe3f6dfba1f8ad2aed47fb40578a254635
>>>> ("net: Add memory barriers to prevent possible race in byte queue
>>>> limits")
>>>>
>>>> have heavy cost.
>>>>
>>>> It seems we could use smp_load_acquire() and smp_store_release()
>>>> instead ?
>>>>
>>>> I'll post a patch later today. I would be interested if someone was able
>>>> to test it, as your commit apparently was tested and known to fix a
>>>> reproducible race.
>>>>
>>>> Thanks !
>> Eric- just CC me on the patch you post and I will see what I can do
>> about getting validation eyes on it.
> Thanks guys, will do, and will CC Paul as well.
>
> Alexander, here is the following profile showing the cost of the
> 'mfence', in a typical rpc workload (a lot of IRQ are generated for TX
> completions, because RPC tend to send small packets)
>
>    0.11 │       je     33a
>         │       mov    -0x3c(%rbp),%esi
>    0.06 │       lea    0xc0(%rbx),%rdi
>    0.06 │       callq  dql_completed
>    0.06 │       mfence
>   38.68 │       mov    0xc4(%rbx),%edx
>    1.83 │       mov    0xc0(%rbx),%eax
>         │       cmp    %eax,%edx
>    0.22 │       js     333
>    0.11 │       lock   btrl $0x1,0x98(%rbx)

It might be worthwhile to see if it would be possible to combine BQL 
with the mechanism the drivers have for handling descriptors/packets.  
Otherwise you are going to be pulling one barrier just to hit another 
right after it.

Also depending on what driver it is that the trace is from you may want 
to check and see if you have any MMIO transactions occurring right 
before you make the call, otherwise that may be the actual cause for the 
significant cost as you are having to flush non-coherent memory before 
you can resume operation.

Thanks,

Alex
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html