Message-Id: <20140928.174341.2056306997547610435.davem@davemloft.net>
Date:	Sun, 28 Sep 2014 17:43:41 -0400 (EDT)
From:	David Miller <davem@...emloft.net>
To:	eric.dumazet@...il.com
Cc:	therbert@...gle.com, brouer@...hat.com, netdev@...r.kernel.org,
	alexander.h.duyck@...el.com, toke@...e.dk, fw@...len.de,
	jhs@...atatu.com, dave.taht@...il.com, john.r.fastabend@...el.com,
	dborkman@...hat.com, hannes@...essinduktion.org
Subject: Re: [PATCH net-next] dql: dql_queued() should write first to
 reduce bus transactions

From: Eric Dumazet <eric.dumazet@...il.com>
Date: Thu, 25 Sep 2014 23:04:56 -0700

> From: Eric Dumazet <edumazet@...gle.com>
> 
> While running a high-throughput test on a BQL-enabled NIC,
> I found a very high cost in ndo_start_xmit() when accessing BQL data.
> 
> It turned out the problem was caused by the compiler trying to be
> smart, which resulted in a bad MESI transaction:
> 
>   0.05 │  mov    0xc0(%rax),%edi    // LOAD dql->num_queued
>   0.48 │  mov    %edx,0xc8(%rax)    // STORE dql->last_obj_cnt = count
>  58.23 │  add    %edx,%edi
>   0.58 │  cmp    %edi,0xc4(%rax)
>   0.76 │  mov    %edi,0xc0(%rax)    // STORE dql->num_queued += count
>   0.72 │  js     bd8
> 
> 
> I got an incredible 10% gain [1] by making sure the CPU does not attempt
> to get the cache line in Shared mode, but directly issues a request for
> ownership.
> 
> New code :
> 	mov    %edx,0xc8(%rax)  // STORE dql->last_obj_cnt = count
> 	add    %edx,0xc0(%rax)  // RMW   dql->num_queued += count
> 	mov    0xc4(%rax),%ecx  // LOAD dql->adj_limit
> 	mov    0xc0(%rax),%edx  // LOAD dql->num_queued
> 	cmp    %edx,%ecx
> 
> The TX completion was running on another CPU, with a high interrupt
> rate.
> 
> Note that I am using barrier() as a soft hint, as mb() here could be
> too costly.
> 
> [1] This was a netperf TCP_STREAM with TSO disabled, but GSO enabled.
> 
> Signed-off-by: Eric Dumazet <edumazet@...gle.com>

Ok now you're just showing off :-)  Applied, thanks!
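
For readers following along without the diff, the store ordering described in the quoted message can be sketched in user-space C roughly as follows. This is a minimal sketch, not the applied patch: struct dql_sketch, dql_queued_sketch(), dql_avail_sketch() and SKETCH_MAX_OBJECT are hypothetical names, the struct keeps only the three fields named in the disassembly (in offset order 0xc0, 0xc4, 0xc8), and barrier() is assumed to be a compiler-only barrier written as GCC/Clang inline asm.

```c
#include <assert.h>
#include <limits.h>

/* Assumed compiler-only barrier: it stops the compiler from reordering or
 * merging memory accesses across it and emits no CPU fence instruction. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical sanity cap, for illustration only. */
#define SKETCH_MAX_OBJECT (UINT_MAX / 16)

/* Hypothetical cut-down struct: only the fields seen in the disassembly. */
struct dql_sketch {
    unsigned int num_queued;   /* 0xc0 in the listing */
    unsigned int adj_limit;    /* 0xc4 in the listing */
    unsigned int last_obj_cnt; /* 0xc8 in the listing */
};

static inline void dql_queued_sketch(struct dql_sketch *dql, unsigned int count)
{
    assert(count <= SKETCH_MAX_OBJECT);

    /* Write first, so the first access to the cache line is a store. */
    dql->last_obj_cnt = count;

    /* Soft hint only: keeps the compiler from hoisting the load of
     * num_queued above the store. The CPU itself is not fenced. */
    barrier();

    dql->num_queued += count;
}

/* Remaining room; a negative value corresponds to the cmp/js test. */
static inline int dql_avail_sketch(const struct dql_sketch *dql)
{
    return (int)(dql->adj_limit - dql->num_queued);
}
```

Whether the compiler then emits the single read-modify-write add shown under "New code" depends on the compiler and its options; the sketch only pins the order of the accesses in the source.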