Date:	Wed, 27 Jan 2016 13:56:03 -0800
From:	Alexei Starovoitov <alexei.starovoitov@...il.com>
To:	Jesper Dangaard Brouer <brouer@...hat.com>
Cc:	John Fastabend <john.fastabend@...il.com>,
	Tom Herbert <tom@...bertland.com>,
	"Michael S. Tsirkin" <mst@...hat.com>,
	David Miller <davem@...emloft.net>,
	Eric Dumazet <eric.dumazet@...il.com>,
	Or Gerlitz <gerlitz.or@...il.com>,
	Eric Dumazet <edumazet@...gle.com>,
	Linux Kernel Network Developers <netdev@...r.kernel.org>,
	Alexander Duyck <alexander.duyck@...il.com>,
	Daniel Borkmann <borkmann@...earbox.net>,
	Marek Majkowski <marek@...udflare.com>,
	Hannes Frederic Sowa <hannes@...essinduktion.org>,
	Florian Westphal <fw@...len.de>,
	Paolo Abeni <pabeni@...hat.com>,
	John Fastabend <john.r.fastabend@...el.com>,
	Amir Vadai <amirva@...il.com>,
	Daniel Borkmann <daniel@...earbox.net>,
	Vladislav Yasevich <vyasevich@...il.com>
Subject: Re: Bypass at packet-page level (Was: Optimizing instruction-cache,
 more packets at each stage)

On Wed, Jan 27, 2016 at 09:47:50PM +0100, Jesper Dangaard Brouer wrote:
>  Sum: 18.75 % => calc: 30.0 ns (sum: 30.0 ns) => Total: 159.9 ns
> 
> To get around the cache-miss in eth_type_trans(), I created a
> "icache-loop" in mlx5e_poll_rx_cq() and pull all RX-ring packets "out",
> before calling eth_type_trans(), reducing cost to 2.45%.
> 
> To mitigate the SLUB slowpath, I used my slab + SKB-napi bulk API.  And
> also tuned SLUB (with slub_nomerge slub_min_objects=128) to get bigger
> slab-pages, thus bigger bulk opportunities.
> 
> This helped a lot, I can now drop 12Mpps (12,088,767 => 82.7 ns).

Great stuff. I think such a batching loop will reduce the cost of
eth_type_trans() for all use cases.
It is unfortunate that it would need to be implemented in every driver,
but only a handful of drivers matter in high-performance setups, so I
think it's worth getting this patch in for mlx5 first; the other drivers
will catch up.
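
For readers following along, here is a rough user-space sketch of the
two-stage batching idea discussed above. It is not the mlx5e_poll_rx_cq()
patch: struct rx_ring, struct pkt, classify(), poll_rx_batched() and
BATCH_MAX are all hypothetical names made up for illustration, and
classify() only stands in for eth_type_trans() by reading the EtherType
field. The point is the loop shape: first pull the packets out of the
ring, then run the per-packet header processing over the whole batch so
that the same code stays hot in the instruction cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BATCH_MAX 64 /* hypothetical batch size, not taken from the thread */

/* Hypothetical stand-in for a received packet; not a kernel structure. */
struct pkt {
    const uint8_t *data; /* points at the Ethernet header */
    uint16_t proto;      /* filled in by stage 2 */
};

/* Hypothetical RX ring: an array of frame pointers plus a read index. */
struct rx_ring {
    const uint8_t **frames;
    size_t head;  /* next frame to consume */
    size_t count; /* number of frames available in frames[] */
};

/* Stand-in for eth_type_trans(): read the EtherType at bytes 12..13. */
static uint16_t classify(const uint8_t *hdr)
{
    return (uint16_t)((hdr[12] << 8) | hdr[13]);
}

/*
 * Poll up to `budget` packets (capped at BATCH_MAX) in two stages.
 * `batch` must have room for at least BATCH_MAX entries.
 * Returns the number of packets placed in `batch`.
 */
static size_t poll_rx_batched(struct rx_ring *ring, struct pkt *batch,
                              size_t budget)
{
    size_t n = 0, i;

    assert(ring != NULL && batch != NULL);
    if (budget > BATCH_MAX)
        budget = BATCH_MAX;

    /* Stage 1: pull packets out of the ring without touching headers. */
    while (n < budget && ring->head < ring->count)
        batch[n++].data = ring->frames[ring->head++];

    /* Stage 2: run header classification back to back over the batch. */
    for (i = 0; i < n; i++)
        batch[i].proto = classify(batch[i].data);

    return n;
}
```

A caller would fill an rx_ring with pointers to received frames, call
poll_rx_batched() with its budget, and then hand the classified batch to
the next stage. A real driver would also have to deal with descriptor
ownership, SKB allocation and NAPI budget accounting, none of which is
modelled here.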
