Open Source and information security mailing list archives
Date: Thu, 15 Sep 2016 17:34:11 +0300
From: Tariq Toukan <tariqt@...lanox.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>, Saeed Mahameed <saeedm@...lanox.com>
CC: iovisor-dev <iovisor-dev@...ts.iovisor.org>, <netdev@...r.kernel.org>, Brenden Blanco <bblanco@...mgrid.com>, Tom Herbert <tom@...bertland.com>, Martin KaFai Lau <kafai@...com>, Jesper Dangaard Brouer <brouer@...hat.com>, Daniel Borkmann <daniel@...earbox.net>, Eric Dumazet <edumazet@...gle.com>, Jamal Hadi Salim <jhs@...atatu.com>
Subject: Re: [PATCH RFC 01/11] net/mlx5e: Single flow order-0 pages for Striding RQ

Hi Alexei,

On 07/09/2016 8:31 PM, Alexei Starovoitov wrote:
> On Wed, Sep 07, 2016 at 03:42:22PM +0300, Saeed Mahameed wrote:
>> From: Tariq Toukan <tariqt@...lanox.com>
>>
>> To improve the memory consumption scheme, we omit the flow that
>> demands and splits high-order pages in Striding RQ, and stay
>> with a single Striding RQ flow that uses order-0 pages.
>>
>> Moving to fragmented memory allows the use of larger MPWQEs,
>> which reduces the number of UMR posts and filler CQEs.
>>
>> Moving to a single flow allows several optimizations that improve
>> performance, especially in production servers where we would
>> anyway fall back to order-0 allocations:
>> - inline functions that were called via function pointers.
>> - improve the UMR post process.
>>
>> This patch alone is expected to give a slight performance reduction.
>> However, the new memory scheme makes it possible to use a page-cache
>> of a fair size that doesn't inflate the memory footprint, which will
>> dramatically fix the reduction and even give a huge gain.
>>
>> We ran pktgen single-stream benchmarks, with iptables-raw-drop:
>>
>> Single stride, 64 bytes:
>> * 4,739,057 - baseline
>> * 4,749,550 - this patch
>> no reduction
>>
>> Larger packets, no page cross, 1024 bytes:
>> * 3,982,361 - baseline
>> * 3,845,682 - this patch
>> 3.5% reduction
>>
>> Larger packets, every 3rd packet crosses a page, 1500 bytes:
>> * 3,731,189 - baseline
>> * 3,579,414 - this patch
>> 4% reduction
>
> imo it's not a realistic use case, but would be good to mention that
> patch 3 brings performance back for this use case anyway.

Exactly, that's what I meant in the previous paragraph
(".. will dramatically fix the reduction and even give a huge gain.")

Regards,
Tariq
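[Editor's note: the commit message above hinges on recycling order-0 pages through a small fixed-size cache instead of splitting high-order pages. The sketch below is a minimal userspace model of that idea only; the names (`page_cache`, `get_rx_page`, `put_rx_page`) and sizes are hypothetical, and the real mlx5e driver works with `alloc_page()`/`put_page()`, DMA mapping, and per-RQ state rather than `malloc`.]

```c
/*
 * Illustrative model of an order-0 RX page-cache: the hot path
 * recycles pages LIFO through a small fixed ring, falling back to
 * the allocator only when the cache is empty. malloc/free stand in
 * for alloc_page(GFP_ATOMIC)/put_page(); all names are hypothetical.
 */
#include <stdlib.h>
#include <stddef.h>

#define PAGE_SZ     4096
#define CACHE_SLOTS 256   /* a "fair size" that doesn't inflate the footprint */

struct page_cache {
    void  *slot[CACHE_SLOTS];
    size_t count;          /* number of pages currently cached */
};

/* Take a page: prefer the cache, else allocate a fresh order-0 page. */
static void *get_rx_page(struct page_cache *c)
{
    if (c->count > 0)
        return c->slot[--c->count];   /* hot path: recycle, no allocator call */
    return malloc(PAGE_SZ);           /* stands in for alloc_page(GFP_ATOMIC) */
}

/* Return a page: keep it if there is room, otherwise release it. */
static void put_rx_page(struct page_cache *c, void *page)
{
    if (c->count < CACHE_SLOTS)
        c->slot[c->count++] = page;   /* LIFO keeps recently-used pages warm */
    else
        free(page);                   /* stands in for put_page() */
}
```

Because the cache is bounded, the steady-state memory footprint stays flat regardless of traffic, which is the property the commit message is counting on to win back the per-packet allocation cost.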