Message-ID: <1467961005.17638.28.camel@edumazet-glaptop3.roam.corp.google.com>
Date:	Fri, 08 Jul 2016 08:56:45 +0200
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc:	Brenden Blanco <bblanco@...mgrid.com>, davem@...emloft.net,
	netdev@...r.kernel.org, Martin KaFai Lau <kafai@...com>,
	Jesper Dangaard Brouer <brouer@...hat.com>,
	Ari Saha <as754m@....com>, Or Gerlitz <gerlitz.or@...il.com>,
	john.fastabend@...il.com, hannes@...essinduktion.org,
	Thomas Graf <tgraf@...g.ch>, Tom Herbert <tom@...bertland.com>,
	Daniel Borkmann <daniel@...earbox.net>
Subject: Re: [PATCH v6 12/12] net/mlx4_en: add prefetch in xdp rx path

On Thu, 2016-07-07 at 21:16 -0700, Alexei Starovoitov wrote:

> I've tried this style of prefetching in the past for normal stack
> and it didn't help at all.

This is very nice, but my experience showed the opposite, so I guess you
did not choose the proper prefetch strategy.

Prefetching in mlx4 gave me good results, once I made sure our compiler
was not moving the actual prefetch operations on x86_64 (i.e. forcing the
use of asm volatile, as on x86_32, instead of the builtin prefetch). You
might check whether your compiler does the proper thing, because this
really hurt me in the past.
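
Concretely, something along these lines (an untested sketch, not the
actual kernel code; the prefetcht0 hint and helper names are mine):

static inline void prefetch_forced(const void *p)
{
	/* asm volatile pins the prefetch where we wrote it, so the
	 * compiler cannot elide it or move it around. */
	asm volatile("prefetcht0 %0" : : "m" (*(const char *)p));
}

static inline void prefetch_builtin(const void *p)
{
	/* The portable builtin; the compiler is free to reschedule
	 * or drop it, which is what bit me on x86_64. */
	__builtin_prefetch(p, 0, 3);
}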

In my case, I was using a 40Gbit NIC, and prefetching 128 bytes instead
of 64 allowed me to remove one stall in the GRO engine when using TCP
with timestamps (total header size: 66 bytes, just past one 64-byte
cache line), or tunnels.
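
That boils down to touching two cache lines per packet instead of one;
a minimal sketch, assuming prefetch() from linux/prefetch.h and 64-byte
cache lines:

#include <linux/prefetch.h>

static inline void prefetch_pkt_headers(const void *data)
{
	prefetch(data);		/* line 0: eth + most of the IP/TCP headers */
	prefetch(data + 64);	/* line 1: the last bytes of a 66-byte
				 * eth + IPv4 + TCP + TS header land here */
}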

The problem with prefetch is that it works well only for a given rate
(in pps) and given CPUs, as prefetch behavior varies among CPU flavors.

Brenden chose to prefetch N+3, based on some experiments on some
hardware; roughly the pattern sketched below.
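
A sketch of the idea, with made-up ring/buffer names (this is not the
mlx4 code): while handling descriptor N, warm the cache for descriptor
N+3.

#include <linux/prefetch.h>

#define PREFETCH_STRIDE	3	/* the "N+3" distance */

struct rx_buf {
	void *data;
};

struct rx_ring {
	unsigned int next;	/* next descriptor to process   */
	unsigned int mask;	/* ring size - 1 (power of two) */
	struct rx_buf *bufs;
};

void process_frame(void *data);	/* stand-in for the real RX work */

static void rx_poll(struct rx_ring *ring, int budget)
{
	int i;

	for (i = 0; i < budget; i++) {
		unsigned int idx = (ring->next + i) & ring->mask;

		/* Issue the prefetch 3 descriptors ahead, hoping the
		 * data arrives in cache by the time we get there. */
		prefetch(ring->bufs[(idx + PREFETCH_STRIDE) & ring->mask].data);

		process_frame(ring->bufs[idx].data);
	}
	ring->next = (ring->next + budget) & ring->mask;
}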

Prefetching N+3 can actually slow things down under moderate load,
which is the case 99% of the time in typical workloads on modern servers
with multi-queue NICs.

This is why it was hard to upstream such changes: they focus on max
throughput instead of low latency.


