Message-ID: <1467961005.17638.28.camel@edumazet-glaptop3.roam.corp.google.com>
Date:	Fri, 08 Jul 2016 08:56:45 +0200
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc:	Brenden Blanco <bblanco@...mgrid.com>, davem@...emloft.net,
	netdev@...r.kernel.org, Martin KaFai Lau <kafai@...com>,
	Jesper Dangaard Brouer <brouer@...hat.com>,
	Ari Saha <as754m@....com>, Or Gerlitz <gerlitz.or@...il.com>,
	john.fastabend@...il.com, hannes@...essinduktion.org,
	Thomas Graf <tgraf@...g.ch>, Tom Herbert <tom@...bertland.com>,
	Daniel Borkmann <daniel@...earbox.net>
Subject: Re: [PATCH v6 12/12] net/mlx4_en: add prefetch in xdp rx path

On Thu, 2016-07-07 at 21:16 -0700, Alexei Starovoitov wrote:

> I've tried this style of prefetching in the past for normal stack
> and it didn't help at all.

This is very nice, but my experience showed the opposite, so I guess you
did not choose the proper prefetch strategy.

Prefetching in mlx4 gave me good results, once I made sure our compiler
was not moving the actual prefetch operations on x86_64 (i.e. forcing the
use of asm volatile, as on x86_32, instead of the builtin prefetch). You
might check whether your compiler does the proper thing, because this
really hurt me in the past.
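
Concretely, something along these lines (an untested sketch, not the
actual kernel code; the prefetcht0 hint and helper names are mine):

static inline void prefetch_forced(const void *p)
{
	/* asm volatile pins the prefetch where we wrote it, so the
	 * compiler cannot elide it or move it around. */
	asm volatile("prefetcht0 %0" : : "m" (*(const char *)p));
}

static inline void prefetch_builtin(const void *p)
{
	/* The portable builtin; the compiler is free to reschedule
	 * or drop it, which is what bit me on x86_64. */
	__builtin_prefetch(p, 0, 3);
}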

In my case, I was using a 40Gbit NIC, and prefetching 128 bytes instead
of 64 allowed me to remove one stall in the GRO engine when using TCP
with timestamps (total header size: 66 bytes, just past one 64-byte
cache line), or tunnels.
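
That boils down to touching two cache lines per packet instead of one;
a minimal sketch, assuming prefetch() from linux/prefetch.h and 64-byte
cache lines:

#include <linux/prefetch.h>

static inline void prefetch_pkt_headers(const void *data)
{
	prefetch(data);		/* line 0: eth + most of the IP/TCP headers */
	prefetch(data + 64);	/* line 1: the last bytes of a 66-byte
				 * eth + IPv4 + TCP + TS header land here */
}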

The problem with prefetch is that it works well only for a given rate
(in pps) and given CPUs, as prefetch behavior varies among CPU flavors.

Brenden chose to prefetch N+3, based on some experiments on some
hardware; roughly the pattern sketched below.
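
A sketch of the idea, with made-up ring/buffer names (this is not the
mlx4 code): while handling descriptor N, warm the cache for descriptor
N+3.

#include <linux/prefetch.h>

#define PREFETCH_STRIDE	3	/* the "N+3" distance */

struct rx_buf {
	void *data;
};

struct rx_ring {
	unsigned int next;	/* next descriptor to process   */
	unsigned int mask;	/* ring size - 1 (power of two) */
	struct rx_buf *bufs;
};

void process_frame(void *data);	/* stand-in for the real RX work */

static void rx_poll(struct rx_ring *ring, int budget)
{
	int i;

	for (i = 0; i < budget; i++) {
		unsigned int idx = (ring->next + i) & ring->mask;

		/* Issue the prefetch 3 descriptors ahead, hoping the
		 * data arrives in cache by the time we get there. */
		prefetch(ring->bufs[(idx + PREFETCH_STRIDE) & ring->mask].data);

		process_frame(ring->bufs[idx].data);
	}
	ring->next = (ring->next + budget) & ring->mask;
}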

Prefetching N+3 can actually slow things down under moderate load,
which is the case 99% of the time in typical workloads on modern servers
with multi-queue NICs.

This is why it was hard to upstream such changes: they focus on max
throughput instead of low latency.


