Date:	Thu, 7 Jul 2016 21:16:14 -0700
From:	Alexei Starovoitov <alexei.starovoitov@...il.com>
To:	Eric Dumazet <eric.dumazet@...il.com>
Cc:	Brenden Blanco <bblanco@...mgrid.com>, davem@...emloft.net,
	netdev@...r.kernel.org, Martin KaFai Lau <kafai@...com>,
	Jesper Dangaard Brouer <brouer@...hat.com>,
	Ari Saha <as754m@....com>, Or Gerlitz <gerlitz.or@...il.com>,
	john.fastabend@...il.com, hannes@...essinduktion.org,
	Thomas Graf <tgraf@...g.ch>, Tom Herbert <tom@...bertland.com>,
	Daniel Borkmann <daniel@...earbox.net>
Subject: Re: [PATCH v6 12/12] net/mlx4_en: add prefetch in xdp rx path

On Fri, Jul 08, 2016 at 05:56:31AM +0200, Eric Dumazet wrote:
> On Thu, 2016-07-07 at 19:15 -0700, Brenden Blanco wrote:
> > XDP programs read and/or write packet data very early, and cache miss is
> > seen to be a bottleneck.
> > 
> > Add prefetch logic in the xdp case 3 packets in the future. Throughput
> > improved from 10Mpps to 12.5Mpps.  LLC misses as reported by perf stat
> reduced from ~14% to ~7%.  Prefetch values of 0 through 5 were compared,
> with >3 showing diminishing returns.
> 
> This is what I feared with XDP.
> 
> Instead of making generic changes in the driver(s), we now have 'patches
> that improve XDP numbers'
> 
> Careful prefetches make sense in NIC drivers, regardless of XDP being
> used or not.
> 
> On mlx4, prefetching next cqe could probably help as well.

I've tried this style of prefetching in the past for the normal stack
and it didn't help at all.
It helps XDP because the inner processing loop is short with a small
number of memory accesses, so prefetching the Nth packet in advance helps.
Prefetching only the next packet doesn't help as much, since the bpf prog
is too short and the prefetch doesn't have time to actually pull
the data in.
The effectiveness of this patch depends on the size of the bpf program
and the amount of work it does. For small and medium sizes it works well.
For large programs probably not so much, but we haven't gotten to
that point yet. I think eventually the prefetch distance should
be calculated dynamically based on the size of the prog and the amount
of work it does, or configured via a knob (which would be unfortunate).
The performance gain is sizable, so I think it makes sense to
keep it... even just to demonstrate the prefetch logic.
Also note this is a DDIO-capable cpu. On desktop class cpus
the prefetch is mandatory for all bpf programs to have good performance.

Another alternative we considered is to allow bpf programs to
indicate to the xdp infra how far in advance to prefetch, so the
xdp side will prefetch only when the program gives a hint.
But that would be the job of future patches.
