Message-ID: <20160804181913.26ee17b9@redhat.com>
Date: Thu, 4 Aug 2016 18:19:13 +0200
From: Jesper Dangaard Brouer <brouer@...hat.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Eric Dumazet <eric.dumazet@...il.com>,
Brenden Blanco <bblanco@...mgrid.com>, davem@...emloft.net,
netdev@...r.kernel.org, Jamal Hadi Salim <jhs@...atatu.com>,
Saeed Mahameed <saeedm@....mellanox.co.il>,
Martin KaFai Lau <kafai@...com>, Ari Saha <as754m@....com>,
Or Gerlitz <gerlitz.or@...il.com>, john.fastabend@...il.com,
hannes@...essinduktion.org, Thomas Graf <tgraf@...g.ch>,
Tom Herbert <tom@...bertland.com>,
Daniel Borkmann <daniel@...earbox.net>,
Tariq Toukan <ttoukan.linux@...il.com>, brouer@...hat.com,
Mel Gorman <mgorman@...hsingularity.net>,
linux-mm <linux-mm@...ck.org>
Subject: Re: order-0 vs order-N driver allocation. Was: [PATCH v10 07/12]
net/mlx4_en: add page recycle to prepare rx ring for tx support
On Wed, 3 Aug 2016 10:45:13 -0700 Alexei Starovoitov <alexei.starovoitov@...il.com> wrote:
> On Mon, Jul 25, 2016 at 09:35:20AM +0200, Eric Dumazet wrote:
> > On Tue, 2016-07-19 at 12:16 -0700, Brenden Blanco wrote:
> > > The mlx4 driver by default allocates order-3 pages for the ring to
> > > consume in multiple fragments. When the device has an xdp program, this
> > > behavior will prevent tx actions since the page must be re-mapped in
> > > TODEVICE mode, which cannot be done if the page is still shared.
> > >
> > > Start by making the allocator configurable based on whether xdp is
> > > running, such that order-0 pages are always used and never shared.
> > >
> > > Since this will stress the page allocator, add a simple page cache to
> > > each rx ring. Pages in the cache are left dma-mapped, and in drop-only
> > > stress tests the page allocator is eliminated from the perf report.
> > >
> > > Note that setting an xdp program will now require the rings to be
> > > reconfigured.
> >
> > Again, this has nothing to do with XDP?
> >
> > Please submit a separate patch, switching this driver to order-0
> > allocations.
> >
> > I mentioned this order-3 vs order-0 issue earlier [1], and proposed
> > to send a generic patch, but I have been traveling lately and am
> > currently on vacation.
> >
> > order-3 pages are problematic when dealing with hostile traffic anyway,
> > so we should exclusively use order-0 pages, with page recycling like
> > the Intel drivers do.
> >
> > http://lists.openwall.net/netdev/2016/04/11/88
>
> Completely agree. These multi-page tricks work only for benchmarks and
> not for production.
> Eric, if you can submit that patch for mlx4 that would be awesome.
>
> I think we should default to order-0 for both mlx4 and mlx5.
> Alternatively, we're thinking of adding a netlink or ethtool switch to
> preserve the old behavior, but frankly I don't see who needs these
> order-N allocation schemes.
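
To recap the mechanism under discussion: the quoted patch keeps a small
per-ring cache of pages that stay DMA-mapped across recycles, so the
refill path skips both the page allocator and dma_map_page(). Roughly
like this (an illustrative sketch, not the actual mlx4 code; names and
the size are mine):

  #include <linux/dma-mapping.h>
  #include <linux/mm.h>

  #define RX_CACHE_SIZE 128                     /* illustrative size */

  struct rx_page_cache {
          unsigned int index;                   /* slots in use */
          struct {
                  struct page *page;
                  dma_addr_t dma;               /* mapping is kept */
          } buf[RX_CACHE_SIZE];
  };

  /* RX refill: prefer a cached, still-mapped page */
  static bool cache_get(struct rx_page_cache *c, struct page **page,
                        dma_addr_t *dma)
  {
          if (!c->index)
                  return false;                 /* fall back to alloc+map */
          c->index--;
          *page = c->buf[c->index].page;
          *dma  = c->buf[c->index].dma;
          return true;
  }

  /* Page release (e.g. XDP_DROP): stash it without unmapping */
  static bool cache_put(struct rx_page_cache *c, struct page *page,
                        dma_addr_t dma)
  {
          if (c->index >= RX_CACHE_SIZE)
                  return false;                 /* caller unmaps + frees */
          c->buf[c->index].page = page;
          c->buf[c->index].dma  = dma;
          c->index++;
          return true;
  }
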
I actually agree that we should switch to order-0 allocations.
*BUT* this will cause performance regressions on platforms with
expensive DMA operations, as they can no longer amortize the cost of
mapping a larger page across multiple RX frames.
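
To illustrate the amortization point: with an order-3 page, a single
dma_map_page() call covers eight page-sized RX chunks, whereas order-0
pays the full mapping cost per frame. A minimal sketch (layout and
names are mine, not the mlx4 code):

  #include <linux/dma-mapping.h>
  #include <linux/gfp.h>
  #include <linux/mm.h>

  struct rx_frag {
          struct page *page;
          dma_addr_t dma;
          unsigned int offset;
  };

  /* order-3: one (expensive) DMA map shared by 8 page-sized frags;
   * with order-0, each frag would pay its own dma_map_page().
   */
  static int fill_frags_order3(struct device *dev, struct rx_frag *frags)
  {
          struct page *page = alloc_pages(GFP_ATOMIC | __GFP_COMP, 3);
          dma_addr_t dma;
          int i;

          if (!page)
                  return -ENOMEM;

          dma = dma_map_page(dev, page, 0, PAGE_SIZE << 3,
                             DMA_FROM_DEVICE);
          if (dma_mapping_error(dev, dma)) {
                  __free_pages(page, 3);
                  return -ENOMEM;
          }

          for (i = 0; i < 8; i++) {     /* 8 frags share page + mapping */
                  frags[i].page   = page;
                  frags[i].dma    = dma;
                  frags[i].offset = i * PAGE_SIZE;
                  get_page(page);       /* the sharing that blocks XDP_TX */
          }
          put_page(page);               /* drop allocator's initial ref */
          return 0;
  }
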
Plus, the base cost of an order-0 page is 246 cycles (see [1] slide #9),
while the 10G wirespeed target is approx 201 cycles (14.88 Mpps for
64-byte frames leaves 67.2 ns per packet, i.e. ~201 cycles at 3 GHz).
Thus, for these speeds some page recycling tricks are needed. I
described the cool trick the Intel drivers do in [1] slide #14, but it
does not address the DMA part and costs some extra atomic ops.
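
From memory, that trick is roughly: split an order-0 page into two
halves, flip between them, and reuse the page once the stack has
dropped its reference. A rough sketch of the idea (not the actual
Intel driver code):

  #include <linux/mm.h>

  struct rx_buffer {
          struct page *page;
          unsigned int page_offset;     /* 0 or PAGE_SIZE / 2 */
  };

  static bool try_recycle(struct rx_buffer *buf)
  {
          /* page_count() == 1 means the reference that travelled with
           * the previous skb frag is gone; only the driver holds the
           * page, so it is safe to reuse.
           */
          if (page_count(buf->page) != 1)
                  return false;

          /* Flip to the other half for the next frame */
          buf->page_offset ^= PAGE_SIZE / 2;

          /* Take the reference that travels with the next frame; this
           * get_page()/page_count() pair is the extra atomic cost
           * mentioned above.
           */
          get_page(buf->page);
          return true;
  }
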
I started coding on the page-pool last week; it addresses both the
DMA mapping and the recycling (with fewer atomic ops). (p.s. still on
vacation this week).
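
Roughly, the idea is an interface like the following (the names below
are placeholders for illustration, nothing is final):

  #include <linux/dma-mapping.h>
  #include <linux/mm.h>

  /* One pool per RX ring; pages stay DMA-mapped for the device across
   * recycles, and because get/put run in the same NAPI context, the
   * fast path needs no refcount atomics.
   */
  struct page_pool;

  struct page_pool *page_pool_create(struct device *dev,
                                     enum dma_data_direction dir);
  struct page *page_pool_get(struct page_pool *pool);  /* mapped page */
  dma_addr_t page_pool_get_dma(const struct page *page); /* kept mapping */
  void page_pool_put(struct page_pool *pool, struct page *page);
  void page_pool_destroy(struct page_pool *pool);      /* unmap + free */
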
[1] http://people.netfilter.org/hawk/presentations/MM-summit2016/generic_page_pool_mm_summit2016.pdf
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer