[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160912175321.GA15135@ast-mbp.thefacebook.com>
Date: Mon, 12 Sep 2016 10:53:23 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Jesper Dangaard Brouer <brouer@...hat.com>
Cc: Saeed Mahameed <saeedm@...lanox.com>,
iovisor-dev <iovisor-dev@...ts.iovisor.org>,
netdev@...r.kernel.org, Tariq Toukan <tariqt@...lanox.com>,
Brenden Blanco <bblanco@...mgrid.com>,
Tom Herbert <tom@...bertland.com>,
Martin KaFai Lau <kafai@...com>,
Daniel Borkmann <daniel@...earbox.net>,
Eric Dumazet <edumazet@...gle.com>,
Jamal Hadi Salim <jhs@...atatu.com>,
Edward Cree <ecree@...arflare.com>
Subject: Re: README: [PATCH RFC 11/11] net/mlx5e: XDP TX xmit more
On Mon, Sep 12, 2016 at 10:56:55AM +0200, Jesper Dangaard Brouer wrote:
> On Thu, 8 Sep 2016 23:30:50 -0700
> Alexei Starovoitov <alexei.starovoitov@...il.com> wrote:
>
> > On Fri, Sep 09, 2016 at 07:36:52AM +0200, Jesper Dangaard Brouer wrote:
> > > > > Lets do bundling/bulking from the start!
> > > >
> > > > mlx4 already does bulking and this proposed mlx5 set of patches
> > > > does bulking as well.
> > > > See nothing wrong about it. RX side processes the packets and
> > > > when it's done it tells TX to xmit whatever it collected.
> > >
> > > This is doing "hidden" bulking and not really taking advantage of using
> > > the icache more effeciently.
> > >
> > > Let me explain the problem I see, little more clear then, so you
> > > hopefully see where I'm going.
> > >
> > > Imagine you have packets intermixed towards the stack and XDP_TX.
> > > Every time you call the stack code, then you flush your icache. When
> > > returning to the driver code, you will have to reload all the icache
> > > associated with the XDP_TX, this is a costly operation.
> >
> > correct. And why is that a problem?
>
> It is good that you can see and acknowledge the I-cache problem.
>
> XDP is all about performance. What I hear is, that you are arguing
> against a model that will yield better performance, that does not make
> sense to me. Let me explain this again, in another way.
I'm arguing against your proposal that I think will be more complex and
lower performance than what Saeed and the team already implemented.
Therefore I don't think it's fair to block the patch and ask them to
reimplement it just to test an idea that may or may not improve performance.
Getting maximum performance is tricky. Good is better than perfect.
It's important to argue about user space visible bits upfront, but
on the kernel performance side we should build/test incrementally.
This particular patch 11/11 is simple, easy to review and provides
good performance. What's not to like?
Powered by blists - more mailing lists