Message-ID: <20170825144513.1ee9fbb1@redhat.com>
Date: Fri, 25 Aug 2017 14:45:13 +0200
From: Jesper Dangaard Brouer <brouer@...hat.com>
To: Michael Chan <michael.chan@...adcom.com>
Cc: Alexander Duyck <alexander.duyck@...il.com>,
"Duyck, Alexander H" <alexander.h.duyck@...el.com>,
"john.fastabend@...il.com" <john.fastabend@...il.com>,
"pstaszewski@...are.pl" <pstaszewski@...are.pl>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"xdp-newbies@...r.kernel.org" <xdp-newbies@...r.kernel.org>,
"andy@...yhouse.net" <andy@...yhouse.net>,
"borkmann@...earbox.net" <borkmann@...earbox.net>,
brouer@...hat.com
Subject: Re: XDP redirect measurements, gotchas and tracepoints

On Thu, 24 Aug 2017 20:36:28 -0700
Michael Chan <michael.chan@...adcom.com> wrote:

> On Wed, Aug 23, 2017 at 1:29 AM, Jesper Dangaard Brouer
> <brouer@...hat.com> wrote:
> > On Tue, 22 Aug 2017 23:59:05 -0700
> > Michael Chan <michael.chan@...adcom.com> wrote:
> >
> >> On Tue, Aug 22, 2017 at 6:06 PM, Alexander Duyck
> >> <alexander.duyck@...il.com> wrote:
> >> > On Tue, Aug 22, 2017 at 1:04 PM, Michael Chan <michael.chan@...adcom.com> wrote:
> >> >>
> >> >> Right, but it's conceivable to add an API to "return" the buffer to
> >> >> the input device, right?
> >
> > Yes, I would really like to see an API like this.
> >
> >> >
> >> > You could, it is just added complexity. "just free the buffer" in
> >> > ixgbe usually just amounts to one atomic operation to decrement the
> >> > total page count since page recycling is already implemented in the
> >> > driver. You still would have to unmap the buffer regardless of if you
> >> > were recycling it or not so all you would save is 1.000015259 atomic
> >> > operations per packet. The fraction is because once every 64K uses we
> >> > have to bulk update the count on the page.
> >> >
> >>
> >> If the buffer is returned to the input device, the input device can
> >> keep the DMA mapping. All it needs to do is to dma_sync it back to
> >> the input device when the buffer is returned.
> >
> > Yes, exactly, return to the input device. I really think we should
> > work on a solution where we can keep the DMA mapping around. We have
> > an opportunity here to make ndo_xdp_xmit TX queues use a specialized
> > page return call, to achieve this. (I imagine other archs have a higher
> > DMA overhead than Intel)
> >
> > I'm not sure how the API should look. The ixgbe recycle mechanism and
> > splitting the page (into two packets) actually complicates things, and
> > ties us into a page-refcnt based model. We could get around this by
> > each driver implementing a page-return-callback, that allows us to
> > return the page to the input device? Then, drivers implementing the
> > 1-packet-per-page model can simply check/read the page-refcnt, and if it
> > is "1", DMA-sync and reuse it in the RX queue.
> >
>
> Yeah, based on Alex' description, it's not clear to me whether ixgbe
> redirecting to a non-intel NIC or vice versa will actually work. It
> sounds like the output device has to make some assumptions about how
> the page was allocated by the input device.

Yes, exactly. We are tied into a page-refcnt based scheme.

Besides, the ixgbe page recycle scheme (which keeps the DMA RX-mapping)
is also tied to the RX queue size, plus how fast the pages are returned.
This makes it very hard to tune. As I demonstrated, the default ixgbe
settings do not work well with XDP_REDIRECT. I needed to increase the
TX-ring size, but that broke page recycling (dropping perf from 13Mpps to
10Mpps), so I also needed to increase the RX-ring size. But perf is best
when the RX-ring size is smaller, so the two tuning needs contradict
each other.

> With buffer return API,
> each driver can cleanly recycle or free its own buffers properly.

Yes, exactly. And the RX-driver can implement a special memory model for
this queue. E.g. the RX-driver can know this is a dedicated XDP RX-queue
which is never used for SKBs, thus opening up for new RX memory models.

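
To make the page-return idea a bit more concrete, here is a small
userspace model of it. It is only a sketch under stated assumptions: every
type and function name below (rx_page, rx_dev, page_return_fn,
sketch_return_page) is hypothetical and none of them exist in the kernel.
It models a 1-packet-per-page driver whose callback reuses a page only
when it holds the last reference, keeping the RX DMA mapping and paying
just for a DMA sync.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; nothing here is a kernel type. */
struct rx_page {
	int refcnt;		/* stands in for the page refcount */
	bool dma_mapped;	/* RX DMA mapping, kept alive across reuse */
};

struct rx_dev;

/* Hypothetical per-driver hook: the TX side hands the page back here. */
typedef void (*page_return_fn)(struct rx_dev *dev, struct rx_page *pg);

#define RECYCLE_SLOTS 4

struct rx_dev {
	page_return_fn page_return;
	struct rx_page *recycle[RECYCLE_SLOTS];	/* pages ready for the RX ring */
	size_t nr_recycled;
	size_t nr_released;
	size_t nr_dma_syncs;
};

/* 1-packet-per-page driver: reuse only if we hold the last reference. */
static void one_page_per_packet_return(struct rx_dev *dev, struct rx_page *pg)
{
	assert(pg->refcnt >= 1);

	if (pg->refcnt == 1 && dev->nr_recycled < RECYCLE_SLOTS) {
		dev->nr_dma_syncs++;	/* models a DMA sync back to the device */
		dev->recycle[dev->nr_recycled++] = pg;
		return;
	}

	/* Page still in use elsewhere, or cache full: drop our reference. */
	pg->refcnt--;
	if (pg->refcnt == 0)
		pg->dma_mapped = false;	/* last user gone, mapping torn down */
	dev->nr_released++;
}

/* What a TX-completion path might call instead of a plain page free. */
static void sketch_return_page(struct rx_dev *from, struct rx_page *pg)
{
	from->page_return(from, pg);
}
```

The point of routing the return through a per-device callback is that the
output device needs no knowledge of how the input device allocated the
page; a driver that splits pages could plug in a different callback.
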
Another advantage of a return API: there is also an opportunity for
avoiding the DMA map on TX, since the return API means we need to know
the from-device anyway. Thus, we can add a DMA API where we can query
whether the two devices use the same DMA engine, and if so reuse the
DMA address the RX-side already knows.

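
The TX-side decision described here could look roughly like the following
userspace model. Again, this is a sketch and not an existing DMA API: the
names sketch_dev, sketch_frame, sketch_dma_same_engine and
sketch_tx_dma_addr are invented, and "same DMA engine" is reduced to
comparing an integer domain id, which is an assumption made purely for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: an id for the DMA engine a device sits behind. */
struct sketch_dev {
	int dma_domain;
};

struct sketch_frame {
	const struct sketch_dev *from;	/* device that received the frame */
	uint64_t rx_dma_addr;		/* address the RX side already mapped */
};

/* Hypothetical query: do both devices share the same DMA engine? */
static bool sketch_dma_same_engine(const struct sketch_dev *a,
				   const struct sketch_dev *b)
{
	return a->dma_domain == b->dma_domain;
}

/* Stand-in for a real mapping call; counts how often we pay for a map. */
static uint64_t sketch_dma_map(const struct sketch_frame *f, unsigned *nr_maps)
{
	(*nr_maps)++;
	return f->rx_dma_addr + 0x100000000ULL;	/* arbitrary fake address */
}

/* Pick the DMA address to use for TX, skipping the map when possible. */
static uint64_t sketch_tx_dma_addr(const struct sketch_dev *tx,
				   const struct sketch_frame *f,
				   unsigned *nr_maps)
{
	assert(f->from != NULL);	/* the from-device must be known */

	if (sketch_dma_same_engine(f->from, tx))
		return f->rx_dma_addr;	/* reuse the RX-side mapping */

	return sketch_dma_map(f, nr_maps);
}
```
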
> Let me discuss this further with Andy to see if we can come up with a
> good scheme.

Sounds good, looking forward to hearing what you come up with :-)

--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer