Message-ID: <87v8lkzlch.fsf@toke.dk>
Date: Thu, 05 Jan 2023 23:07:42 +0100
From: Toke Høiland-Jørgensen <toke@...hat.com>
To: Andy Gospodarek <andrew.gospodarek@...adcom.com>
Cc: Tariq Toukan <ttoukan.linux@...il.com>,
Lorenzo Bianconi <lorenzo@...nel.org>,
Jakub Kicinski <kuba@...nel.org>,
Andy Gospodarek <andrew.gospodarek@...adcom.com>,
ast@...nel.org, daniel@...earbox.net, davem@...emloft.net,
hawk@...nel.org, john.fastabend@...il.com, andrii@...nel.org,
kafai@...com, songliubraving@...com, yhs@...com,
kpsingh@...nel.org, lorenzo.bianconi@...hat.com,
netdev@...r.kernel.org, bpf@...r.kernel.org,
Jesper Dangaard Brouer <brouer@...hat.com>,
Ilias Apalodimas <ilias.apalodimas@...aro.org>, gal@...dia.com,
Saeed Mahameed <saeedm@...dia.com>, tariqt@...dia.com
Subject: Re: [PATCH net-next v2] samples/bpf: fixup some tools to be able to
support xdp multibuffer

Andy Gospodarek <andrew.gospodarek@...adcom.com> writes:
> On Thu, Jan 05, 2023 at 04:43:28PM +0100, Toke Høiland-Jørgensen wrote:
>> Tariq Toukan <ttoukan.linux@...il.com> writes:
>>
>> > On 04/01/2023 14:28, Toke Høiland-Jørgensen wrote:
>> >> Lorenzo Bianconi <lorenzo@...nel.org> writes:
>> >>
>> >>>> On Tue, 03 Jan 2023 16:19:49 +0100 Toke Høiland-Jørgensen wrote:
>> >>>>> Hmm, good question! I don't think we've ever explicitly documented any
>> >>>>> assumptions one way or the other. My own mental model has certainly
>> >>>>> always assumed the first frag would continue to be the same size as in
>> >>>>> non-multi-buf packets.
>> >>>>
>> >>>> Interesting! :) My mental model was closer to GRO by frags
>> >>>> so the linear part would have no data, just headers.
>> >>>
>> >>> That is my assumption as well.
>> >>
>> >> Right, okay, so how many headers? Only Ethernet, or all the way up to
>> >> L4 (TCP/UDP)?
>> >>
>> >> I do seem to recall a discussion around the header/data split for TCP
>> >> specifically, but I think I mentally put that down as "something people
>> >> may want to do at some point in the future", which is why it hasn't made
>> >> it into my own mental model (yet?) :)
>> >>
>> >> -Toke
>> >>
>> >
>> > I don't think all the different GRO layers assume their headers/data
>> > are in the linear part. IMO they will just perform better if those
>> > parts are already there. Otherwise, the GRO flow copes by pulling the
>> > needed amount into the linear part.
>> > As examples, see calls to gro_pull_from_frag0 in net/core/gro.c, and the
>> > call to pskb_may_pull() from skb_gro_header_slow().
>> >
>> > This resembles the bpf_xdp_load_bytes() API used here in the xdp prog.
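For illustration, a rough sketch of that frag-agnostic access pattern in an
XDP program (the program name and header layout here are made up, not taken
from the samples):

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/in.h>
#include <linux/ip.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

/* Copy the headers into a stack buffer with bpf_xdp_load_bytes(), which
 * works the same whether the bytes sit in the linear part or in a frag.
 */
SEC("xdp.frags")
int xdp_parse_copy(struct xdp_md *ctx)
{
        struct {
                struct ethhdr eth;
                struct iphdr ip;
        } __attribute__((packed)) hdrs;

        if (bpf_xdp_load_bytes(ctx, 0, &hdrs, sizeof(hdrs)) < 0)
                return XDP_PASS;

        if (hdrs.eth.h_proto != bpf_htons(ETH_P_IP))
                return XDP_PASS;

        return hdrs.ip.protocol == IPPROTO_UDP ? XDP_DROP : XDP_PASS;
}

char _license[] SEC("license") = "GPL";

Like the gro_pull_from_frag0 path, the copy works no matter where the bytes
actually live in the buffer.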
>>
>> Right, but that is kernel code; what we end up doing with the API here
>> affects how many programs need to make significant changes to work with
>> multibuf, and how many can just set the frags flag and continue working.
>> Which also has a performance impact, see below.
>>
>> > The context of my questions is that I'm looking for the right memory
>> > scheme for adding xdp-mb support to mlx5e striding RQ.
>> > In striding RQ, the RX buffer consists of "strides" of a fixed size set
>> > by the driver. An incoming packet is written to the buffer starting from
>> > the beginning of the next available stride, consuming as many strides as
>> > needed.
>> >
>> > Due to the need for headroom and tailroom, there's no easy way of
>> > building the xdp_buff in place (around the packet), so it should go to a
>> > side buffer.
>> >
>> > By using a 0-length linear part in a side buffer, I can address two
>> > challenging issues: (1) save the in-driver headers memcpy (copy might
>> > still exist in the xdp program though), and (2) conform to the
>> > "fragments of the same size" requirement/assumption in xdp-mb.
>> > Otherwise, if we pull from frag[0] into the linear part, frag[0] becomes
>> > smaller than the next fragments.
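Purely as a hypothetical sketch of that layout (not mlx5e code; the function
name, rq/side_buf and the stride parameters are placeholders):

#include <linux/bpf.h>
#include <linux/skbuff.h>
#include <net/xdp.h>

/* Hypothetical: build a multi-buf xdp_buff with an empty linear part
 * (headroom only) and one equally-sized fragment per consumed stride.
 */
static void build_mb_xdp_buff(struct xdp_buff *xdp, struct xdp_rxq_info *rxq,
                              void *side_buf, u32 frame_sz,
                              struct page **stride_pages, u32 stride_len,
                              int nr_strides)
{
        struct skb_shared_info *sinfo;
        int i;

        xdp_init_buff(xdp, frame_sz, rxq);
        /* zero-length linear data: data == data_end after this call */
        xdp_prepare_buff(xdp, side_buf, XDP_PACKET_HEADROOM, 0, false);

        sinfo = xdp_get_shared_info_from_buff(xdp);
        sinfo->nr_frags = 0;
        sinfo->xdp_frags_size = 0;
        xdp_buff_set_frags_flag(xdp);

        for (i = 0; i < nr_strides; i++) {
                skb_frag_t *frag = &sinfo->frags[i];

                __skb_frag_set_page(frag, stride_pages[i]);
                skb_frag_off_set(frag, 0);
                skb_frag_size_set(frag, stride_len);
                sinfo->nr_frags++;
                sinfo->xdp_frags_size += stride_len;
        }
}

With data == data_end, a program doing direct packet access sees no packet
bytes in the linear part at all, which is what the discussion below is about.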
>>
>> Right, I see.
>>
>> So my main concern would be that if we "allow" this, the only way to
>> write an interoperable XDP program will be to use bpf_xdp_load_bytes()
>> for every packet access, which will be slower than direct packet access
>> (DPA), so we may end up inadvertently slowing down all of the XDP
>> ecosystem, because no one is
>> going to bother with writing two versions of their programs. Whereas if
>> you can rely on packet headers always being in the linear part, you can
>> write a lot of the "look at headers and make a decision" type programs
>> using just DPA, and they'll work for multibuf as well.
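For comparison, the direct packet access style (again a made-up sketch, not
one of the samples), which only avoids copies when the headers really are in
the linear part:

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/in.h>
#include <linux/ip.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

/* Parse straight out of ctx->data; if the headers are not in the linear
 * part, the bounds check fails and the program has to fall back (here it
 * simply passes the packet).
 */
SEC("xdp.frags")
int xdp_parse_direct(struct xdp_md *ctx)
{
        void *data = (void *)(long)ctx->data;
        void *data_end = (void *)(long)ctx->data_end;
        struct ethhdr *eth = data;
        struct iphdr *ip;

        if ((void *)(eth + 1) > data_end)
                return XDP_PASS;
        if (eth->h_proto != bpf_htons(ETH_P_IP))
                return XDP_PASS;

        ip = (void *)(eth + 1);
        if ((void *)(ip + 1) > data_end)
                return XDP_PASS;

        return ip->protocol == IPPROTO_UDP ? XDP_DROP : XDP_PASS;
}

char _license[] SEC("license") = "GPL";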
>
> The question I would have is: what is the real slowdown of
> bpf_xdp_load_bytes() vs DPA? I know you and Jesper can tell me how many
> instructions each use. :)

I can try running some benchmarks to compare the two, sure!

-Toke