netdev - Re: [PATCH bpf-next v3 1/3] xsk: Support UMEM chunk_size > PAGE

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJ8uoz3WTCtqtU+EwHqDaKZfBrn=bVbNDhYYqNWahCo24ZHYZg@mail.gmail.com>
Date:   Fri, 21 Apr 2023 14:27:09 +0200
From:   Magnus Karlsson <magnus.karlsson@...il.com>
To:     Toke Høiland-Jørgensen <toke@...hat.com>
Cc:     Maciej Fijalkowski <maciej.fijalkowski@...el.com>,
        Kal Cutter Conley <kal.conley@...tris.com>,
        Björn Töpel <bjorn@...nel.org>,
        Magnus Karlsson <magnus.karlsson@...el.com>,
        Jonathan Lemon <jonathan.lemon@...il.com>,
        "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Jonathan Corbet <corbet@....net>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Jesper Dangaard Brouer <hawk@...nel.org>,
        John Fastabend <john.fastabend@...il.com>,
        netdev@...r.kernel.org, bpf@...r.kernel.org,
        linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH bpf-next v3 1/3] xsk: Support UMEM chunk_size > PAGE_SIZE

On Fri, 21 Apr 2023 at 12:01, Toke Høiland-Jørgensen <toke@...hat.com> wrote:
>
> Maciej Fijalkowski <maciej.fijalkowski@...el.com> writes:
>
> > On Tue, Apr 18, 2023 at 01:12:00PM +0200, Kal Cutter Conley wrote:
> >
> > Hi there,
> >
> >> > >> In addition, presumably when using this mode, the other XDP actions
> >> > >> (XDP_PASS, XDP_REDIRECT to other targets) would stop working unless we
> >> > >> add special handling for that in the kernel? We'll definitely need to
> >> > >> handle that somehow...
> >> > >
> >> > > I am not familiar with all the details here. Do you know a reason why
> >> > > these cases would stop working / why special handling would be needed?
> >> > > For example, if I have a UMEM that uses hugepages and XDP_PASS is
> >> > > returned, then the data is just copied into an SKB right? SKBs can
> >> > > also be created directly from hugepages AFAIK. So I don't understand
> >> > > what the issue would be. Can someone explain this concern?
> >> >
> >> > Well, I was asking :) It may well be that the SKB path just works; did
> >> > you test this? Pretty sure XDP_REDIRECT to another device won't, though?
> >
> > for XDP_PASS we have to allocate a new buffer and copy the contents from
> > current xdp_buff that was backed by xsk_buff_pool and give the current one
> > back to pool. I am not sure if __napi_alloc_skb() is always capable of
> > handling len > PAGE_SIZE - i believe there might a particular combination
> > of settings that allows it, but if not we should have a fallback path that
> > would iterate over data and copy this to a certain (linear + frags) parts.
> > This implies non-zero effort that is needed for jumbo frames ZC support.
> >
> > I can certainly test this out and play with it - maybe this just works, I
> > didn't check yet. Even if it does, then we need some kind of temporary
> > mechanism that will forbid loading ZC jumbo frames due to what Toke
> > brought up.
>
> Yeah, this was exactly the kind of thing I was worried about (same for
> XDP_REDIRECT). Thanks for fleshing it out a bit :)
>
> >> >
> >>
> >> I was also asking :-)
> >>
> >> I tested that the SKB path is usable today with this patch.
> >> Specifically, sending and receiving large jumbo packets with AF_XDP
> >> and that a non-multi-buffer XDP program could access the whole packet.
> >> I have not specifically tested XDP_REDIRECT to another device or
> >> anything with ZC since that is not possible without driver support.
> >>
> >> My feeling is, there wouldn't be non-trivial issues here since this
> >> patchset changes nothing except allowing the maximum chunk size to be
> >> larger. The driver either supports larger MTUs with XDP enabled or it
> >> doesn't. If it doesn't, the frames are dropped anyway. Also, chunk
> >> size mismatches between two XSKs (e.g. with XDP_REDIRECT) would be
> >> something supported or not supported irrespective of this patchset.
> >
> > Here is the comparison between multi-buffer and jumbo frames that I did
> > for ZC ice driver. Configured MTU was 8192 as this is the frame size for
> > aligned mode when working with huge pages. I am presenting plain numbers
> > over here from xdpsock.
> >
> > Mbuf, packet size = 8192 - XDP_PACKET_HEADROOM
> > 885,705pps - rxdrop frame_size=4096
> > 806,307pps - l2fwd frame_size=4096
> > 877,989pps - rxdrop frame_size=2048
> > 773,331pps - l2fwd frame_size=2048
> >
> > Jumbo, packet size = 8192 - XDP_PACKET_HEADROOM
> > 893,530pps - rxdrop frame_size=8192
> > 841,860pps - l2fwd frame_size=8192
> >
> > Kal might say that multi-buffer numbers are imaginary as these patches
> > were never shown to the public ;) but now that we have extensive test
> > suite I am fixing some last issues that stand out, so we are asking for
> > some more patience over here... overall i was expecting that they will be
> > much worse when compared to jumbo frames, but then again i believe this
> > implementation is not ideal and can be improved. Nevertheless, jumbo
> > frames support has its value.
>
> Thank you for doing these! Okay, so that's between 1-4% improvement (vs
> the 4k frags). I dunno, I wouldn't consider that a slam dunk; would
> depend on the additional complexity if it is worth it to do both, IMO...

If we are using 4K frags, the worst case is that we have to use 3
frags to cover a 9K packet. Interpolating between the results above
would mean somewhere in the 1 - 6% range of improvement with jumbo
frames. Something that is not covered by these tests at all is the
overhead of an abstraction layer for dealing with multiple buffers. I
believe many applications would choose to have one to hide the fact
that there are multiple buffers. So I think these improvement numbers
are on the lower side.

But agree that we should factor in the complexity of covering the
cases you have brought up and see if it is worth it.

> -Toke
>