linux-kernel - Re: Re: [RFC v3 Optimizing veth xsk performance 0/9]

Message-ID: <87msz04mb4.fsf@toke.dk>
Date:   Wed, 09 Aug 2023 11:06:23 +0200
From:   Toke Høiland-Jørgensen <toke@...hat.com>
To:     黄杰 <huangjie.albert@...edance.com>
Cc:     "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Jesper Dangaard Brouer <hawk@...nel.org>,
        John Fastabend <john.fastabend@...il.com>,
        Björn Töpel <bjorn@...nel.org>,
        Magnus Karlsson <magnus.karlsson@...el.com>,
        Maciej Fijalkowski <maciej.fijalkowski@...el.com>,
        Jonathan Lemon <jonathan.lemon@...il.com>,
        Pavel Begunkov <asml.silence@...il.com>,
        Yunsheng Lin <linyunsheng@...wei.com>,
        Kees Cook <keescook@...omium.org>,
        Richard Gobert <richardbgobert@...il.com>,
        "open list:NETWORKING DRIVERS" <netdev@...r.kernel.org>,
        open list <linux-kernel@...r.kernel.org>,
        "open list:XDP (eXpress Data Path)" <bpf@...r.kernel.org>
Subject: Re: Re: [RFC v3 Optimizing veth xsk performance 0/9]

黄杰 <huangjie.albert@...edance.com> writes:

> Toke Høiland-Jørgensen <toke@...hat.com> wrote on Tue, 8 Aug 2023 at 20:01:
>>
>> Albert Huang <huangjie.albert@...edance.com> writes:
>>
>> > AF_XDP is a kernel bypass technology that can greatly improve performance.
>> > However, for virtual devices like veth, even with the use of AF_XDP sockets,
>> > there are still many additional software paths that consume CPU resources.
>> > This patch series focuses on optimizing the performance of AF_XDP sockets
>> > for veth virtual devices. Patches 1 to 4 mainly involve preparatory work.
>> > Patch 5 introduces tx queue and tx napi for packet transmission, while
>> > patch 8 primarily implements batch sending for IPv4 UDP packets, and patch 9
>> > adds support for the AF_XDP tx need_wakeup feature. These optimizations significantly
>> > reduce the software path and support checksum offload.
>> >
>> > I tested these features with a typical topology, shown below:
>> > client(send):                                        server:(recv)
>> > veth<-->veth-peer                                    veth1-peer<--->veth1
>> >   1       |                                                  |   7
>> >           |2                                                6|
>> >           |                                                  |
>> >         bridge<------->eth0(mlnx5)- switch -eth1(mlnx5)<--->bridge1
>> >                   3                    4                 5
>> >              (machine1)                              (machine2)
>>
>> I definitely applaud the effort to improve the performance of af_xdp
>> over veth, this is something we have flagged as in need of improvement
>> as well.
>>
>> However, looking through your patch series, I am less sure that the
>> approach you're taking here is the right one.
>>
>> AFAIU (speaking about the TX side here), the main difference between
>> AF_XDP ZC and the regular transmit mode is that in the regular TX mode
>> the stack will allocate an skb to hold the frame and push that down the
>> stack. Whereas in ZC mode, there's a driver NDO that gets called
>> directly, bypassing the skb allocation entirely.
>>
>> In this series, you're implementing the ZC mode for veth, but the driver
>> code ends up allocating an skb anyway. Which seems to be a bit of a
>> weird midpoint between the two modes, and adds a lot of complexity to
>> the driver that (at least conceptually) is mostly just a
>> reimplementation of what the stack does in non-ZC mode (allocate an skb
>> and push it through the stack).
>>
>> So my question is, why not optimise the non-zc path in the stack instead
>> of implementing the zc logic for veth? It seems to me that it would be
>> quite feasible to apply the same optimisations (bulking, and even GRO)
>> to that path and achieve the same benefits, without having to add all
>> this complexity to the veth driver?
>>
>> -Toke
>>
> thanks!
> This idea is really good indeed. You've reminded me, and that's
> something I overlooked. I will now consider implementing the solution
> you've proposed and test the performance enhancement.

Sounds good, thanks! :)

-Toke
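
To make the bulking argument from the thread concrete, here is a minimal
userspace sketch in C. It is not kernel code and is not taken from the patch
series: BATCH_SIZE, struct tx_stats and both functions are hypothetical names
chosen for illustration only. It models a fixed per-call cost (counted as
"flush_calls") and shows how many times that cost is paid when frames are
pushed one at a time versus in batches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical batch size; a real path would choose its own budget. */
#define BATCH_SIZE 32

struct tx_stats {
    size_t frames_sent;  /* frames handed over to the peer */
    size_t flush_calls;  /* times the fixed per-call cost was paid */
};

/* Models the unbatched path: every frame pays the per-call cost. */
static void send_one_by_one(size_t nframes, struct tx_stats *st)
{
    assert(st != NULL);
    for (size_t i = 0; i < nframes; i++) {
        st->frames_sent += 1;
        st->flush_calls += 1;
    }
}

/* Models a bulked path: up to BATCH_SIZE frames share one per-call cost. */
static void send_bulked(size_t nframes, struct tx_stats *st)
{
    size_t remaining = nframes;

    assert(st != NULL);
    while (remaining > 0) {
        size_t n = remaining < BATCH_SIZE ? remaining : BATCH_SIZE;

        st->frames_sent += n;
        st->flush_calls += 1;
        remaining -= n;
    }
}
```

Under these assumptions, sending 100 frames pays the per-call cost 100 times
in the unbatched model and 4 times in the bulked one (three full batches of 32
plus one of 4). Whether a similar ratio carries over to the real non-ZC
transmit path would have to be measured.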