Message-ID: <458d088f-dace-4869-b4af-b381d6ca5af1@davidwei.uk>
Date: Tue, 4 Nov 2025 16:43:54 -0800
From: David Wei <dw@...idwei.uk>
To: Stanislav Fomichev <stfomichev@...il.com>,
Daniel Borkmann <daniel@...earbox.net>
Cc: netdev@...r.kernel.org, bpf@...r.kernel.org, kuba@...nel.org,
davem@...emloft.net, razor@...ckwall.org, pabeni@...hat.com,
willemb@...gle.com, sdf@...ichev.me, john.fastabend@...il.com,
martin.lau@...nel.org, jordan@...fe.io, maciej.fijalkowski@...el.com,
magnus.karlsson@...el.com, toke@...hat.com, yangzhenze@...edance.com,
wangdongdong.6@...edance.com
Subject: Re: [PATCH net-next v4 00/14] netkit: Support for io_uring zero-copy and AF_XDP

On 2025-11-04 15:22, Stanislav Fomichev wrote:
> On 10/31, Daniel Borkmann wrote:
>> Containers use virtual netdevs to route traffic from a physical netdev
>> in the host namespace. They do not have access to the physical netdev
>> in the host and thus can't use memory providers or AF_XDP that require
>> reconfiguring/restarting queues in the physical netdev.
>>
>> This patchset adds the concept of queue peering to virtual netdevs that
>> allow containers to use memory providers and AF_XDP at native speed.
>> These mapped queues are bound to a real queue in a physical netdev and
>> act as a proxy.
>>
>> Memory providers and AF_XDP operations take an ifindex and queue id,
>> so containers would pass in an ifindex for a virtual netdev and a queue
>> id of a mapped queue, which then gets proxied to the underlying real
>> queue. Peered queues are created and bound to a real queue atomically
>> through a generic ynl netdev operation.
>>
>> We have implemented support for this concept in netkit and tested the
>> latter against Nvidia ConnectX-6 (mlx5) as well as Broadcom BCM957504
>> (bnxt_en) 100G NICs. For more details see the individual patches.
>>
>> v3->v4:
>> - ndo_queue_create store dst queue via arg (Nikolay)
>> - Small nits like a spelling issue + rev xmas (Nikolay)
>> - admin-perm flag in bind-queue spec (Jakub)
>> - Fix potential ABBA deadlock situation in bind (Jakub, Paolo, Stan)
>> - Add a peer dev_tracker to not reuse the sysfs one (Jakub)
>> - New patch (12/14) to handle the underlying device going away (Jakub)
>> - Improve commit message on queue-get (Jakub)
>> - Do not expose phys dev info from container on queue-get (Jakub)
>> - Add netif_put_rx_queue_peer_locked to simplify code (Stan)
>> - Rework xsk handling to simplify the code and drop a few patches
>> - Rebase and retested everything with mlx5 + bnxt_en
>
> I mostly looked at patches 1-8 and they look good to me. Will it be
> possible to put your sample runs from 13 and 14 into a selftest form? Even
> if you require real hw, that should be doable, similar to
> tools/testing/selftests/drivers/net/hw/devmem.py, right?

Thanks for taking a look. For io_uring at least, it requires both a
routable VIP that can be assigned to the netkit in a netns and a BPF
program for skb forwarding. I could add a selftest, but it'll be hard to
generalise across all envs. I'm hoping to get self-contained QEMU VM
selftest support first. WDYT?
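
For readers following the thread, the queue proxying described in the cover
letter (an ifindex and queue id on the virtual netdev resolving to a real
queue on the physical netdev) can be pictured with the toy Python model
below. This is only an illustrative sketch: QueueRef, PeerTable, bind and
resolve are hypothetical names, not kernel, netkit, or ynl APIs, and the
ifindex and queue id values are made up.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class QueueRef:
    """An (ifindex, queue id) pair; hypothetical helper type."""
    ifindex: int
    queue_id: int


@dataclass
class PeerTable:
    """Toy stand-in for the mapping between peered and real queues."""
    peers: dict = field(default_factory=dict)

    def bind(self, virt: QueueRef, real: QueueRef) -> None:
        # Create the peered queue and bind it to the real queue in one step.
        if virt in self.peers:
            raise ValueError("virtual queue is already bound")
        self.peers[virt] = real

    def resolve(self, ref: QueueRef) -> QueueRef:
        # A peered queue acts as a proxy for its real queue; in this toy
        # model any other queue simply resolves to itself.
        return self.peers.get(ref, ref)


table = PeerTable()
# Made-up values: netkit device ifindex 10, physical NIC ifindex 2.
table.bind(QueueRef(ifindex=10, queue_id=1), QueueRef(ifindex=2, queue_id=15))
print(table.resolve(QueueRef(ifindex=10, queue_id=1)))
```

The container side only ever names the virtual netdev's ifindex and the
peered queue id; the lookup to the real queue happens behind that.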