Message-ID: <aQqKsGDdeYQqA91s@mini-arch>
Date: Tue, 4 Nov 2025 15:22:24 -0800
From: Stanislav Fomichev <stfomichev@...il.com>
To: Daniel Borkmann <daniel@...earbox.net>
Cc: netdev@...r.kernel.org, bpf@...r.kernel.org, kuba@...nel.org,
davem@...emloft.net, razor@...ckwall.org, pabeni@...hat.com,
willemb@...gle.com, sdf@...ichev.me, john.fastabend@...il.com,
martin.lau@...nel.org, jordan@...fe.io,
maciej.fijalkowski@...el.com, magnus.karlsson@...el.com,
dw@...idwei.uk, toke@...hat.com, yangzhenze@...edance.com,
wangdongdong.6@...edance.com
Subject: Re: [PATCH net-next v4 00/14] netkit: Support for io_uring zero-copy
and AF_XDP
On 10/31, Daniel Borkmann wrote:
> Containers use virtual netdevs to route traffic from a physical netdev
> in the host namespace. They do not have access to the physical netdev
> in the host and thus can't use memory providers or AF_XDP that require
> reconfiguring/restarting queues in the physical netdev.
>
> This patchset adds the concept of queue peering to virtual netdevs, which
> allows containers to use memory providers and AF_XDP at native speed.
> These mapped queues are bound to a real queue in a physical netdev and
> act as a proxy.
>
> Memory providers and AF_XDP operations take an ifindex and queue id,
> so containers would pass in an ifindex for a virtual netdev and a queue
> id of a mapped queue, which then gets proxied to the underlying real
> queue. Peered queues are created and bound to a real queue atomically
> through a generic ynl netdev operation.
>
> We have implemented support for this concept in netkit and tested the
> latter against Nvidia ConnectX-6 (mlx5) as well as Broadcom BCM957504
> (bnxt_en) 100G NICs. For more details see the individual patches.
>
> v3->v4:
> - ndo_queue_create store dst queue via arg (Nikolay)
> - Small nits like a spelling issue + rev xmas (Nikolay)
> - admin-perm flag in bind-queue spec (Jakub)
> - Fix potential ABBA deadlock situation in bind (Jakub, Paolo, Stan)
> - Add a peer dev_tracker to not reuse the sysfs one (Jakub)
> - New patch (12/14) to handle the underlying device going away (Jakub)
> - Improve commit message on queue-get (Jakub)
> - Do not expose phys dev info from container on queue-get (Jakub)
> - Add netif_put_rx_queue_peer_locked to simplify code (Stan)
> - Rework xsk handling to simplify the code and drop a few patches
> - Rebased and retested everything with mlx5 + bnxt_en

I mostly looked at patches 1-8 and they look good to me. Would it be
possible to turn your sample runs from patches 13 and 14 into selftests?
Even if they require real hw, that should be doable, similar to
tools/testing/selftests/drivers/net/hw/devmem.py, right?
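
For readers following the quoted cover letter, the proxying of an
(ifindex, queue id) pair from a virtual netdev to a real queue can be
modelled roughly as below. This is only a toy sketch of the idea, not the
kernel implementation: QueueRef, PeerTable, bind and resolve are
hypothetical names, and the choice that an unmapped queue resolves to
itself is an assumption made for illustration.

```python
# Toy model of queue peering. NOT the kernel code; all names are hypothetical.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class QueueRef:
    """An (ifindex, queue id) pair, as passed by memory providers and AF_XDP."""
    ifindex: int
    queue_id: int


@dataclass
class PeerTable:
    # Maps a mapped queue on a virtual netdev to the real queue it proxies.
    peers: dict = field(default_factory=dict)

    def bind(self, virt: QueueRef, phys: QueueRef) -> None:
        """Create the peering in one step, refusing double bindings."""
        if virt in self.peers:
            raise ValueError("virtual queue already bound")
        if phys in self.peers.values():
            raise ValueError("physical queue already has a peer")
        self.peers[virt] = phys

    def resolve(self, ifindex: int, queue_id: int) -> QueueRef:
        """Return the real queue behind a request.

        Assumption: a queue without a peer resolves to itself.
        """
        ref = QueueRef(ifindex, queue_id)
        return self.peers.get(ref, ref)


table = PeerTable()
# Container side: virtual netdev ifindex 7, mapped queue 1.
# Host side: physical netdev ifindex 2, real queue 15.
table.bind(QueueRef(7, 1), QueueRef(2, 15))
resolved = table.resolve(7, 1)
```

Here the container only ever names QueueRef(7, 1); the lookup is what
redirects the operation to the physical queue.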