Message-ID: <b226b398-0985-4143-b0ea-14f785fe4d1b@davidwei.uk>
Date: Sat, 8 Nov 2025 14:18:31 -0800
From: David Wei <dw@...idwei.uk>
To: Stanislav Fomichev <stfomichev@...il.com>
Cc: Daniel Borkmann <daniel@...earbox.net>, netdev@...r.kernel.org,
 bpf@...r.kernel.org, kuba@...nel.org, davem@...emloft.net,
 razor@...ckwall.org, pabeni@...hat.com, willemb@...gle.com, sdf@...ichev.me,
 john.fastabend@...il.com, martin.lau@...nel.org, jordan@...fe.io,
 maciej.fijalkowski@...el.com, magnus.karlsson@...el.com, toke@...hat.com,
 yangzhenze@...edance.com, wangdongdong.6@...edance.com
Subject: Re: [PATCH net-next v4 00/14] netkit: Support for io_uring zero-copy
 and AF_XDP

On 2025-11-05 11:51, Stanislav Fomichev wrote:
> On 11/04, David Wei wrote:
>> On 2025-11-04 15:22, Stanislav Fomichev wrote:
>>> On 10/31, Daniel Borkmann wrote:
>>>> Containers use virtual netdevs to route traffic from a physical netdev
>>>> in the host namespace. They do not have access to the physical netdev
>>>> in the host and thus can't use memory providers or AF_XDP that require
>>>> reconfiguring/restarting queues in the physical netdev.
>>>>
>>>> This patchset adds the concept of queue peering to virtual netdevs, which
>>>> allows containers to use memory providers and AF_XDP at native speed.
>>>> These mapped queues are bound to a real queue in a physical netdev and
>>>> act as a proxy.
>>>>
>>>> Memory providers and AF_XDP operations take an ifindex and queue id,
>>>> so containers would pass in an ifindex for a virtual netdev and a queue
>>>> id of a mapped queue, which then gets proxied to the underlying real
>>>> queue. Peered queues are created and bound to a real queue atomically
>>>> through a generic ynl netdev operation.
>>>>
>>>> We have implemented support for this concept in netkit and tested the
>>>> latter against Nvidia ConnectX-6 (mlx5) as well as Broadcom BCM957504
>>>> (bnxt_en) 100G NICs. For more details see the individual patches.
>>>>
>>>> v3->v4:
>>>>    - ndo_queue_create store dst queue via arg (Nikolay)
>>>>    - Small nits like a spelling issue + rev xmas (Nikolay)
>>>>    - admin-perm flag in bind-queue spec (Jakub)
>>>>    - Fix potential ABBA deadlock situation in bind (Jakub, Paolo, Stan)
>>>>    - Add a peer dev_tracker to not reuse the sysfs one (Jakub)
>>>>    - New patch (12/14) to handle the underlying device going away (Jakub)
>>>>    - Improve commit message on queue-get (Jakub)
>>>>    - Do not expose phys dev info from container on queue-get (Jakub)
>>>>    - Add netif_put_rx_queue_peer_locked to simplify code (Stan)
>>>>    - Rework xsk handling to simplify the code and drop a few patches
>>>>    - Rebase and retested everything with mlx5 + bnxt_en
>>>
>>> I mostly looked at patches 1-8 and they look good to me. Will it be
>>> possible to put your sample runs from 13 and 14 into a selftest form? Even
>>> if you require real hw, that should be doable, similar to
>>> tools/testing/selftests/drivers/net/hw/devmem.py, right?
>>
>> Thanks for taking a look. For io_uring at least, it requires both a
>> routable VIP that can be assigned to the netkit in a netns and a BPF
>> program for skb forwarding. I could add a selftest, but it'll be hard to
>> generalise across all envs. I'm hoping to get self-contained QEMU VM
>> selftest support first. WDYT?
> 
> You can at least start by turning what you have in patch 3 into a
> selftest. NIPA runs with the fbnic QEMU model, so you should be able to at
> least test the netns setup, make sure peer-info works as expected, etc.
> You can verify that things like changing the number of channels are
> blocked when you have the queue bound to netkit.
> 
> But also, regarding the datapath test, not sure you need another QEMU. Not
> even sure why you need a VIP? You can carve out a single port and share
> the same host IP in the netns? Alternatively I think you can carve
> out 192.168.x.y from /32 and assign it to the machine. We have datapath
> devmem tests working without any special QEMU VMs (besides, well,
> special fbnic QEMU, but you should be able to test on it as well).

There's a check in netdev core that prevents forwarding net_iovs. The
only way to forward packets to a netkit device in a netns is via BPF.
If there's no routable VIP, then the BPF program also has to do
bidirectional NAT.
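
For illustration only (not code from this series): a minimal sketch of the
kind of BPF forwarding described above, assuming a routable VIP so no NAT is
needed. It would sit at tc ingress on the physical netdev and use
bpf_redirect_peer() to surface matching packets on the netkit peer inside the
container netns; the VIP value, the netkit ifindex and the program name are
placeholders.

/* fwd_to_netkit.bpf.c -- hypothetical sketch, not part of the series.
 * Attached at tc ingress on the physical netdev: packets destined to the
 * container VIP are redirected to the peer of the host-side netkit
 * device, i.e. they show up on the netkit inside the container netns.
 */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

#define CONTAINER_VIP	bpf_htonl(0xc0a80a02)	/* 192.168.10.2, placeholder */
#define NETKIT_IFINDEX	42	/* host-side netkit ifindex, placeholder */

SEC("tc")
int fwd_to_netkit(struct __sk_buff *skb)
{
	void *data = (void *)(long)skb->data;
	void *data_end = (void *)(long)skb->data_end;
	struct ethhdr *eth = data;
	struct iphdr *iph;

	if ((void *)(eth + 1) > data_end)
		return TC_ACT_OK;
	if (eth->h_proto != bpf_htons(ETH_P_IP))
		return TC_ACT_OK;

	iph = (void *)(eth + 1);
	if ((void *)(iph + 1) > data_end)
		return TC_ACT_OK;
	if (iph->daddr != CONTAINER_VIP)
		return TC_ACT_OK;

	/* Switch into the netkit peer's netns, ingress to ingress. */
	return bpf_redirect_peer(NETKIT_IFINDEX, 0);
}

char _license[] SEC("license") = "GPL";

The no-VIP case would additionally need bidirectional NAT in a program like
this, which is part of what makes the setup hard to generalise into a
selftest across environments.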
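
Likewise, to make the addressing model from the quoted cover letter concrete
(again just a sketch, nothing from the series): AF_XDP binds a socket to an
(ifindex, queue id) pair, so inside the container one would simply name the
netkit device and one of its mapped queue ids, and the kernel proxies that to
the bound real queue. The device name "nk0" and queue id 0 are placeholders,
and UMEM/ring setup is omitted, so the bind() is expected to fail as written.

/* xsk_bind_sketch.c -- hypothetical sketch, not part of the series.
 * Shows only the (ifindex, queue id) addressing an AF_XDP socket uses;
 * UMEM registration and fill/completion ring setup are left out, so the
 * bind() below will not succeed until those are configured.
 */
#include <stdio.h>
#include <net/if.h>
#include <sys/socket.h>
#include <linux/if_xdp.h>

int main(void)
{
	/* "nk0" stands in for the netkit device inside the container netns. */
	unsigned int ifindex = if_nametoindex("nk0");
	int fd = socket(AF_XDP, SOCK_RAW, 0);
	struct sockaddr_xdp sxdp = {
		.sxdp_family   = AF_XDP,
		.sxdp_ifindex  = ifindex,
		/* Mapped (peered) queue id on the netkit; proxied by the
		 * kernel to the bound queue on the physical netdev. */
		.sxdp_queue_id = 0,
	};

	if (fd < 0 || !ifindex) {
		perror("setup");
		return 1;
	}
	if (bind(fd, (struct sockaddr *)&sxdp, sizeof(sxdp)) < 0)
		perror("bind");	/* expected without UMEM/ring setup */
	return 0;
}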
