lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CACGkMEuboys8sCJFUTGxHUeouPFnVqVLGQBefvmxYDe4ooLfLg@mail.gmail.com>
Date: Fri, 21 Nov 2025 14:19:48 +0800
From: Jason Wang <jasowang@...hat.com>
To: Simon Schippers <simon.schippers@...dortmund.de>
Cc: willemdebruijn.kernel@...il.com, andrew+netdev@...n.ch, 
	davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org, pabeni@...hat.com, 
	mst@...hat.com, eperezma@...hat.com, jon@...anix.com, 
	tim.gebauer@...dortmund.de, netdev@...r.kernel.org, 
	linux-kernel@...r.kernel.org, kvm@...r.kernel.org, 
	virtualization@...ts.linux.dev
Subject: Re: [PATCH net-next v6 0/8] tun/tap & vhost-net: netdev queue flow
 control to avoid ptr_ring tail drop

On Thu, Nov 20, 2025 at 11:30 PM Simon Schippers
<simon.schippers@...dortmund.de> wrote:
>
> This patch series deals with tun/tap and vhost-net which drop incoming
> SKBs whenever their internal ptr_ring buffer is full. Instead, with this
> patch series, the associated netdev queue is stopped before this happens.
> This allows the connected qdisc to function correctly as reported by [1]
> and improves application-layer performance, see our paper [2]. Meanwhile
> the theoretical performance differs only slightly:
>
> +--------------------------------+-----------+----------+
> | pktgen benchmarks to Debian VM | Stock     | Patched  |
> | i5 6300HQ, 20M packets         |           |          |
> +-----------------+--------------+-----------+----------+
> | TAP             | Transmitted  | 195 Kpps  | 183 Kpps |
> |                 +--------------+-----------+----------+
> |                 | Lost         | 1615 Kpps | 0 pps    |
> +-----------------+--------------+-----------+----------+
> | TAP+vhost_net   | Transmitted  | 589 Kpps  | 588 Kpps |
> |                 +--------------+-----------+----------+
> |                 | Lost         | 1164 Kpps | 0 pps    |
> +-----------------+--------------+-----------+----------+

PPS drops somehow for TAP, any reason for that?

Btw, I had some questions:

1) most of the patches in this series would introduce non-trivial
impact on the performance, we probably need to benchmark each or split
the series. What's more we need to run TCP benchmark
(throughput/latency) as well as pktgen see the real impact

2) I see this:

        if (unlikely(tun_ring_produce(&tfile->tx_ring, queue, skb))) {
                drop_reason = SKB_DROP_REASON_FULL_RING;
                goto drop;
        }

So there could still be packet drop? Or is this related to the XDP path?

3) The LLTX change would have performance implications, but the
benmark doesn't cover the case where multiple transmission is done in
parallel

4) After the LLTX change, it seems we've lost the synchronization with
the XDP_TX and XDP_REDIRECT path?

5) The series introduces various ptr_ring helpers with lots of
ordering stuff which is complicated, I wonder if we first have a
simple patch to implement the zero packet loss

>
> This patch series includes tun/tap, and vhost-net because they share
> logic. Adjusting only one of them would break the others. Therefore, the
> patch series is structured as follows:
> 1+2: new ptr_ring helpers for 3
> 3: tun/tap: tun/tap: add synchronized ring produce/consume with queue
> management
> 4+5+6: tun/tap: ptr_ring wrappers and other helpers to be called by
> vhost-net
> 7: tun/tap & vhost-net: only now use the previous implemented functions to
> not break git bisect
> 8: tun/tap: drop get ring exports (not used anymore)
>
> Possible future work:
> - Introduction of Byte Queue Limits as suggested by Stephen Hemminger

This seems to be not easy. The tx completion depends on the userspace behaviour.

> - Adaption of the netdev queue flow control for ipvtap & macvtap
>
> [1] Link: https://unix.stackexchange.com/questions/762935/traffic-shaping-ineffective-on-tun-device
> [2] Link: https://cni.etit.tu-dortmund.de/storages/cni-etit/r/Research/Publications/2025/Gebauer_2025_VTCFall/Gebauer_VTCFall2025_AuthorsVersion.pdf
>

Thanks


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ