Message-ID: <b9fff8e1-fb96-4b1f-9767-9d89adf31060@tu-dortmund.de>
Date: Fri, 21 Nov 2025 10:22:54 +0100
From: Simon Schippers <simon.schippers@...dortmund.de>
To: Jason Wang <jasowang@...hat.com>
Cc: willemdebruijn.kernel@...il.com, andrew+netdev@...n.ch,
davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
pabeni@...hat.com, mst@...hat.com, eperezma@...hat.com,
jon@...anix.com, tim.gebauer@...dortmund.de, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
virtualization@...ts.linux.dev
Subject: [PATCH net-next v6 0/8] tun/tap & vhost-net: netdev queue flow
control to avoid ptr_ring tail drop
On 11/21/25 07:19, Jason Wang wrote:
> On Thu, Nov 20, 2025 at 11:30 PM Simon Schippers
> <simon.schippers@...dortmund.de> wrote:
>>
>> This patch series deals with tun/tap and vhost-net, which drop incoming
>> SKBs whenever their internal ptr_ring buffer is full. Instead, with this
>> patch series, the associated netdev queue is stopped before this happens.
>> This allows the connected qdisc to function correctly, as reported in [1],
>> and improves application-layer performance, see our paper [2]. Meanwhile,
>> the raw pktgen performance differs only slightly:
>>
>> +--------------------------------+-----------+----------+
>> | pktgen benchmarks to Debian VM | Stock | Patched |
>> | i5 6300HQ, 20M packets | | |
>> +-----------------+--------------+-----------+----------+
>> | TAP | Transmitted | 195 Kpps | 183 Kpps |
>> | +--------------+-----------+----------+
>> | | Lost | 1615 Kpps | 0 pps |
>> +-----------------+--------------+-----------+----------+
>> | TAP+vhost_net | Transmitted | 589 Kpps | 588 Kpps |
>> | +--------------+-----------+----------+
>> | | Lost | 1164 Kpps | 0 pps |
>> +-----------------+--------------+-----------+----------+
>
Hi Jason,
Thank you for your reply!
> PPS drops somehow for TAP, any reason for that?
I have no definitive explanation for that beyond the general overhead
introduced by this implementation.
>
> Btw, I had some questions:
>
> 1) most of the patches in this series would introduce non-trivial
> impact on the performance, we probably need to benchmark each or split
> the series. What's more we need to run TCP benchmark
> (throughput/latency) as well as pktgen see the real impact
What could be done, IMO, is to activate tun_ring_consume() /
tap_ring_consume() before enabling tun_ring_produce(). Then we could see
whether the consumer-side changes alone reduce performance.
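For reference, the consume side boils down to something like this
(simplified sketch with illustrative names, not the actual patch code):

/*
 * Pop one pointer from the ring and wake the associated netdev queue so
 * the qdisc can resume transmission. The real patches additionally take
 * care of the memory ordering that pairs this wake-up with the
 * producer-side stop.
 */
static void *ring_consume_and_wake(struct ptr_ring *ring,
				   struct netdev_queue *txq)
{
	void *ptr = ptr_ring_consume(ring);

	if (ptr && netif_tx_queue_stopped(txq))
		netif_tx_wake_queue(txq);

	return ptr;
}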
For the TCP benchmarks, do you mean userspace performance, e.g. iperf3
between a host and a guest system?
>
> 2) I see this:
>
> if (unlikely(tun_ring_produce(&tfile->tx_ring, queue, skb))) {
> drop_reason = SKB_DROP_REASON_FULL_RING;
> goto drop;
> }
>
> So there could still be packet drop? Or is this related to the XDP path?
Yes, there can still be packet drops after a ptr_ring resize or a ptr_ring
unconsume. Since those two cases happen so rarely, I figured we should just
drop the packet there.
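For context, the produce side is conceptually the following (again with
illustrative names, simplified from the actual patch):

/*
 * Produce into the ring and stop the netdev queue once the ring has run
 * full, so the qdisc keeps further packets queued instead of tun/tap
 * dropping them. The fallback drop in tun_net_xmit() quoted above then
 * only triggers in the rare resize/unconsume windows.
 */
static int ring_produce_and_stop(struct ptr_ring *ring,
				 struct netdev_queue *txq,
				 struct sk_buff *skb)
{
	if (ptr_ring_produce(ring, skb))
		return -ENOSPC;	/* rare resize/unconsume race, caller drops */

	if (ptr_ring_full(ring))
		netif_tx_stop_queue(txq); /* consumer wakes the queue again */

	return 0;
}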
>
> 3) The LLTX change would have performance implications, but the
> benmark doesn't cover the case where multiple transmission is done in
> parallel
Do you mean multiple applications that produce traffic and potentially
run on different CPUs?
>
> 4) After the LLTX change, it seems we've lost the synchronization with
> the XDP_TX and XDP_REDIRECT path?
I must admit I did not take a close look at XDP and cannot really judge
if/how lltx has an impact on it. But from my point of view, __netif_tx_lock()
instead of __netif_tx_acquire() is now taken before the tun_net_xmit() call,
and I do not see an impact on XDP, which calls its own methods.
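From memory, the core path around ndo_start_xmit() does roughly the
following (paraphrasing HARD_TX_LOCK()/HARD_TX_UNLOCK() from
net/core/dev.c, not verbatim), while XDP_TX/XDP_REDIRECT enter tun via
ndo_xdp_xmit() without passing through it:

static netdev_tx_t locked_xmit(struct sk_buff *skb, struct net_device *dev,
			       struct netdev_queue *txq, int cpu)
{
	netdev_tx_t ret;

	/* Without lltx the per-queue xmit lock serializes transmissions;
	 * with lltx the core only "acquires" the queue and the driver has
	 * to provide its own locking.
	 */
	if (!dev->lltx)
		__netif_tx_lock(txq, cpu);
	else
		__netif_tx_acquire(txq);

	ret = netdev_start_xmit(skb, dev, txq, false);

	if (!dev->lltx)
		__netif_tx_unlock(txq);
	else
		__netif_tx_release(txq);

	return ret;
}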
>
> 5) The series introduces various ptr_ring helpers with lots of
> ordering stuff which is complicated, I wonder if we first have a
> simple patch to implement the zero packet loss
I personally don't see how a simpler patch is possible without using
discouraged practices like returning NETDEV_TX_BUSY in tun_net_xmit() or
spin locking between producer and consumer. But I am open to
suggestions :)
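Just to illustrate what the discouraged alternative would look like
(hypothetical, not something I am proposing):

/*
 * Push the full-ring condition back to the qdisc instead of stopping
 * the queue early. The qdisc then has to keep the skb and retry, which
 * can spin the dequeue loop and is advised against in the netdev driver
 * documentation.
 */
static netdev_tx_t busy_style_xmit(struct ptr_ring *ring, struct sk_buff *skb)
{
	if (ptr_ring_full(ring) || ptr_ring_produce(ring, skb))
		return NETDEV_TX_BUSY;

	return NETDEV_TX_OK;
}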
>
>>
>> This patch series includes tun/tap, and vhost-net because they share
>> logic. Adjusting only one of them would break the others. Therefore, the
>> patch series is structured as follows:
>> 1+2: new ptr_ring helpers for 3
>> 3: tun/tap: add synchronized ring produce/consume with queue management
>> 4+5+6: tun/tap: ptr_ring wrappers and other helpers to be called by
>> vhost-net
>> 7: tun/tap & vhost-net: only now use the previously implemented functions
>> so as not to break git bisect
>> 8: tun/tap: drop the get-ring exports (not used anymore)
>>
>> Possible future work:
>> - Introduction of Byte Queue Limits as suggested by Stephen Hemminger
>
> This seems to be not easy. The tx completion depends on the userspace behaviour.
I agree, but I really would like to reduce the bufferbloat caused by the
default 500-packet TUN / 1000-packet TAP queue without losing performance.
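A rough idea of where the BQL hooks could sit (hypothetical, not part of
this series):

/*
 * Account bytes when an skb is produced into the ring and complete them
 * once the consumer (tun read / vhost-net) has popped it, so dql can
 * keep the effective queue well below the default 500/1000-entry ring.
 * The open question remains that completion depends on how fast
 * userspace actually reads.
 */
static void ring_bql_sent(struct netdev_queue *txq, unsigned int len)
{
	netdev_tx_sent_queue(txq, len);
}

static void ring_bql_completed(struct netdev_queue *txq, unsigned int len)
{
	netdev_tx_completed_queue(txq, 1, len);
}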
>
>> - Adaptation of the netdev queue flow control for ipvtap & macvtap
>>
>> [1] Link: https://unix.stackexchange.com/questions/762935/traffic-shaping-ineffective-on-tun-device
>> [2] Link: https://cni.etit.tu-dortmund.de/storages/cni-etit/r/Research/Publications/2025/Gebauer_2025_VTCFall/Gebauer_VTCFall2025_AuthorsVersion.pdf
>>
>
> Thanks
>
Thanks! :)