Message-ID: <CACGkMEuFXojXZ-tyaY284CXZmx+0nG4-bKB3dzsQvwuxmM9TwQ@mail.gmail.com>
Date: Mon, 11 Aug 2025 10:44:45 +0800
From: Jason Wang <jasowang@...hat.com>
To: Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: Simon Schippers <simon.schippers@...dortmund.de>, netdev@...r.kernel.org, 
	linux-kernel@...r.kernel.org, Tim Gebauer <tim.gebauer@...dortmund.de>
Subject: Re: [PATCH net] TUN/TAP: Improving throughput and latency by avoiding
 SKB drops

On Sat, Aug 9, 2025 at 10:15 PM Willem de Bruijn
<willemdebruijn.kernel@...il.com> wrote:
>
> Simon Schippers wrote:
> > This patch is the result of our paper with the title "The NODROP Patch:
> > Hardening Secure Networking for Real-time Teleoperation by Preventing
> > Packet Drops in the Linux TUN Driver" [1].
> > It deals with the tun_net_xmit function, which drops SKBs with the reason
> > SKB_DROP_REASON_FULL_RING whenever the tx_ring (TUN queue) is full,
> > resulting in reduced TCP performance and packet loss for bursty video
> > streams when used over VPNs.
> >
> > The abstract reads as follows:
> > "Throughput-critical teleoperation requires robust and low-latency
> > communication to ensure safety and performance. Often, these kinds of
> > applications are implemented in Linux-based operating systems and transmit
> > over virtual private networks, which ensure encryption and ease of use by
> > providing a dedicated tunneling interface (TUN) to user space
> > applications. In this work, we identified a specific behavior in the Linux
> > TUN driver, which results in significant performance degradation due to
> > the sender stack silently dropping packets. This design issue drastically
> > impacts real-time video streaming, inducing up to 29 % packet loss with
> > noticeable video artifacts when the internal queue of the TUN driver is
> > reduced to 25 packets to minimize latency. Furthermore, a small queue
>
> This clearly increases dropcount. Does it meaningfully reduce latency?
>
> The cause of latency here is scheduling of the process reading from
> the tun FD.
>
> Task pinning and/or adjusting scheduler priority/algorithm/etc. may
> be a more effective and robust approach to reducing latency.
>
> > length also drastically reduces the throughput of TCP traffic due to many
> > retransmissions. Instead, with our open-source NODROP Patch, we propose
> > generating backpressure in case of burst traffic or network congestion.
> > The patch effectively addresses the packet-dropping behavior, hardening
> > real-time video streaming and improving TCP throughput by 36 % in high
> > latency scenarios."
> >
> > In addition to the mentioned performance and latency improvements for VPN
> > applications, this patch also allows the proper usage of qdiscs. For
> > example, fq_codel cannot control the queuing delay when packets are
> > already dropped in the TUN driver. This issue is also described in [2].
> >
> > The performance evaluation of the paper (see Fig. 4) showed a 4%
> > performance hit for a single-queue TUN with the default TUN queue size of
> > 500 packets. However, it is important to note that with the proposed
> > patch no packet drop ever occurred, even with a TUN queue size of 1 packet.
> > The utilized validation pipeline is available under [3].
> >
> > As reducing the TUN queue to a size of down to 5 packets showed no
> > further performance hit in the paper, a reduction of the default TUN queue
> > size might be desirable to accompany this patch. A reduction would
> > obviously reduce bufferbloat and memory requirements.
> >
> > Implementation details:
> > - The netdev queue start/stop flow control is utilized.
> > - Compatible with multi-queue by only stopping/waking the specific
> > netdevice subqueue.
> > - No additional locking is used.
> >
> > In the tun_net_xmit function:
> > - Stopping the subqueue is done when the tx_ring gets full after inserting
> > the SKB into the tx_ring.
> > - In the unlikely case when the insertion with ptr_ring_produce fails, the
> > old dropping behavior is used for this SKB.
> > - In the unlikely case when tun_net_xmit is called even though the tx_ring
> > is full, the subqueue is stopped once again and NETDEV_TX_BUSY is returned.
> >
> > In the tun_ring_recv function:
> > - Waking the subqueue is done after consuming an SKB from the tx_ring when
> > the tx_ring is empty. Waking the subqueue whenever the tx_ring has any
> > available space (i.e., whenever it is not full) caused crashes in our
> > testing. We are open to suggestions.
> > - Especially when the tx_ring is configured to be small, queuing might be
> > stopped in the tun_net_xmit function while at the same time,
> > ptr_ring_consume is not able to grab a packet. This prevents tun_net_xmit
> > from being called again and causes tun_ring_recv to wait indefinitely for
> > a packet. Therefore, the queue is woken after grabbing a packet if the
> > queuing is stopped. The same behavior is applied in the accompanying wait
> > queue.
> > - Because the tun_struct is required to get the tx_queue into the new txq
> > pointer, the tun_struct is passed to tun_do_read as well. This is likely
> > faster than retrieving it via the tun_file tfile, because that path takes
> > an RCU read lock.
> >
> > We are open to suggestions regarding the implementation :)
> > Thank you for your work!
> >
> > [1] Link: https://cni.etit.tu-dortmund.de/storages/cni-etit/r/Research/Publications/2025/Gebauer_2025_VTCFall/Gebauer_VTCFall2025_AuthorsVersion.pdf
> > [2] Link: https://unix.stackexchange.com/questions/762935/traffic-shaping-ineffective-on-tun-device
> > [3] Link: https://github.com/tudo-cni/nodrop
> >
> > Co-developed-by: Tim Gebauer <tim.gebauer@...dortmund.de>
> > Signed-off-by: Tim Gebauer <tim.gebauer@...dortmund.de>
> > Signed-off-by: Simon Schippers <simon.schippers@...dortmund.de>
> > ---
> >  drivers/net/tun.c | 32 ++++++++++++++++++++++++++++----
> >  1 file changed, 28 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > index cc6c50180663..e88a312d3c72 100644
> > --- a/drivers/net/tun.c
> > +++ b/drivers/net/tun.c
> > @@ -1023,6 +1023,13 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
> >
> >       netif_info(tun, tx_queued, tun->dev, "%s %d\n", __func__, skb->len);
> >
> > +     if (unlikely(ptr_ring_full(&tfile->tx_ring))) {
> > +             queue = netdev_get_tx_queue(dev, txq);
> > +             netif_tx_stop_queue(queue);
> > +             rcu_read_unlock();
> > +             return NETDEV_TX_BUSY;
>
> returning NETDEV_TX_BUSY is discouraged.
>
> In principle pausing the "device" queue for TUN, similar to other
> devices, sounds reasonable, iff the simpler above suggestion is not
> sufficient.
>
> But then preferable to pause before the queue is full, to avoid having
> to return failure. See for instance virtio_net.

+1 and we probably need to invent new ptr ring helpers for that.

Thanks

