Message-ID: <20250811220430.14063-1-simon.schippers@tu-dortmund.de>
Date: Tue, 12 Aug 2025 00:03:48 +0200
From: Simon Schippers <simon.schippers@...dortmund.de>
To: willemdebruijn.kernel@...il.com, jasowang@...hat.com,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Cc: Simon Schippers <simon.schippers@...dortmund.de>,
Tim Gebauer <tim.gebauer@...dortmund.de>
Subject: [PATCH net v2] TUN/TAP: Improving throughput and latency by avoiding SKB drops
This patch is the result of our paper with the title "The NODROP Patch:
Hardening Secure Networking for Real-time Teleoperation by Preventing
Packet Drops in the Linux TUN Driver" [1].
It deals with the tun_net_xmit function, which drops SKBs with the reason
SKB_DROP_REASON_FULL_RING whenever the tx_ring (the TUN queue) is full,
resulting in reduced TCP performance and packet loss for bursty video
streams when used over VPNs.
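For context, the pre-patch producer path in tun_net_xmit drops the SKB as
soon as the ring is full; roughly (simplified from the removed hunk in the
diff below):

	if (ptr_ring_produce(&tfile->tx_ring, skb)) {
		/* ring full: the SKB is freed and the stack never
		 * sees any backpressure
		 */
		drop_reason = SKB_DROP_REASON_FULL_RING;
		goto drop;
	}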
The abstract reads as follows:
"Throughput-critical teleoperation requires robust and low-latency
communication to ensure safety and performance. Often, these kinds of
applications are implemented in Linux-based operating systems and transmit
over virtual private networks, which ensure encryption and ease of use by
providing a dedicated tunneling interface (TUN) to user space
applications. In this work, we identified a specific behavior in the Linux
TUN driver, which results in significant performance degradation due to
the sender stack silently dropping packets. This design issue drastically
impacts real-time video streaming, inducing up to 29 % packet loss with
noticeable video artifacts when the internal queue of the TUN driver is
reduced to 25 packets to minimize latency. Furthermore, a small queue
length also drastically reduces the throughput of TCP traffic due to many
retransmissions. Instead, with our open-source NODROP Patch, we propose
generating backpressure in case of burst traffic or network congestion.
The patch effectively addresses the packet-dropping behavior, hardening
real-time video streaming and improving TCP throughput by 36 % in high
latency scenarios."
In addition to the mentioned performance and latency improvements for VPN
applications, this patch also allows the proper usage of qdiscs. For
example, fq_codel cannot control the queuing delay when packets have
already been dropped inside the TUN driver. This issue is also described
in [2].
The performance evaluation of the paper (see Fig. 4) showed a 4%
performance hit for a single-queue TUN with the default TUN queue size of
500 packets. However, it is important to note that with the proposed patch
no packet was ever dropped, even with a TUN queue size of 1 packet.
The validation pipeline that was used is available at [3].
As reducing the TUN queue to as few as 5 packets showed no further
performance hit in the paper, reducing the default TUN queue size might be
desirable to accompany this patch. A smaller default would obviously
reduce bufferbloat and memory requirements.
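Such a reduction would presumably be a one-line change where the driver
sets its preferred queue length, assuming the default still comes from the
TUN_READQ_SIZE define in drivers/net/tun.c:

	/* Assumption: the default TUN queue length is still set via this
	 * define and used to size the per-queue tx_ring.
	 */
	#define TUN_READQ_SIZE	500	/* e.g. lower towards 25 */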
Implementation details:
- The netdev queue start/stop flow control is utilized (a condensed sketch
of the combined flow follows this list).
- Compatible with multi-queue by stopping/waking only the specific
netdevice subqueue.
- No additional locking is used.
In the tun_net_xmit function:
- The subqueue is stopped when the tx_ring becomes full after inserting
the SKB into it.
- In the unlikely case that the insertion with ptr_ring_produce fails, the
old dropping behavior is kept for this SKB.
In the tun_ring_recv function:
- The subqueue is woken after consuming an SKB from the tx_ring, but only
once the tx_ring is empty. Waking the subqueue whenever the tx_ring has
any available space (i.e., whenever it is not full) caused crashes in our
testing. We are open to suggestions here.
- When the tx_ring is configured to be small (for example, holding only
1 SKB), queuing might be stopped in the tun_net_xmit function while, at
the same time, ptr_ring_consume is unable to grab an SKB. This prevents
tun_net_xmit from being called again and causes tun_ring_recv to wait
indefinitely for an SKB in the blocking wait loop. Therefore, the netdev
queue is also woken from within the wait loop if it has been stopped.
- Because the tun_struct is required to look up the tx_queue for the new
txq pointer, the tun_struct is now passed from tun_do_read to
tun_ring_recv as well. This is likely faster than obtaining it via the
tun_file (tfile), because that would require an RCU lock.
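The condensed (non-literal) sketch of the resulting producer/consumer
interaction mentioned above; the exact changes are in the diff below:

	/* Producer side, tun_net_xmit(): enqueue first, then apply
	 * backpressure if the ring has just become full.
	 */
	queue = netdev_get_tx_queue(dev, txq);
	if (unlikely(ptr_ring_produce(&tfile->tx_ring, skb))) {
		netif_tx_stop_queue(queue);	/* ring already full */
		drop_reason = SKB_DROP_REASON_FULL_RING;
		goto drop;			/* old fallback behavior */
	}
	if (ptr_ring_full(&tfile->tx_ring))
		netif_tx_stop_queue(queue);	/* stop before the next xmit */

	/* Consumer side, tun_ring_recv(): wake the stopped subqueue from
	 * the blocking wait loop, and again once the ring has been fully
	 * drained, so a stopped queue cannot deadlock with an empty ring.
	 */
	ptr = ptr_ring_consume(&tfile->tx_ring);
	...
	if (ptr_ring_empty(&tfile->tx_ring))
		netif_tx_wake_queue(txq);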
We are open to suggestions regarding the implementation :)
Thank you for your work!
[1] https://cni.etit.tu-dortmund.de/storages/cni-etit/r/Research/Publications/2025/Gebauer_2025_VTCFall/Gebauer_VTCFall2025_AuthorsVersion.pdf
[2] https://unix.stackexchange.com/questions/762935/traffic-shaping-ineffective-on-tun-device
[3] https://github.com/tudo-cni/nodrop
Co-developed-by: Tim Gebauer <tim.gebauer@...dortmund.de>
Signed-off-by: Tim Gebauer <tim.gebauer@...dortmund.de>
Signed-off-by: Simon Schippers <simon.schippers@...dortmund.de>
---
V1 -> V2: Removed NETDEV_TX_BUSY return case in tun_net_xmit and removed
unnecessary netif_tx_wake_queue in tun_ring_recv.
drivers/net/tun.c | 21 +++++++++++++++++----
1 file changed, 17 insertions(+), 4 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index cc6c50180663..81abdd3f9aca 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1060,13 +1060,16 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
nf_reset_ct(skb);
- if (ptr_ring_produce(&tfile->tx_ring, skb)) {
+ queue = netdev_get_tx_queue(dev, txq);
+ if (unlikely(ptr_ring_produce(&tfile->tx_ring, skb))) {
+ netif_tx_stop_queue(queue);
drop_reason = SKB_DROP_REASON_FULL_RING;
goto drop;
}
+ if (ptr_ring_full(&tfile->tx_ring))
+ netif_tx_stop_queue(queue);
/* dev->lltx requires to do our own update of trans_start */
- queue = netdev_get_tx_queue(dev, txq);
txq_trans_cond_update(queue);
/* Notify and wake up reader process */
@@ -2110,9 +2113,10 @@ static ssize_t tun_put_user(struct tun_struct *tun,
return total;
}
-static void *tun_ring_recv(struct tun_file *tfile, int noblock, int *err)
+static void *tun_ring_recv(struct tun_struct *tun, struct tun_file *tfile, int noblock, int *err)
{
DECLARE_WAITQUEUE(wait, current);
+ struct netdev_queue *txq;
void *ptr = NULL;
int error = 0;
@@ -2124,6 +2128,7 @@ static void *tun_ring_recv(struct tun_file *tfile, int noblock, int *err)
goto out;
}
+ txq = netdev_get_tx_queue(tun->dev, tfile->queue_index);
add_wait_queue(&tfile->socket.wq.wait, &wait);
while (1) {
@@ -2131,6 +2136,10 @@ static void *tun_ring_recv(struct tun_file *tfile, int noblock, int *err)
ptr = ptr_ring_consume(&tfile->tx_ring);
if (ptr)
break;
+
+ if (unlikely(netif_tx_queue_stopped(txq)))
+ netif_tx_wake_queue(txq);
+
if (signal_pending(current)) {
error = -ERESTARTSYS;
break;
@@ -2147,6 +2156,10 @@ static void *tun_ring_recv(struct tun_file *tfile, int noblock, int *err)
remove_wait_queue(&tfile->socket.wq.wait, &wait);
out:
+ if (ptr_ring_empty(&tfile->tx_ring)) {
+ txq = netdev_get_tx_queue(tun->dev, tfile->queue_index);
+ netif_tx_wake_queue(txq);
+ }
*err = error;
return ptr;
}
@@ -2165,7 +2178,7 @@ static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile,
if (!ptr) {
/* Read frames from ring */
- ptr = tun_ring_recv(tfile, noblock, &err);
+ ptr = tun_ring_recv(tun, tfile, noblock, &err);
if (!ptr)
return err;
}
--
2.43.0