linux-kernel - [PATCH net-next v5 3/8] TUN, TAP & vhost_net: Stop netdev queue before reaching a full ptr

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <71afbe18-3a5a-44ca-bb3b-b018f73ae8c6@tu-dortmund.de>
Date: Wed, 24 Sep 2025 07:41:28 +0200
From: Simon Schippers <simon.schippers@...dortmund.de>
To: "Michael S. Tsirkin" <mst@...hat.com>
Cc: willemdebruijn.kernel@...il.com, jasowang@...hat.com, eperezma@...hat.com,
        stephen@...workplumber.org, leiyang@...hat.com, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org, virtualization@...ts.linux.dev,
        kvm@...r.kernel.org, Tim Gebauer <tim.gebauer@...dortmund.de>
Subject: [PATCH net-next v5 3/8] TUN, TAP & vhost_net: Stop netdev queue
 before reaching a full ptr_ring

Hi,
first of all thank you very much for your detailed replies! :)

On 23.09.25 16:47, Michael S. Tsirkin wrote:
> On Tue, Sep 23, 2025 at 12:15:48AM +0200, Simon Schippers wrote:
>> Stop the netdev queue ahead of __ptr_ring_produce when
>> __ptr_ring_full_next signals the ring is about to fill. Due to the
>> smp_wmb() of __ptr_ring_produce the consumer is guaranteed to be able to
>> notice the stopped netdev queue after seeing the new ptr_ring entry. As
>> both __ptr_ring_full_next and __ptr_ring_produce need the producer_lock,
>> the lock is held during the execution of both methods.
>>
>> dev->lltx is disabled to ensure that tun_net_xmit is not called even
>> though the netdev queue is stopped (which happened in my testing,
>> resulting in rare packet drops). Consequently, the update of trans_start
>> in tun_net_xmit is also removed.
>>
>> Co-developed-by: Tim Gebauer <tim.gebauer@...dortmund.de>
>> Signed-off-by: Tim Gebauer <tim.gebauer@...dortmund.de>
>> Signed-off-by: Simon Schippers <simon.schippers@...dortmund.de>
>> ---
>>  drivers/net/tun.c | 16 ++++++++++------
>>  1 file changed, 10 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index 86a9e927d0ff..c6b22af9bae8 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -931,7 +931,7 @@ static int tun_net_init(struct net_device *dev)
>>  	dev->vlan_features = dev->features &
>>  			     ~(NETIF_F_HW_VLAN_CTAG_TX |
>>  			       NETIF_F_HW_VLAN_STAG_TX);
>> -	dev->lltx = true;
>> +	dev->lltx = false;
>>  
>>  	tun->flags = (tun->flags & ~TUN_FEATURES) |
>>  		      (ifr->ifr_flags & TUN_FEATURES);
>> @@ -1060,14 +1060,18 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
>>  
>>  	nf_reset_ct(skb);
>>  
>> -	if (ptr_ring_produce(&tfile->tx_ring, skb)) {
>> +	queue = netdev_get_tx_queue(dev, txq);
>> +
>> +	spin_lock(&tfile->tx_ring.producer_lock);
>> +	if (__ptr_ring_full_next(&tfile->tx_ring))
>> +		netif_tx_stop_queue(queue);
>> +
>> +	if (unlikely(__ptr_ring_produce(&tfile->tx_ring, skb))) {
>> +		spin_unlock(&tfile->tx_ring.producer_lock);
>>  		drop_reason = SKB_DROP_REASON_FULL_RING;
>>  		goto drop;
>>  	}
> 
> The comment makes it sound like you always keep one slot free
> in the queue but that is not the case - you just
> check before calling __ptr_ring_produce.
> 

I agree.

> 
> But it is racy isn't it? So first of all I suspect you
> are missing an mb before netif_tx_stop_queue.
> 

I don’t really get this point right now.

> Second it's racy because more entries can get freed
> afterwards. Which maybe is ok in this instance?
> But it really should be explained in more detail, if so.
> 

Will be covered in the next mail.

> 
> 
> Now - why not just check ring full *after* __ptr_ring_produce?
> Why do we need all these new APIs, and we can
> use existing ones which at least are not so hard to understand.
> 
> 

You convinced me about changing my implementation anyway but here my (old) 
idea:
I did this in V1-V4. The problem is that vhost_net is only called on 
EPOLLIN triggered by tun_net_xmit. Then, after consuming a batch from the 
ptr_ring, it must be able to see if the netdev queue stopped or not. If 
this is not the case the ptr_ring might get empty and vhost_net is not 
able to wake the queue again (because it is not stopped from its POV), 
which happened in my testing in my V4.

This is the reason why, now in the V5, in tun_net_xmit I stop the netdev 
queue before producing. With that I exploit the smp_wmb() in 
__ptr_ring_produce which is paired with the READ_ONCE in __ptr_ring_peek 
to ensure that the consumer in vhost_net sees that the netdev queue 
stopped after consuming a batch.

> 
> 
>> -
>> -	/* dev->lltx requires to do our own update of trans_start */
>> -	queue = netdev_get_tx_queue(dev, txq);
>> -	txq_trans_cond_update(queue);
>> +	spin_unlock(&tfile->tx_ring.producer_lock);
>>  
>>  	/* Notify and wake up reader process */
>>  	if (tfile->flags & TUN_FASYNC)
>> -- 
>> 2.43.0
>