netdev - Re: [PATCH v3 4/4] virtio_net: disable cb aggressively

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <a5990064-df57-f991-832d-56d1156dc3f8@redhat.com>
Date:   Mon, 16 Jan 2023 14:41:40 +0100
From:   Laurent Vivier <lvivier@...hat.com>
To:     "Michael S. Tsirkin" <mst@...hat.com>
Cc:     Jakub Kicinski <kuba@...nel.org>, Wei Wang <weiwan@...gle.com>,
        David Miller <davem@...emloft.net>, netdev@...r.kernel.org,
        Willem de Bruijn <willemb@...gle.com>,
        virtualization@...ts.linux-foundation.org,
        Jason Wang <jasowang@...hat.com>,
        Stefano Brivio <sbrivio@...hat.com>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 4/4] virtio_net: disable cb aggressively

Hi Michael,

On 5/26/21 10:24, Michael S. Tsirkin wrote:
> There are currently two cases where we poll TX vq not in response to a
> callback: start xmit and rx napi.  We currently do this with callbacks
> enabled which can cause extra interrupts from the card.  Used not to be
> a big issue as we run with interrupts disabled but that is no longer the
> case, and in some cases the rate of spurious interrupts is so high
> linux detects this and actually kills the interrupt.
> 
> Fix up by disabling the callbacks before polling the tx vq.
> 
> Signed-off-by: Michael S. Tsirkin <mst@...hat.com>
> ---
>   drivers/net/virtio_net.c | 16 ++++++++++++----
>   1 file changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index c29f42d1e04f..a83dc038d8af 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1433,7 +1433,10 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
>   		return;
>   
>   	if (__netif_tx_trylock(txq)) {
> -		free_old_xmit_skbs(sq, true);
> +		do {
> +			virtqueue_disable_cb(sq->vq);
> +			free_old_xmit_skbs(sq, true);
> +		} while (unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
>   
>   		if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
>   			netif_tx_wake_queue(txq);
> @@ -1605,12 +1608,17 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>   	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
>   	bool kick = !netdev_xmit_more();
>   	bool use_napi = sq->napi.weight;
> +	unsigned int bytes = skb->len;
>   
>   	/* Free up any pending old buffers before queueing new ones. */
> -	free_old_xmit_skbs(sq, false);
> +	do {
> +		if (use_napi)
> +			virtqueue_disable_cb(sq->vq);
>   
> -	if (use_napi && kick)
> -		virtqueue_enable_cb_delayed(sq->vq);
> +		free_old_xmit_skbs(sq, false);
> +
> +	} while (use_napi && kick &&
> +	       unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
>   
>   	/* timestamp packet in software */
>   	skb_tx_timestamp(skb);

This patch seems to introduce a problem with QEMU connected to passt using netdev stream 
backend.

When I run an iperf3 test I get after 1 or 2 seconds of test:

[  254.035559] NETDEV WATCHDOG: ens3 (virtio_net): transmit queue 0 timed out
...
[  254.060962] virtio_net virtio1 ens3: TX timeout on queue: 0, sq: output.0, vq: 0x1, 
name: output.0, 8856000 usecs ago
[  259.155150] virtio_net virtio1 ens3: TX timeout on queue: 0, sq: output.0, vq: 0x1, 
name: output.0, 13951000 usecs ago

In QEMU, I can see in virtio_net_tx_bh() the function virtio_net_flush_tx() has flushed 
all the queue entries and re-enabled the queue notification with 
virtio_queue_set_notification() and tries to flush again the queue and as it is empty it 
does nothing more and then rely on a kernel notification to re-enable the bottom half 
function. As this notification never comes the queue is stuck and kernel add entries but 
QEMU doesn't remove them:

2812 static void virtio_net_tx_bh(void *opaque)
2813 {
...
2833     ret = virtio_net_flush_tx(q);

-> flush the queue and ret is not an error and not n->tx_burst (that would re-schedule the 
function)

...
2850     virtio_queue_set_notification(q->tx_vq, 1);

-> re-enable the queue notification

2851     ret = virtio_net_flush_tx(q);
2852     if (ret == -EINVAL) {
2853         return;
2854     } else if (ret > 0) {
2855         virtio_queue_set_notification(q->tx_vq, 0);
2856         qemu_bh_schedule(q->tx_bh);
2857         q->tx_waiting = 1;
2858     }

-> ret is 0, exit the function without re-scheduling the function.
...
2859 }

If I revert this patch in the kernel (a7766ef18b33 ("virtio_net: disable cb 
aggressively")), it works fine.

How to reproduce it:

I start passt (https://passt.top/passt):

passt -f

and then QEMU

qemu-system-x86_64 ... --netdev 
stream,id=netdev0,server=off,addr.type=unix,addr.path=/tmp/passt_1.socket -device 
virtio-net,mac=9a:2b:2c:2d:2e:2f,netdev=netdev0

Host side:

sysctl -w net.core.rmem_max=134217728
sysctl -w net.core.wmem_max=134217728
iperf3 -s

Guest side:

sysctl -w net.core.rmem_max=536870912
sysctl -w net.core.wmem_max=536870912

ip link set dev $DEV mtu 256
iperf3 -c $HOST -t10 -i0 -Z -P 8 -l 1M --pacing-timer 1000000 -w 4M

Any idea of what is the problem?

Thanks,
Laurent