Message-ID: <2b2d686e-9164-4ad6-aa83-2d97aba680b6@ovn.org>
Date: Sat, 5 Apr 2025 14:18:57 +0200
From: Ilya Maximets <i.maximets@....org>
To: Markus Fohrer <markus.fohrer@...ked.de>,
Willem de Bruijn <willemdebruijn.kernel@...il.com>,
"Michael S. Tsirkin" <mst@...hat.com>
Cc: i.maximets@....org, virtualization@...ts.linux-foundation.org,
jasowang@...hat.com, davem@...emloft.net, edumazet@...gle.com,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [REGRESSION] Massive virtio-net throughput drop in guest VM with
Linux 6.8+
On 4/5/25 8:15 AM, Markus Fohrer wrote:
> Am Samstag, dem 05.04.2025 um 00:05 +0200 schrieb Ilya Maximets:
>
>> On 4/4/25 5:13 PM, Willem de Bruijn wrote:
>>
>>> Markus Fohrer wrote:
>>>
>>>> Am Freitag, dem 04.04.2025 um 10:52 +0200 schrieb Markus Fohrer:
>>>>
>>>>> Am Freitag, dem 04.04.2025 um 04:29 -0400 schrieb Michael S. Tsirkin:
>>>>>
>>>>>> On Fri, Apr 04, 2025 at 10:16:55AM +0200, Markus Fohrer wrote:
>>>>>>
>>>>>>> Am Donnerstag, dem 03.04.2025 um 09:04 -0400 schrieb Michael S.
>>>>>>> Tsirkin:
>>>>>>>
>>>>>>>> On Wed, Apr 02, 2025 at 11:12:07PM +0200, Markus Fohrer wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I'm observing a significant performance regression in KVM guest
>>>>>>>>> VMs using virtio-net with recent Linux kernels (6.8.1+ and 6.14).
>>>>>>>>>
>>>>>>>>> When running on a host system equipped with a Broadcom NetXtreme-E
>>>>>>>>> (bnxt_en) NIC and AMD EPYC CPUs, the network throughput in the
>>>>>>>>> guest drops to 100–200 KB/s. The same guest configuration performs
>>>>>>>>> normally (~100 MB/s) when using kernel 6.8.0 or when the VM is
>>>>>>>>> moved to a host with Intel NICs.
>>>>>>>>>
>>>>>>>>> Test environment:
>>>>>>>>> - Host: QEMU/KVM, Linux 6.8.1 and 6.14.0
>>>>>>>>> - Guest: Linux with virtio-net interface
>>>>>>>>> - NIC: Broadcom BCM57416 (bnxt_en driver, no issues at host level)
>>>>>>>>> - CPU: AMD EPYC
>>>>>>>>> - Storage: virtio-scsi
>>>>>>>>> - VM network: virtio-net, virtio-scsi (no CPU or IO bottlenecks)
>>>>>>>>> - Traffic test: iperf3, scp, wget consistently slow in guest
>>>>>>>>>
>>>>>>>>> This issue is not present:
>>>>>>>>> - On 6.8.0
>>>>>>>>> - On hosts with Intel NICs (same VM config)
>>>>>>>>>
>>>>>>>>> I have bisected the issue to the following upstream commit:
>>>>>>>>>
>>>>>>>>> 49d14b54a527 ("virtio-net: Suppress tx timeout warning for small tx")
>>>>>>>>> https://git.kernel.org/linus/49d14b54a527
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks a lot for the info!
>>>>>>>>
>>>>>>>>
>>>>>>>> Both the link and the commit point at:
>>>>>>>>
>>>>>>>> commit 49d14b54a527289d09a9480f214b8c586322310a
>>>>>>>> Author: Eric Dumazet <edumazet@...gle.com>
>>>>>>>> Date:   Thu Sep 26 16:58:36 2024 +0000
>>>>>>>>
>>>>>>>>     net: test for not too small csum_start in virtio_net_hdr_to_skb()
>>>>>>>>
>>>>>>>>
>>>>>>>> is this what you mean?
>>>>>>>>
>>>>>>>> I don't know which commit is "virtio-net: Suppress tx timeout
>>>>>>>> warning for small tx".
>>>>>>>>
>>>>>>>>> Reverting this commit restores normal network performance in
>>>>>>>>> affected guest VMs.
>>>>>>>>>
>>>>>>>>> I'm happy to provide more data or assist with testing a potential fix.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Markus Fohrer
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks! First I think it's worth checking what the setup is, e.g.
>>>>>>>> which offloads are enabled.
>>>>>>>> Besides that, I'd start by seeing what's going on. Assuming I'm
>>>>>>>> right about Eric's patch:
>>>>>>>>
>>>>>>>> diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
>>>>>>>> index 276ca543ef44d8..02a9f4dc594d02 100644
>>>>>>>> --- a/include/linux/virtio_net.h
>>>>>>>> +++ b/include/linux/virtio_net.h
>>>>>>>> @@ -103,8 +103,10 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
>>>>>>>>
>>>>>>>>  		if (!skb_partial_csum_set(skb, start, off))
>>>>>>>>  			return -EINVAL;
>>>>>>>> +		if (skb_transport_offset(skb) < nh_min_len)
>>>>>>>> +			return -EINVAL;
>>>>>>>>
>>>>>>>> -		nh_min_len = max_t(u32, nh_min_len, skb_transport_offset(skb));
>>>>>>>> +		nh_min_len = skb_transport_offset(skb);
>>>>>>>>  		p_off = nh_min_len + thlen;
>>>>>>>>  		if (!pskb_may_pull(skb, p_off))
>>>>>>>>  			return -EINVAL;
>>>>>>>>
>>>>>>>> Sticking a printk before return -EINVAL to show the offset and
>>>>>>>> nh_min_len would be a good 1st step. Thanks!
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I added the following printk inside virtio_net_hdr_to_skb():
>>>>>>>
>>>>>>> 	if (skb_transport_offset(skb) < nh_min_len) {
>>>>>>> 		printk(KERN_INFO "virtio_net: 3 drop, transport_offset=%u, nh_min_len=%u\n",
>>>>>>> 		       skb_transport_offset(skb), nh_min_len);
>>>>>>> 		return -EINVAL;
>>>>>>> 	}
>>>>>>>
>>>>>>> Built and installed the kernel, then triggered a large download via:
>>>>>>>
>>>>>>> wget http://speedtest.belwue.net/10G
>>>>>>>
>>>>>>> Relevant output from `dmesg -w`:
>>>>>>>
>>>>>>> [ 57.327943] virtio_net: 3 drop, transport_offset=34, nh_min_len=40
>>>>>>> [ 57.428942] virtio_net: 3 drop, transport_offset=34, nh_min_len=40
>>>>>>> [ 57.428962] virtio_net: 3 drop, transport_offset=34, nh_min_len=40
>>>>>>> [ 57.553068] virtio_net: 3 drop, transport_offset=34, nh_min_len=40
>>>>>>> [ 57.553088] virtio_net: 3 drop, transport_offset=34, nh_min_len=40
>>>>>>> [ 57.576678] virtio_net: 3 drop, transport_offset=34, nh_min_len=40
>>>>>>> [ 57.618438] virtio_net: 3 drop, transport_offset=34, nh_min_len=40
>>>>>>> [ 57.618453] virtio_net: 3 drop, transport_offset=34, nh_min_len=40
>>>>>>> [ 57.703077] virtio_net: 3 drop, transport_offset=34, nh_min_len=40
>>>>>>> [ 57.823072] virtio_net: 3 drop, transport_offset=34, nh_min_len=40
>>>>>>> [ 57.891982] virtio_net: 3 drop, transport_offset=34, nh_min_len=40
>>>>>>> [ 57.946190] virtio_net: 3 drop, transport_offset=34, nh_min_len=40
>>>>>>> [ 58.218686] virtio_net: 3 drop, transport_offset=34, nh_min_len=40
>>>>>>
>>>>>>
>>>>>> Hmm indeed. And what about these values?
>>>>>>
>>>>>> 	u32 start = __virtio16_to_cpu(little_endian, hdr->csum_start);
>>>>>> 	u32 off = __virtio16_to_cpu(little_endian, hdr->csum_offset);
>>>>>> 	u32 needed = start + max_t(u32, thlen, off + sizeof(__sum16));
>>>>>>
>>>>>> Print them too?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> I would now do the test with commit
>>>>>>> 49d14b54a527289d09a9480f214b8c586322310a and commit
>>>>>>> 49d14b54a527289d09a9480f214b8c586322310a~1
>>>>>>>
>>>>>>
>>>>>>
>>>>>> Worth checking, though it seems likely now that the hypervisor is
>>>>>> doing weird things. What kind of backend is it? qemu? tun?
>>>>>> vhost-user? vhost-net?
>>>>>>
>>>>>
>>>>>
>>>>> Backend: QEMU/KVM hypervisor (Proxmox)
>>>>>
>>>>>
>>>>> printk output:
>>>>>
>>>>> [ 58.641906] virtio_net: drop, transport_offset=34 start=34, off=16, needed=54, nh_min_len=40
>>>>> [ 58.678048] virtio_net: drop, transport_offset=34 start=34, off=16, needed=54, nh_min_len=40
>>>>> [ 58.952871] virtio_net: drop, transport_offset=34 start=34, off=16, needed=54, nh_min_len=40
>>>>> [ 58.962157] virtio_net: drop, transport_offset=34 start=34, off=16, needed=54, nh_min_len=40
>>>>> [ 59.071645] virtio_net: drop, transport_offset=34 start=34, off=16, needed=54, nh_min_len=40
>>>>
>>>
>>>
>>> So likely a TCP/IPv4 packet, but with VIRTIO_NET_HDR_GSO_TCPV6.
>>
>>
>>
>> Hi, Markus.
>>
>> Given this and the fact that the issue depends on the bnxt_en NIC on the
>> host, I'd make an educated guess that the problem is the host NIC driver.
>>
>> There are some known GRO issues in the bnxt_en driver fixed recently in
>>
>> commit de37faf41ac55619dd329229a9bd9698faeabc52
>> Author: Michael Chan <michael.chan@...adcom.com>
>> Date:   Wed Dec 4 13:59:17 2024 -0800
>>
>>     bnxt_en: Fix GSO type for HW GRO packets on 5750X chips
>>
>> It's not clear to me what your host kernel version is. But the commit
>> above was introduced in 6.14 and may be in fairly recent stable kernels.
>> The oldest is v6.12.6 AFAICT. Can you try one of these host kernels?
>>
>> Also, to confirm and work around the problem, please try disabling HW GRO
>> on the bnxt_en NIC first:
>>
>> ethtool -K <BNXT_EN NIC IFACE> rx-gro-hw off
>>
>> If that doesn't help, then the problem is likely something different.
>>
>> Best regards, Ilya Maximets.
>
>
> Setting `rx-gro-hw off` on the Broadcom interfaces also resolves the issue:
>
> ethtool -K ens1f0np0 rx-gro-hw off
> ethtool -K ens1f1np1 rx-gro-hw off
> ethtool -K ens1f2np2 rx-gro-hw off
> ethtool -K ens1f3np3 rx-gro-hw off
>
> With this setting applied, the guest receives traffic correctly even when GRO is enabled on the host.
OK. It's definitely a host bnxt_en driver bug then.
>
> The system is running the latest Proxmox kernel:
>
> 6.8.12-9-pve
6.8 is long EoL upstream, so you need to ask your distribution maintainers to
backport the aforementioned bnxt_en driver fix (de37faf41ac5), or move to the
latest 6.12+ stable kernels, which are supported upstream.
Since Proxmox mostly just rebuilds Ubuntu kernels, you probably need to
ask for the fix to be backported to the corresponding Ubuntu kernel first.
Meanwhile, you can run with rx-gro-hw off on those cards.
Best regards, Ilya Maximets.
>
>>> This is observed in the guest on the ingress path, right? In
>>> virtnet_receive_done.
>>>
>>> Is this using vhost-net in the host for pass-through? IOW, is
>>> the host writing the virtio_net_hdr too?
>>>
>>>
>>>>
>>>> I just noticed that commit 17bd3bd82f9f79f3feba15476c2b2c95a9b11ff8
>>>> (tcp_offload.c: gso fix) also touches checksum handling and may
>>>> affect how skb state is passed to virtio_net_hdr_to_skb().
>>>>
>>>> Is it possible that the regression only appears due to the combination
>>>> of 17bd3bd8 and 49d14b54a5?
>>>
>>>
>>> That patch only affects packets with SKB_GSO_FRAGLIST, which is only
>>> set on forwarding if NETIF_F_FRAGLIST is set. I don