netdev - Re: Download throttling with kernel 6.6 (in KVM guests)

Message-ID: <20250106132051.262177da@kernel.org>
Date: Mon, 6 Jan 2025 13:20:51 -0800
From: Jakub Kicinski <kuba@...nel.org>
To: Teodor Milkov <zimage@...soft.com>
Cc: netdev@...r.kernel.org, <mst@...hat.com>, <jasowang@...hat.com>
Subject: Re: Download throttling with kernel 6.6 (in KVM guests)

On Mon, 6 Jan 2025 22:15:37 +0200 Teodor Milkov wrote:
> Hello,
> 
> Following up on my previous email, I've found the issue occurs 
> specifically with the `virtio-net` driver in KVM guests. Switching to 
> the `e1000` driver resolves the slowdown entirely, with no throttling in 
> subsequent downloads.
> 
> The reproducer and observations remain the same, but this detail might 
> help narrow down the issue.

Let's CC the virtio maintainers, then.

The fact that a 300ms sleep between connections makes the problem 
go away is a bit odd from the networking perspective.

You may need to find a way to automate the test and try to bisect 
it down :( This may help: https://github.com/arighi/virtme-ng
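
For the automation part, a minimal sketch of a pass/fail check that could be
run inside the guest for each bisect step. The function names and the
20 MB/s threshold are assumptions made for illustration (the threshold simply
sits between the ~23 MB/s and ~16 MB/s figures reported below); the curl
options are the ones from the reproducer. Booting each candidate kernel and
getting the exit status back to `git bisect run` on the host is left out.

```shell
#!/bin/sh
# Hypothetical helper functions; names and threshold are illustrative only.

# Minimum acceptable speed of the second download, in bytes/s.
THRESHOLD=${THRESHOLD:-20000000}

# measure URL: print the average speed of one download in bytes/s.
# Older curl versions print a fraction, so keep only the integer part.
measure() {
    curl -s "$1" --max-time 8 -o /dev/null -w '%{speed_download}\n' | cut -d. -f1
}

# verdict SPEED: exit status 0 (good) if SPEED >= THRESHOLD, 1 (bad) otherwise.
# An empty SPEED (e.g. curl produced no output) is treated as 0.
verdict() {
    [ "${1:-0}" -ge "$THRESHOLD" ]
}

# run_test URL: two back-to-back downloads, judged on the second one only,
# since the first download is reported to reach full speed either way.
run_test() {
    measure "$1" >/dev/null
    sleep 0.1
    verdict "$(measure "$1")"
}
```

Calling `run_test http://example.com/1000MB.bin` then returns 0 for a good
kernel and 1 for a bad one, which matches what `git bisect run` expects from
its test command.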

> > We've encountered a regression affecting downloads in KVM guests after 
> > upgrading to Linux kernel 6.6. The issue is not present in kernel 5.15 
> > or the stock Debian 6.6 kernel on hosts (not guests) but manifests 
> > consistently in kernels 6.6 and later, including 6.6.58 and even 6.13-rc.
> >
> > Steps to Reproduce:
> > 1. Perform multiple sequential downloads, perhaps on a link with 
> > higher BDP (USA -> EU 120ms in our case).
> > 2. Look at download speeds in scenarios with varying sleep intervals 
> > between the downloads.
> >
> > Observations:
> > - Kernel 5.15: Reaches maximum throughput (~23 MB/s) consistently.
> > - Kernel 6.6:
> >   - The first download achieves maximum throughput (~23 MB/s).
> >   - Subsequent downloads are throttled to ~16 MB/s unless a sleep 
> > interval ≥ 0.3 seconds is introduced between them.
> >
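
To put the two rates above in terms of window size, here is a rough
calculation. It assumes the quoted 120ms is the round-trip time and that
throughput is limited to roughly one window per RTT; neither is confirmed in
the thread, and `bdp_bytes` is a made-up helper name.

```shell
# bdp_bytes RATE_BYTES_PER_S RTT_MS: bytes that must be in flight to
# sustain RATE over a path with the given round-trip time.
bdp_bytes() {
    echo $(( $1 * $2 / 1000 ))
}

bdp_bytes 23000000 120   # full speed:  2760000 bytes (~2.76 MB)
bdp_bytes 16000000 120   # throttled:   1920000 bytes (~1.92 MB)
```

Under those assumptions the throttled downloads behave as if the effective
window were capped at about 1.9 MB instead of the roughly 2.8 MB the path
needs.
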
> > Reproducer Script:
> > for _ in 1 2; do curl http://example.com/1000MB.bin --max-time 8 -o /dev/null -w '(%{speed_download} B/s)\n'; sleep 0.1; done
> >
> >
> > We tried various sysctl settings, different qdiscs, and different TCP 
> > congestion control algorithms (e.g. switching from bbr to cubic), but 
> > the problem persists.
> >
> > git bisect traced the regression to commit dfa2f0483360 ("tcp: get rid 
> > of sysctl_tcp_adv_win_scale"). A similar issue was described by 
> > Netflix in 
> > https://netflixtechblog.com/investigation-of-a-cross-regional-network-performance-issue-422d6218fdf1 
> > and was supposedly fixed in kernels 6.6.33 and 6.10, but the problem 
> > remains in 6.6.58 and even 6.13-rc in our case.
> >
> > Could this behavior be a side effect of `tcp_adv_win_scale` removal, 
> > or is it indicative of something else?
> >
> > We would appreciate any insights or guidance on how to investigate 
> > this regression further.
> >
> > Best regards!
> >