Message-ID: <20250106132051.262177da@kernel.org>
Date: Mon, 6 Jan 2025 13:20:51 -0800
From: Jakub Kicinski <kuba@...nel.org>
To: Teodor Milkov <zimage@...soft.com>
Cc: netdev@...r.kernel.org, <mst@...hat.com>, <jasowang@...hat.com>
Subject: Re: Download throttling with kernel 6.6 (in KVM guests)
On Mon, 6 Jan 2025 22:15:37 +0200 Teodor Milkov wrote:
> Hello,
>
> Following up on my previous email, I’ve found the issue occurs
> specifically with the `virtio-net` driver in KVM guests. Switching to
> the `e1000` driver resolves the slowdown entirely, with no throttling in
> subsequent downloads.
>
> The reproducer and observations remain the same, but this detail might
> help narrow down the issue.
Let's CC the virtio maintainers, then.
The fact that a 300ms sleep between connections makes the problem
go away is a bit odd from the networking perspective.
You may need to find a way to automate the test and try to bisect
it down :( This may help: https://github.com/arighi/virtme-ng
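
A minimal sketch of what such an automated check could look like, based
on the curl reproducer quoted below. The URL, MAX_TIME, SLEEP and RATIO
variables and the measure/speed_ok helpers are hypothetical names, and
the 90% threshold is an assumption to tune for the link in question:

```shell
#!/bin/sh
# Hypothetical bisect helper, meant to run inside the guest under test.
# URL, MAX_TIME, SLEEP and RATIO are assumed knobs, not part of the report.

# Print the average download speed (bytes/s) of one time-limited transfer.
measure() {
    curl -s --max-time "$2" -o /dev/null -w '%{speed_download}' "$1"
}

# Exit 0 when the second speed is at least $3 percent of the first.
speed_ok() {
    awk -v a="$1" -v b="$2" -v r="${3:-90}" 'BEGIN { exit !(b * 100 >= a * r) }'
}

# Only measure when a URL is supplied, so the helpers can be sourced alone.
if [ -n "${URL:-}" ]; then
    first=$(measure "$URL" "${MAX_TIME:-8}")
    sleep "${SLEEP:-0.1}"
    second=$(measure "$URL" "${MAX_TIME:-8}")
    echo "first=$first B/s second=$second B/s"
    # No usable measurement: 125 tells "git bisect run" to skip this commit.
    if [ -z "$first" ] || [ -z "$second" ]; then
        exit 125
    fi
    if speed_ok "$first" "$second" "${RATIO:-90}"; then
        exit 0    # good: second download not throttled
    else
        exit 1    # bad: second download throttled
    fi
fi
```

The exit codes follow the "git bisect run" convention (0 good, 1 bad,
125 skip). Building each candidate kernel and booting it as the guest
before running the check is left out here and depends on the setup.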
> > We've encountered a regression affecting downloads in KVM guests after
> > upgrading to Linux kernel 6.6. The issue is not present in kernel 5.15
> > or the stock Debian 6.6 kernel on hosts (not guests) but manifests
> > consistently in kernels 6.6 and later, including 6.6.58 and even 6.13-rc.
> >
> > Steps to Reproduce:
> > 1. Perform multiple sequential downloads, perhaps on a link with
> >    higher BDP (USA -> EU 120ms in our case).
> > 2. Look at download speeds in scenarios with varying sleep intervals
> >    between the downloads.
> >
> > Observations:
> > - Kernel 5.15: Reaches maximum throughput (~23 MB/s) consistently.
> > - Kernel 6.6:
> >   - The first download achieves maximum throughput (~23 MB/s).
> >   - Subsequent downloads are throttled to ~16 MB/s unless a sleep
> >     interval ≥ 0.3 seconds is introduced between them.
> >
> > Reproducer Script:
> > for _ in 1 2; do curl http://example.com/1000MB.bin --max-time 8 \
> >   -o /dev/null -w '(%{speed_download} B/s)\n'; sleep 0.1; done
> >
> >
> > Tried various sysctl settings, changing qdiscs, tcp congestion algo
> > (e.g. from bbr to cubic), but the problem persists.
> >
> > git bisect traced the regression to commit dfa2f0483360 ("tcp: get rid
> > of sysctl_tcp_adv_win_scale"). While a similar issue was described by
> > Netflix in
> > https://netflixtechblog.com/investigation-of-a-cross-regional-network-performance-issue-422d6218fdf1
> > and was supposedly fixed in kernels 6.6.33 and 6.10, the problem
> > remains in 6.6.58 and even 6.13-rc for our case.
> >
> > Could this behavior be a side effect of `tcp_adv_win_scale` removal,
> > or is it indicative of something else?
> >
> > We would appreciate any insights or guidance on how to further
> > investigate this regression.
> >
> > Best regards!
> >