Message-ID: <f1137fcf-1778-4811-9211-4beb95db3a32@uni-osnabrueck.de>
Date: Tue, 3 Feb 2026 09:36:24 +0100
From: Kathrin Elmenhorst <kelmenhorst@...-osnabrueck.de>
To: Neal Cardwell <ncardwell@...gle.com>
Cc: Eric Dumazet <edumazet@...gle.com>, Kuniyuki Iwashima
<kuniyu@...gle.com>, netdev@...r.kernel.org
Subject: Re: [PATCH net-next] net: tcp_bbr: use high pacing gain when the
sender fails to put enough data inflight

> AFAICT this patch does not look like a safe solution, because there
> are many reasons that the actual number of bytes in flight can be less
> than the target inflight. Notably, it is very common for applications
> to be application-limited, i.e., they don't have enough data to send
> to fully utilize the BDP of the network path. This is very common for
> the most common kinds of TCP workloads: web, streaming video, RPC,
> SSH, etc. It does not seem safe to increase the pacing gain to
> bbr_high_gain in these common application-limited scenarios, because
> this can cause bursts of data to arrive at the bottleneck link at more
> than twice the available bandwidth, which can cause very high queuing
> and packet loss.

Absolutely, app-limited and contention-limited sockets exhibit similar
characteristics in terms of inflight bytes. My understanding was that
BBR already uses a high pacing gain when the socket is app-limited;
is that not correct?
For example, bbr_check_full_bw_reached() returns early if the socket is
app-limited, so that BBR stays in (or is reset to) STARTUP mode with its
high pacing gain. This behavior is what gave me the idea to handle the
contention-limited case in a similar way. But maybe I am missing other
cases where BBR reacts differently to app-limitation.
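
For reference, the early return I mean looks roughly like this,
paraphrased from memory of the upstream net/ipv4/tcp_bbr.c (the exact
code may differ between kernel versions):

	/* Skip the "full bandwidth reached" estimation entirely while
	 * rate samples are app-limited, so BBR keeps STARTUP's high
	 * pacing gain instead of declaring the pipe full.
	 */
	static void bbr_check_full_bw_reached(struct sock *sk,
					      const struct rate_sample *rs)
	{
		struct bbr *bbr = inet_csk_ca(sk);

		if (bbr_full_bw_reached(sk) || !bbr->round_start ||
		    rs->is_app_limited)
			return;
		/* ... otherwise compare bbr_max_bw() against the previous
		 * full_bw estimate and count plateau rounds ...
		 */
	}

As long as the samples are app-limited, full bandwidth is never declared
reached, so the high STARTUP pacing gain is retained.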

> But if you have further results that you can share, I'd appreciate it.

I separately sent you the arXiv link to our paper. Due to a conference
submission policy, we are not allowed to advertise it on a public
mailing list yet.

Thanks,
Kathrin

On 2/2/26 17:22, Neal Cardwell wrote:
> On Mon, Feb 2, 2026 at 5:15 AM kelmenhorst
> <kelmenhorst@...-osnabrueck.de> wrote:
>> This patch addresses a robustness issue of TCP BBR when run in virtual machines,
>> a common use case in modern web hosting.
>> Prior experiments in AWS VMs (https://ieeexplore.ieee.org/abstract/document/9546441),
>> and our recent measurements in a controlled setup show that VM-based BBR senders
>> heavily underestimate the available bandwidth during periods of CPU contention.
>> In our configuration with 10ms periodic timeslices, BBR already degrades in VMs
>> with less than 70% CPU time share, and throughput continues to drop as the CPU
>> share decreases, until it is capped at 10 Mbps, independent of the available
>> bandwidth, for CPU time shares of 35% and lower.
>> Considering how commonly BBR runs in (resource-limited) Linux VMs, this issue
>> could compromise the robustness of the Internet's transport layer.
>>
>> In contrast to Cubic, BBR is very sensitive to off-CPU times. This is because
>> pacing evenly spreads the target inflight over the RTT, essentially assuming
>> that the CPU is available during the whole RTT. If this assumption is not met,
>> the BBR sender cannot achieve the target pacing rate and concludes that the full
>> bandwidth has been reached, even though the throughput is far below the actual
>> bandwidth limit.
> Thanks, I agree this is an issue worth working on. And I appreciate
> your contributions in this area, as well as the work of others who
> have worked on the challenges of BBR throughput in VM guests.
>
>> This commit detects the problematic condition in bbr_update_gains() by
>> comparing BBR's current target inflight (bbr_inflight() * tp->mss_cache) with
>> the actual number of bytes inflight (tp->bytes_sent - tp->bytes_acked), and
>> applies a high pacing gain (bbr_high_gain) until the inflight deficit recovers.
>> With a higher pacing gain, BBR can send faster when the VM does have the CPU,
>> so that the target inflight volume can be achieved despite off-CPU times.
>> Re-using the constant STARTUP gain can only solve the issue up to a certain
>> point, but avoids complex algorithm changes.
>> Effectively, the patch solves the degradation problem for the most critical cases:
>> with 10ms periodic timeslices, BBRv1 is robust for CPU time shares of 35% and
>> higher, instead of 70% and higher with the original code.
> AFAICT this patch does not look like a safe solution, because there
> are many reasons that the actual number of bytes in flight can be less
> than the target inflight. Notably, it is very common for applications
> to be application-limited, i.e., they don't have enough data to send
> to fully utilize the BDP of the network path. This is very common for
> the most common kinds of TCP workloads: web, streaming video, RPC,
> SSH, etc. It does not seem safe to increase the pacing gain to
> bbr_high_gain in these common application-limited scenarios, because
> this can cause bursts of data to arrive at the bottleneck link at more
> than twice the available bandwidth, which can cause very high queuing
> and packet loss.
>
> On a broader note, before making such a significant change to a CC
> algorithm, we would need to see a broad array of test results showing
> that the change is safe and performant in a representative set of
> scenarios: multiple BBR flows, shallow buffers, deep buffers,
> policers, application-limited traffic, BBR coexistence with CUBIC,
> performance on wifi paths, etc.
>
>> We can share further results of our measurement study upon individual request.
> Yes, if you could please share further results, I would appreciate it.
> I'm aware of your work here:
>
> POSTER: Clouded Comparisons - On the Impact of Virtual Machines on
> TCP-BBR Performance
> https://dl.acm.org/doi/epdf/10.1145/3744969.3748455
>
> But if you have further results that you can share, I'd appreciate it.
>
>> Signed-off-by: Kathrin Elmenhorst <kelmenhorst@...-osnabrueck.de>
>> ---
>> net/ipv4/tcp_bbr.c | 7 +++++++
>> 1 file changed, 7 insertions(+)
>>
>> diff --git a/net/ipv4/tcp_bbr.c b/net/ipv4/tcp_bbr.c
>> index 760941e55153..ca6361931491 100644
>> --- a/net/ipv4/tcp_bbr.c
>> +++ b/net/ipv4/tcp_bbr.c
>> @@ -1011,6 +1011,13 @@ static void bbr_update_gains(struct sock *sk)
>> WARN_ONCE(1, "BBR bad mode: %u\n", bbr->mode);
>> break;
>> }
>> + // overwrite pacing gain in case the sender fails to put enough data inflight
>> + struct tcp_sock *tp = tcp_sk(sk);
>> + u64 real_inflight = tp->bytes_sent - tp->bytes_acked;
>> + u32 target_inflight = bbr_inflight(sk, bbr_bw(sk), BBR_UNIT) * tp->mss_cache;
>> + if (real_inflight < target_inflight) {
>> + bbr->pacing_gain = bbr_high_gain;
>> + }
>> }
> Just a note on style: before your next patch submission to the Linux
> netdev mailing list:
>
> (1) please read the netdev FAQ at:
> https://www.kernel.org/doc/html/v6.18/process/maintainer-netdev.html#netdev-faq
>
> (2) please run ./scripts/checkpatch.pl on the patch to catch and fix
> style issues (./scripts/checkpatch.pl shows: "total: 7 errors, 10
> warnings, 0 checks, 13 lines checked")
>
> Thanks!
>
> best regards,
> neal