Message-ID: <CAFYr1XPb=J0qeGt0Tco1z7QURmBH8TiWP0=uH0zhU=wCQKCtpA@mail.gmail.com>
Date: Mon, 12 May 2025 13:43:34 -0400
From: Anup Agarwal <anupa@...rew.cmu.edu>
To: Neal Cardwell <ncardwell@...gle.com>
Cc: netdev@...r.kernel.org
Subject: Potential bug in Linux TCP vegas implementation
Hi Neal,
I am reaching out to you since you are listed as a point of contact
for Linux TCP (https://docs.kernel.org/process/maintainers.html), and
http://neal.nu/uw/linux-vegas/ seems to indicate that you also wrote
the initial Vegas implementation in the Linux kernel.
I believe this commit
https://github.com/torvalds/linux/commit/8d3a564da34e5844aca4f991b73f8ca512246b23
introduced a bug in the Vegas implementation.
Before this commit, the implementation compared "diff = cwnd * (RTT -
baseRTT) / RTT" with alpha_pkts. After this commit, however, diff is
computed as "diff = cwnd * (RTT - baseRTT) / baseRTT". This small
change in the denominator alters Vegas's steady-state performance
properties.
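Specifically, before the commit, Vegas's steady-state rate is "rate =
alpha_pkts / delay". This follows from substituting rate = cwnd/RTT and
delay = RTT - baseRTT into the equilibrium condition "diff = alpha_pkts"
(i.e., the point at which flows have no incentive to change cwnd).
After the commit, the same substitution gives "rate = alpha_pkts/delay
* baseRTT/RTT". When baseRTT is small compared with delay, RTT is
approximately delay, so this is close to "rate = alpha_pkts * baseRTT /
delay^2"; for a fixed baseRTT, rate then scales as 1/delay^2 rather
than 1/delay.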
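Written out, with the shorthand \alpha = alpha_pkts, B = baseRTT, R = RTT, d = R - B and x = cwnd/R (symbols introduced here only for readability), the two equilibrium conditions give:

```latex
\begin{aligned}
\text{before:}\quad & \frac{\mathrm{cwnd}\,(R-B)}{R} = x\,d = \alpha
  \;\Longrightarrow\; x = \frac{\alpha}{d} \\
\text{after:}\quad & \frac{\mathrm{cwnd}\,(R-B)}{B} = \frac{x\,R\,d}{B} = \alpha
  \;\Longrightarrow\; x = \frac{\alpha}{d}\cdot\frac{B}{R} = \frac{\alpha B}{d\,(B+d)} \\
\text{after, } B \ll d:\quad & x \approx \frac{\alpha B}{d^{2}}
\end{aligned}
```

For a fixed baseRTT, the post-commit rate therefore falls off as 1/d^2 rather than 1/d.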
"rate = alpha_pkts / delay" is the key to the weighted proportional
fairness that Vegas has been shown to achieve (e.g., in
https://www.cs.princeton.edu/techreports/2000/628.pdf or
https://link.springer.com/book/10.1007/978-0-8176-8216-3).
A "rate = alpha_pkts/delay^2" relation would not give proportional
fairness. For instance, on a parking-lot topology, proportional
fairness corresponds to a throughput ratio of O(hops) between the
single-hop flows and the flow that traverses every hop, whereas the
delay^2 relation gives a throughput ratio of O(hops^2) (derived in
https://arxiv.org/abs/2504.18786).
In practice, this issue (and fixing it) is perhaps not very important,
for the three reasons below. However, since this appears to be a clear
algebraic mistake in the commit and is an easy fix, it may be worth
fixing nonetheless. Please let me know if I have missed something and
this was instead an intentional change.
(R1) Few people use Vegas, except perhaps for congestion control
evaluation.
(R2) To trigger this issue, one needs both a low baseRTT and a low
capacity (so that the queueing delay is large enough relative to
baseRTT to matter; see R3 below). This implies a low-BDP network, where
cwnd clamps may kick in. Alternatively, a large alpha_pkts value,
instead of a low capacity, could trigger the issue.
(R3) In my empirical tests, I start seeing problems due to RTprop
(baseRTT) misestimation long before this issue becomes visible.
Best,
Anup