Message-ID: <281edb3a-4679-4c75-9192-a5f0ef6952ea@kernel.org>
Date: Thu, 6 Mar 2025 10:55:32 +0100
From: Matthieu Baerts <matttbe@...nel.org>
To: Eric Dumazet <edumazet@...gle.com>, Jason Xing <kerneljasonxing@...il.com>
Cc: mptcp@...ts.linux.dev, Neal Cardwell <ncardwell@...gle.com>,
Kuniyuki Iwashima <kuniyu@...zon.com>, "David S. Miller"
<davem@...emloft.net>, David Ahern <dsahern@...nel.org>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Simon Horman <horms@...nel.org>, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH net-next] tcp: clamp window like before the cleanup
Hi Eric,
On 06/03/2025 10:45, Eric Dumazet wrote:
> On Thu, Mar 6, 2025 at 6:22 AM Jason Xing <kerneljasonxing@...il.com> wrote:
>>
>> On Wed, Mar 5, 2025 at 10:49 PM Matthieu Baerts (NGI0)
>> <matttbe@...nel.org> wrote:
>>>
>>> A recent cleanup changed the behaviour of tcp_set_window_clamp(). This
>>> looks unintentional, and affects MPTCP selftests, e.g. some tests
>>> re-establishing a connection after a disconnect are now unstable.
>>>
>>> Before the cleanup, this operation was done:
>>>
>>> new_rcv_ssthresh = min(tp->rcv_wnd, new_window_clamp);
>>> tp->rcv_ssthresh = max(new_rcv_ssthresh, tp->rcv_ssthresh);
>>>
>>> The cleanup used the 'clamp' macro which takes 3 arguments -- value,
>>> lowest, and highest -- and returns a value between the lowest and the
>>> highest allowable values. This then assumes ...
>>>
>>> lowest (rcv_ssthresh) <= highest (rcv_wnd)
>>>
>>> ... which doesn't seem to be always the case here according to the MPTCP
>>> selftests, even when running them without MPTCP, but only TCP.
>>>
>>> For example, when we have ...
>>>
>>> rcv_wnd < rcv_ssthresh < new_rcv_ssthresh
>>>
>>> ... before the cleanup, the rcv_ssthresh was not changed, while after
>>> the cleanup, it is lowered down to rcv_wnd (highest).
>>>
>>> During a simple test with TCP, here are the values I observed:
>>>
>>> new_window_clamp (val) rcv_ssthresh (lo) rcv_wnd (hi)
>>> 117760 (out) 65495 < 65536
>>> 128512 (out) 109595 > 80256 => lo > hi
>>> 1184975 (out) 328987 < 329088
>>>
>>> 113664 (out) 65483 < 65536
>>> 117760 (out) 110968 < 110976
>>> 129024 (out) 116527 > 109696 => lo > hi
>>>
>>> Here, we can see that it is not rare for rcv_ssthresh (lo) to be
>>> higher than rcv_wnd (hi), so the behaviour differs when the clamp()
>>> macro is used, even without MPTCP.
>>>
>>> Note: new_window_clamp is always out of range (rcv_ssthresh < rcv_wnd)
>>> here, which seems to be generally the case in my tests with small
>>> connections.
>>>
>>> I then suggest reverting this part, so as not to change the behaviour.
>>>
>>> Fixes: 863a952eb79a ("tcp: tcp_set_window_clamp() cleanup")
>>> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/551
>>> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@...nel.org>
>>
>> Tested-by: Jason Xing <kerneljasonxing@...il.com>
>>
>> Thanks for catching this. I should have done more tests :(
>>
>> Now I use netperf with TCP_CRR over loopback and can easily see the
>> case where tp->rcv_ssthresh is larger than tp->rcv_wnd, which means
>> tp->rcv_wnd is not always the upper bound, as you said.
>>
>> Thanks,
>> Jason
>>
>
> Patch looks fine to me but all our tests are passing with the current kernel,
> and I was not able to trigger the condition.
Thank you for having looked at this patch!
> Can you share what precise test you did ?
To get a situation where "rcv_ssthresh > rcv_wnd", I simply executed the
MPTCP Connect selftest. You can also force the creation of TCP-only
connections with '-tt', e.g.
./mptcp_connect.sh -tt
To reproduce the issue with the selftests mentioned in [1], I simply
executed ./mptcp_connect.sh in a loop after applying this small patch,
which restricts the run to a subset of the subtests ("disconnect"):
> diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.sh b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
> index 5e3c56253274..d8ebea5abc6c 100755
> --- a/tools/testing/selftests/net/mptcp/mptcp_connect.sh
> +++ b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
> @@ -855,6 +855,7 @@ make_file "$sin" "server"
>
> mptcp_lib_subtests_last_ts_reset
>
> +if false; then
> check_mptcp_disabled
>
> stop_if_error "The kernel configuration is not valid for MPTCP"
> @@ -882,6 +883,7 @@ mptcp_lib_result_code "${ret}" "ping tests"
>
> stop_if_error "Could not even run ping tests"
> mptcp_lib_pr_ok
> +fi
>
> [ -n "$tc_loss" ] && tc -net "$ns2" qdisc add dev ns2eth3 root netem loss random $tc_loss delay ${tc_delay}ms
> tc_info="loss of $tc_loss "
> @@ -910,6 +912,7 @@ mptcp_lib_pr_info "Using ${tc_info}on ns3eth4"
>
> tc -net "$ns3" qdisc add dev ns3eth4 root netem delay ${reorder_delay}ms $tc_reorder
>
> +if false; then
> TEST_GROUP="loopback v4"
> run_tests_lo "$ns1" "$ns1" 10.0.1.1 1
> stop_if_error "Could not even run loopback test"
> @@ -959,6 +962,7 @@ log_if_error "Tests with MPTFO have failed"
> run_test_transparent 10.0.3.1 "tproxy ipv4"
> run_test_transparent dead:beef:3::1 "tproxy ipv6"
> log_if_error "Tests with tproxy have failed"
> +fi
>
> run_tests_disconnect
> log_if_error "Tests of the full disconnection have failed"
Note that our CI was able to reproduce it easily. Locally, it took
around 30 to 50 iterations to hit the issue.
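The loop I used looked roughly like this (a sketch, assuming it is run
as root from tools/testing/selftests/net/mptcp in a patched tree; the
TEST_CMD override and the 100-run cap are my additions for convenience):

```shell
#!/bin/sh
# Run the (patched) selftest repeatedly and stop at the first failure.
# TEST_CMD can be overridden, e.g. for a dry run.
TEST_CMD="${TEST_CMD:-./mptcp_connect.sh}"
i=0
while $TEST_CMD; do
	i=$((i + 1))
	echo "iteration $i passed"
	[ "$i" -ge 100 ] && { echo "no failure in $i runs"; exit 0; }
done
echo "first failure after $i successful iterations"
```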
[1] https://github.com/multipath-tcp/mptcp_net-next/issues/551
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.