Message-ID: <98c5d450-e766-45cd-a300-bbeaf31cb0b9@mojatatu.com>
Date: Fri, 22 Aug 2025 09:40:01 -0300
From: Victor Nogueira <victor@...atatu.com>
To: dima@...sta.com, Eric Dumazet <edumazet@...gle.com>,
Neal Cardwell <ncardwell@...gle.com>, Kuniyuki Iwashima <kuniyu@...gle.com>,
"David S. Miller" <davem@...emloft.net>, David Ahern <dsahern@...nel.org>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Simon Horman <horms@...nel.org>
Cc: Bob Gilligan <gilligan@...sta.com>,
Salam Noureddine <noureddine@...sta.com>,
Dmitry Safonov <0x7f454c46@...il.com>, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH net-next 1/2] tcp: Destroy TCP-AO, TCP-MD5 keys in
.sk_destruct()
On 8/22/25 01:55, Dmitry Safonov via B4 Relay wrote:
> From: Dmitry Safonov <dima@...sta.com>
>
> Currently there are a couple of minor issues with destroying the keys
> in tcp_v4_destroy_sock():
>
> 1. The socket is still in TCP bind buckets, making it reachable for
> incoming segments [on another CPU core], potentially available to send
> late FIN/ACK/RST replies.
>
> 2. There is at least one code path, where tcp_done() is called before
> sending RST [kudos to Bob for investigation]. This is a case of
> a server, that finished sending its data and just called close().
>
> The socket is in TCP_FIN_WAIT2 and has RCV_SHUTDOWN (set by
> __tcp_close())
>
> tcp_v4_do_rcv()/tcp_v6_do_rcv()
> tcp_rcv_state_process() /* LINUX_MIB_TCPABORTONDATA */
> tcp_reset()
> tcp_done_with_error()
> tcp_done()
> inet_csk_destroy_sock() /* Destroys AO/MD5 keys */
> /* tcp_rcv_state_process() returns SKB_DROP_REASON_TCP_ABORT_ON_DATA */
> tcp_v4_send_reset() /* Sends an unsigned RST segment */
>
> tcpdump:
>> 22:53:15.399377 00:00:b2:1f:00:00 > 00:00:01:01:00:00, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 33929, offset 0, flags [DF], proto TCP (6), length 60)
>> 1.0.0.1.34567 > 1.0.0.2.49848: Flags [F.], seq 2185658590, ack 3969644355, win 502, options [nop,nop,md5 valid], length 0
>> 22:53:15.399396 00:00:01:01:00:00 > 00:00:b2:1f:00:00, ethertype IPv4 (0x0800), length 86: (tos 0x0, ttl 64, id 51951, offset 0, flags [DF], proto TCP (6), length 72)
>> 1.0.0.2.49848 > 1.0.0.1.34567: Flags [.], seq 3969644375, ack 2185658591, win 128, options [nop,nop,md5 valid,nop,nop,sack 1 {2185658590:2185658591}], length 0
>> 22:53:16.429588 00:00:b2:1f:00:00 > 00:00:01:01:00:00, ethertype IPv4 (0x0800), length 60: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
>> 1.0.0.1.34567 > 1.0.0.2.49848: Flags [R], seq 2185658590, win 0, length 0
>> 22:53:16.664725 00:00:b2:1f:00:00 > 00:00:01:01:00:00, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
>> 1.0.0.1.34567 > 1.0.0.2.49848: Flags [R], seq 2185658591, win 0, options [nop,nop,md5 valid], length 0
>> 22:53:17.289832 00:00:b2:1f:00:00 > 00:00:01:01:00:00, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
>> 1.0.0.1.34567 > 1.0.0.2.49848: Flags [R], seq 2185658591, win 0, options [nop,nop,md5 valid], length 0
>
> Note the signed RSTs later in the dump - those are sent by the server
> when the fin-wait socket gets removed from hash buckets, by
> the listener socket.
>
> Instead of destroying AO/MD5 info and their keys in inet_csk_destroy_sock(),
> slightly delay it until the actual socket .sk_destruct(). As a shut-down
> socket can still send non-data replies, they should be signed in order for
> the peer to process them. Now it also matches how AO/MD5 gets destructed
> for TIME-WAIT sockets (in tcp_twsk_destructor()).
>
> This seems optimal for TCP-MD5, while for TCP-AO it seems to have an
> open problem: once the RST gets sent and the socket is actually destructed,
> there is no information on the initial sequence numbers. So, in case
> this last RST gets lost in the network, the server's listener socket
> won't be able to properly sign another RST. Nothing in RFC 1122
> prescribes keeping any local state after a non-graceful reset.
> Luckily, BGP is known to use keepalives.
>
> While the issue is quite minor/cosmetic, these days monitoring network
> counters is a common practice and getting invalid signed segments from
> a trusted BGP peer can get customers worried.
>
> Investigated-by: Bob Gilligan <gilligan@...sta.com>
> Signed-off-by: Dmitry Safonov <dima@...sta.com>
> ---
> net/ipv4/tcp.c | 31 +++++++++++++++++++++++++++++++
> net/ipv4/tcp_ipv4.c | 25 -------------------------
> 2 files changed, 31 insertions(+), 25 deletions(-)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 71a956fbfc5533224ee00e792de2cfdccd4d40aa..4e996e937e8e5f0e75764caa24240e25006deece 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -412,6 +412,36 @@ static u64 tcp_compute_delivery_rate(const struct tcp_sock *tp)
> return rate64;
> }
> [...]
> +
> +static void tcp_destruct_sock(struct sock *sk)
> +{
> + struct tcp_sock *tp = tcp_sk(sk);
It looks like this variable is unused when CONFIG_TCP_MD5SIG is not set,
which is causing the CI test build to fail:
net/ipv4/tcp.c: In function ‘tcp_destruct_sock’:
net/ipv4/tcp.c:417:26: error: unused variable ‘tp’ [-Werror=unused-variable]
417 | struct tcp_sock *tp = tcp_sk(sk);
| ^~
cc1: all warnings being treated as errors
make[4]: *** [scripts/Makefile.build:287: net/ipv4/tcp.
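One possible way to avoid the warning, sketched here in userspace and not
taken from the patch, is to declare the variable inside the same #ifdef
that uses it. The struct layouts, the md5sig_info and destructed fields,
and destruct_demo() below are hypothetical stand-ins for illustration only;
they are not the kernel definitions.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel types; NOT the real layouts. */
struct tcp_sock { int md5sig_info; };
struct sock { struct tcp_sock tcp; int destructed; };

static inline struct tcp_sock *tcp_sk(struct sock *sk)
{
	return &sk->tcp;
}

static void tcp_destruct_sock(struct sock *sk)
{
#ifdef CONFIG_TCP_MD5SIG
	/* Declared only in the configuration that uses it, so builds
	 * without CONFIG_TCP_MD5SIG never see an unused 'tp'.
	 */
	struct tcp_sock *tp = tcp_sk(sk);

	tp->md5sig_info = 0;	/* stand-in for releasing the key info */
#endif
	sk->destructed = 1;	/* stand-in for the common destruct path */
}

/* Hypothetical helper: runs the destructor on a stack-allocated socket. */
int destruct_demo(void)
{
	struct sock sk = { .tcp = { .md5sig_info = 1 }, .destructed = 0 };

	tcp_destruct_sock(&sk);
	assert(sk.destructed == 1);
	return sk.destructed;
}
```

This should compile cleanly with -Wall -Werror both with and without
-DCONFIG_TCP_MD5SIG. Annotating the declaration with the kernel's
__maybe_unused would be another option if the variable has to stay at
function scope.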
cheers,
Victor