Message-ID: <CANn89iKVQ=c8zxm0MqR7ycR1RFbKqObEPEJrpWCfxH4MdVf3Og@mail.gmail.com>
Date: Thu, 28 Aug 2025 12:43:36 -0700
From: Eric Dumazet <edumazet@...gle.com>
To: dima@...sta.com
Cc: Neal Cardwell <ncardwell@...gle.com>, Kuniyuki Iwashima <kuniyu@...gle.com>, 
	"David S. Miller" <davem@...emloft.net>, David Ahern <dsahern@...nel.org>, 
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>, 
	Bob Gilligan <gilligan@...sta.com>, Salam Noureddine <noureddine@...sta.com>, 
	Dmitry Safonov <0x7f454c46@...il.com>, netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH net-next v2 1/2] tcp: Destroy TCP-AO, TCP-MD5 keys in .sk_destruct()

On Thu, Aug 28, 2025 at 1:15 AM Dmitry Safonov via B4 Relay
<devnull+dima.arista.com@...nel.org> wrote:
>
> From: Dmitry Safonov <dima@...sta.com>
>
> Currently there are a couple of minor issues with destroying the keys in
> tcp_v4_destroy_sock():
>
> 1. The socket is still in the TCP bind buckets, making it reachable for
>    incoming segments [on another CPU core] and potentially able to send
>    late FIN/ACK/RST replies.
>
> 2. There is at least one code path where tcp_done() is called before
>    sending an RST [kudos to Bob for the investigation]. This is the case
>    of a server that has finished sending its data and just called close().
>
>    The socket is in TCP_FIN_WAIT2 and has RCV_SHUTDOWN (set by
>    __tcp_close()).
>
>    tcp_v4_do_rcv()/tcp_v6_do_rcv()
>      tcp_rcv_state_process()            /* LINUX_MIB_TCPABORTONDATA */
>        tcp_reset()
>          tcp_done_with_error()
>            tcp_done()
>              inet_csk_destroy_sock()    /* Destroys AO/MD5 keys */
>      /* tcp_rcv_state_process() returns SKB_DROP_REASON_TCP_ABORT_ON_DATA */
>    tcp_v4_send_reset()                  /* Sends an unsigned RST segment */
>
>    tcpdump:
> > 22:53:15.399377 00:00:b2:1f:00:00 > 00:00:01:01:00:00, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 33929, offset 0, flags [DF], proto TCP (6), length 60)
> >     1.0.0.1.34567 > 1.0.0.2.49848: Flags [F.], seq 2185658590, ack 3969644355, win 502, options [nop,nop,md5 valid], length 0
> > 22:53:15.399396 00:00:01:01:00:00 > 00:00:b2:1f:00:00, ethertype IPv4 (0x0800), length 86: (tos 0x0, ttl 64, id 51951, offset 0, flags [DF], proto TCP (6), length 72)
> >     1.0.0.2.49848 > 1.0.0.1.34567: Flags [.], seq 3969644375, ack 2185658591, win 128, options [nop,nop,md5 valid,nop,nop,sack 1 {2185658590:2185658591}], length 0
> > 22:53:16.429588 00:00:b2:1f:00:00 > 00:00:01:01:00:00, ethertype IPv4 (0x0800), length 60: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
> >     1.0.0.1.34567 > 1.0.0.2.49848: Flags [R], seq 2185658590, win 0, length 0
> > 22:53:16.664725 00:00:b2:1f:00:00 > 00:00:01:01:00:00, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
> >     1.0.0.1.34567 > 1.0.0.2.49848: Flags [R], seq 2185658591, win 0, options [nop,nop,md5 valid], length 0
> > 22:53:17.289832 00:00:b2:1f:00:00 > 00:00:01:01:00:00, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
> >     1.0.0.1.34567 > 1.0.0.2.49848: Flags [R], seq 2185658591, win 0, options [nop,nop,md5 valid], length 0
>
>   Note the signed RSTs later in the dump: those are sent by the server's
>   listener socket once the FIN-WAIT socket has been removed from the hash
>   buckets.
>
> Instead of destroying the AO/MD5 info and keys in inet_csk_destroy_sock(),
> delay it slightly, until the socket's actual .sk_destruct(). A socket that
> has been shut down can still send non-data replies, and those should be
> signed for the peer to process them. This also matches how AO/MD5 state
> is destroyed for TIME-WAIT sockets (in tcp_twsk_destructor()).
>
> This seems optimal for TCP-MD5, but TCP-AO is left with an open problem:
> once the RST has been sent and the socket is actually destroyed, the
> initial sequence numbers are no longer known (TCP-AO traffic keys are
> derived from them, per RFC 5925). So if this last RST is lost in the
> network, the server's listener socket won't be able to properly sign
> another RST. Nothing in RFC 1122 prescribes keeping any local state after
> a non-graceful reset. Luckily, BGP implementations are known to use
> keepalives.
>
> While the issue is minor and largely cosmetic, monitoring network counters
> is common practice these days, and receiving improperly signed segments
> from a trusted BGP peer can worry customers.
>
> Investigated-by: Bob Gilligan <gilligan@...sta.com>
> Signed-off-by: Dmitry Safonov <dima@...sta.com>
> ---
>  include/net/tcp.h   |  4 ++++
>  net/ipv4/tcp.c      | 27 +++++++++++++++++++++++++++
>  net/ipv4/tcp_ipv4.c | 33 ++++++++-------------------------
>  net/ipv6/tcp_ipv6.c |  8 ++++++++
>  4 files changed, 47 insertions(+), 25 deletions(-)
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 2936b8175950faa777f81f3c6b7230bcc375d772..0009c26241964b54aa93bc1b86158050d96c2c98 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -1931,6 +1931,7 @@ tcp_md5_do_lookup_any_l3index(const struct sock *sk,
>  }
>
>  #define tcp_twsk_md5_key(twsk) ((twsk)->tw_md5_key)
> +void tcp_md5_destruct_sock(struct sock *sk);
>  #else
>  static inline struct tcp_md5sig_key *
>  tcp_md5_do_lookup(const struct sock *sk, int l3index,
> @@ -1947,6 +1948,9 @@ tcp_md5_do_lookup_any_l3index(const struct sock *sk,
>  }
>
>  #define tcp_twsk_md5_key(twsk) NULL
> +static inline void tcp_md5_destruct_sock(struct sock *sk)
> +{
> +}
>  #endif
>
>  int tcp_md5_alloc_sigpool(void);
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 9bc8317e92b7952871f07ae11a9c2eaa7d3a9e65..927233ee7500e0568782ae4a3860af56d1476acd 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -412,6 +412,33 @@ static u64 tcp_compute_delivery_rate(const struct tcp_sock *tp)
>         return rate64;
>  }
>
> +#ifdef CONFIG_TCP_MD5SIG
> +static void tcp_md5sig_info_free_rcu(struct rcu_head *head)
> +{
> +       struct tcp_md5sig_info *md5sig;
> +
> +       md5sig = container_of(head, struct tcp_md5sig_info, rcu);
> +       kfree(md5sig);
> +       static_branch_slow_dec_deferred(&tcp_md5_needed);
> +       tcp_md5_release_sigpool();
> +}
> +
> +void tcp_md5_destruct_sock(struct sock *sk)
> +{
> +       struct tcp_sock *tp = tcp_sk(sk);
> +
> +       if (tp->md5sig_info) {
> +               struct tcp_md5sig_info *md5sig;
> +
> +               md5sig = rcu_dereference_protected(tp->md5sig_info, 1);
> +               tcp_clear_md5_list(sk);
> +               call_rcu(&md5sig->rcu, tcp_md5sig_info_free_rcu);
> +               rcu_assign_pointer(tp->md5sig_info, NULL);

I would move this line before call_rcu(&md5sig->rcu, tcp_md5sig_info_free_rcu);
otherwise the free could happen before the pointer is cleared, and a UAF could occur.

It is not absolutely clear whether this function runs under rcu_read_lock(),
and even if it is currently safe, this could change in the future.

Other than that:

Reviewed-by: Eric Dumazet <edumazet@...gle.com>
