netdev - Re: [PATCH net] inet: Avoid established lookup missing active sk

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAAVpQUCKDi0aZcraeZaMY4ebuoBoB_Ymdy1RGb1247JznArTJg@mail.gmail.com>
Date: Tue, 2 Sep 2025 22:16:22 -0700
From: Kuniyuki Iwashima <kuniyu@...gle.com>
To: Xuanqiang Luo <xuanqiang.luo@...ux.dev>
Cc: edumazet@...gle.com, davem@...emloft.net, kuba@...nel.org, 
	kernelxing@...cent.com, netdev@...r.kernel.org, 
	Xuanqiang Luo <luoxuanqiang@...inos.cn>
Subject: Re: [PATCH net] inet: Avoid established lookup missing active sk

On Tue, Sep 2, 2025 at 7:45 PM Xuanqiang Luo <xuanqiang.luo@...ux.dev> wrote:
>
> From: Xuanqiang Luo <luoxuanqiang@...inos.cn>
>
> Since the lookup of sk in ehash is lockless, when one CPU is performing a
> lookup while another CPU is executing delete and insert operations
> (deleting reqsk and inserting sk), the lookup CPU may miss either of
> them, if sk cannot be found, an RST may be sent.
>
> The call trace map is drawn as follows:
>    CPU 0                           CPU 1
>    -----                           -----
>                                 spin_lock()
>                                 sk_nulls_del_node_init_rcu(osk)
> __inet_lookup_established()
>                                 __sk_nulls_add_node_rcu(sk, list)
>                                 spin_unlock()

This usually does not happen except for local communication, and
retrying on the client side is much better than penalising all lookups
for SYN.

>
> We can try using spin_lock()/spin_unlock() to wait for ehash updates
> (ensuring all deletions and insertions are completed) after a failed
> lookup in ehash, then lookup sk again after the update. Since the sk
> expected to be found is unlikely to encounter the aforementioned scenario
> multiple times consecutively, we only need one update.
>
> Similarly, an issue occurs in tw hashdance. Try adjusting the order in
> which it operates on ehash: remove sk first, then add tw. If sk is missed
> during lookup, it will likewise wait for the update to find tw, without
> worrying about the skc_refcnt issue that would arise if tw were found
> first.
>
> Fixes: 3ab5aee7fe84 ("net: Convert TCP & DCCP hash tables to use RCU / hlist_nulls")
> Signed-off-by: Xuanqiang Luo <luoxuanqiang@...inos.cn>
> ---
>  net/ipv4/inet_hashtables.c    | 12 ++++++++++++
>  net/ipv4/inet_timewait_sock.c |  9 ++++-----
>  2 files changed, 16 insertions(+), 5 deletions(-)
>
> diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
> index ceeeec9b7290..4eb3a55b855b 100644
> --- a/net/ipv4/inet_hashtables.c
> +++ b/net/ipv4/inet_hashtables.c
> @@ -505,6 +505,7 @@ struct sock *__inet_lookup_established(const struct net *net,
>         unsigned int hash = inet_ehashfn(net, daddr, hnum, saddr, sport);
>         unsigned int slot = hash & hashinfo->ehash_mask;
>         struct inet_ehash_bucket *head = &hashinfo->ehash[slot];
> +       bool try_lock = true;
>
>  begin:
>         sk_nulls_for_each_rcu(sk, node, &head->chain) {
> @@ -528,6 +529,17 @@ struct sock *__inet_lookup_established(const struct net *net,
>          */
>         if (get_nulls_value(node) != slot)
>                 goto begin;
> +
> +       if (try_lock) {
> +               spinlock_t *lock = inet_ehash_lockp(hashinfo, hash);
> +
> +               try_lock = false;
> +               spin_lock(lock);
> +               /* Ensure ehash ops under spinlock complete. */
> +               spin_unlock(lock);
> +               goto begin;
> +       }
> +
>  out:
>         sk = NULL;
>  found:
> diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
> index 875ff923a8ed..a91e02e19c53 100644
> --- a/net/ipv4/inet_timewait_sock.c
> +++ b/net/ipv4/inet_timewait_sock.c
> @@ -139,14 +139,10 @@ void inet_twsk_hashdance_schedule(struct inet_timewait_sock *tw,
>
>         spin_lock(lock);
>
> -       /* Step 2: Hash TW into tcp ehash chain */
> -       inet_twsk_add_node_rcu(tw, &ehead->chain);

You are adding a new RST scenario where the corresponding
socket is not found and a listener or no socket is found.

The try_lock part is not guaranteed to happen after twsk
insertion below.


> -
> -       /* Step 3: Remove SK from hash chain */
> +       /* Step 2: Remove SK from hash chain */
>         if (__sk_nulls_del_node_init_rcu(sk))
>                 sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1);
>
> -
>         /* Ensure above writes are committed into memory before updating the
>          * refcount.
>          * Provides ordering vs later refcount_inc().
> @@ -161,6 +157,9 @@ void inet_twsk_hashdance_schedule(struct inet_timewait_sock *tw,
>          */
>         refcount_set(&tw->tw_refcnt, 3);
>
> +       /* Step 3: Hash TW into tcp ehash chain */
> +       inet_twsk_add_node_rcu(tw, &ehead->chain);
> +
>         inet_twsk_schedule(tw, timeo);
>
>         spin_unlock(lock);
> --
> 2.25.1
>