[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAAVpQUCKDi0aZcraeZaMY4ebuoBoB_Ymdy1RGb1247JznArTJg@mail.gmail.com>
Date: Tue, 2 Sep 2025 22:16:22 -0700
From: Kuniyuki Iwashima <kuniyu@...gle.com>
To: Xuanqiang Luo <xuanqiang.luo@...ux.dev>
Cc: edumazet@...gle.com, davem@...emloft.net, kuba@...nel.org,
kernelxing@...cent.com, netdev@...r.kernel.org,
Xuanqiang Luo <luoxuanqiang@...inos.cn>
Subject: Re: [PATCH net] inet: Avoid established lookup missing active sk
On Tue, Sep 2, 2025 at 7:45 PM Xuanqiang Luo <xuanqiang.luo@...ux.dev> wrote:
>
> From: Xuanqiang Luo <luoxuanqiang@...inos.cn>
>
> Since the lookup of sk in ehash is lockless, when one CPU is performing a
> lookup while another CPU is executing delete and insert operations
> (deleting reqsk and inserting sk), the lookup CPU may miss either of
> them, if sk cannot be found, an RST may be sent.
>
> The call trace map is drawn as follows:
> CPU 0 CPU 1
> ----- -----
> spin_lock()
> sk_nulls_del_node_init_rcu(osk)
> __inet_lookup_established()
> __sk_nulls_add_node_rcu(sk, list)
> spin_unlock()
This usually does not happen except for local communication, and
retrying on the client side is much better than penalising all lookups
for SYN.
>
> We can try using spin_lock()/spin_unlock() to wait for ehash updates
> (ensuring all deletions and insertions are completed) after a failed
> lookup in ehash, then lookup sk again after the update. Since the sk
> expected to be found is unlikely to encounter the aforementioned scenario
> multiple times consecutively, we only need one update.
>
> Similarly, an issue occurs in tw hashdance. Try adjusting the order in
> which it operates on ehash: remove sk first, then add tw. If sk is missed
> during lookup, it will likewise wait for the update to find tw, without
> worrying about the skc_refcnt issue that would arise if tw were found
> first.
>
> Fixes: 3ab5aee7fe84 ("net: Convert TCP & DCCP hash tables to use RCU / hlist_nulls")
> Signed-off-by: Xuanqiang Luo <luoxuanqiang@...inos.cn>
> ---
> net/ipv4/inet_hashtables.c | 12 ++++++++++++
> net/ipv4/inet_timewait_sock.c | 9 ++++-----
> 2 files changed, 16 insertions(+), 5 deletions(-)
>
> diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
> index ceeeec9b7290..4eb3a55b855b 100644
> --- a/net/ipv4/inet_hashtables.c
> +++ b/net/ipv4/inet_hashtables.c
> @@ -505,6 +505,7 @@ struct sock *__inet_lookup_established(const struct net *net,
> unsigned int hash = inet_ehashfn(net, daddr, hnum, saddr, sport);
> unsigned int slot = hash & hashinfo->ehash_mask;
> struct inet_ehash_bucket *head = &hashinfo->ehash[slot];
> + bool try_lock = true;
>
> begin:
> sk_nulls_for_each_rcu(sk, node, &head->chain) {
> @@ -528,6 +529,17 @@ struct sock *__inet_lookup_established(const struct net *net,
> */
> if (get_nulls_value(node) != slot)
> goto begin;
> +
> + if (try_lock) {
> + spinlock_t *lock = inet_ehash_lockp(hashinfo, hash);
> +
> + try_lock = false;
> + spin_lock(lock);
> + /* Ensure ehash ops under spinlock complete. */
> + spin_unlock(lock);
> + goto begin;
> + }
> +
> out:
> sk = NULL;
> found:
> diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
> index 875ff923a8ed..a91e02e19c53 100644
> --- a/net/ipv4/inet_timewait_sock.c
> +++ b/net/ipv4/inet_timewait_sock.c
> @@ -139,14 +139,10 @@ void inet_twsk_hashdance_schedule(struct inet_timewait_sock *tw,
>
> spin_lock(lock);
>
> - /* Step 2: Hash TW into tcp ehash chain */
> - inet_twsk_add_node_rcu(tw, &ehead->chain);
You are adding a new RST scenario where the corresponding
socket is not found and a listener or no socket is found.
The try_lock part is not guaranteed to happen after twsk
insertion below.
> -
> - /* Step 3: Remove SK from hash chain */
> + /* Step 2: Remove SK from hash chain */
> if (__sk_nulls_del_node_init_rcu(sk))
> sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1);
>
> -
> /* Ensure above writes are committed into memory before updating the
> * refcount.
> * Provides ordering vs later refcount_inc().
> @@ -161,6 +157,9 @@ void inet_twsk_hashdance_schedule(struct inet_timewait_sock *tw,
> */
> refcount_set(&tw->tw_refcnt, 3);
>
> + /* Step 3: Hash TW into tcp ehash chain */
> + inet_twsk_add_node_rcu(tw, &ehead->chain);
> +
> inet_twsk_schedule(tw, timeo);
>
> spin_unlock(lock);
> --
> 2.25.1
>
Powered by blists - more mailing lists