Message-ID: <CAAVpQUDZFCYNMQ08uRLu388cmggckzPeP=N7WncFyxN-_hgaMw@mail.gmail.com>
Date: Tue, 2 Sep 2025 22:48:39 -0700
From: Kuniyuki Iwashima <kuniyu@...gle.com>
To: Xuanqiang Luo <xuanqiang.luo@...ux.dev>
Cc: edumazet@...gle.com, davem@...emloft.net, kuba@...nel.org,
kernelxing@...cent.com, netdev@...r.kernel.org,
Xuanqiang Luo <luoxuanqiang@...inos.cn>
Subject: Re: [PATCH net] inet: Avoid established lookup missing active sk
On Tue, Sep 2, 2025 at 10:16 PM Kuniyuki Iwashima <kuniyu@...gle.com> wrote:
>
> On Tue, Sep 2, 2025 at 7:45 PM Xuanqiang Luo <xuanqiang.luo@...ux.dev> wrote:
> >
> > From: Xuanqiang Luo <luoxuanqiang@...inos.cn>
> >
> > Since the lookup of sk in ehash is lockless, when one CPU is performing a
> > lookup while another CPU is executing delete and insert operations
> > (deleting reqsk and inserting sk), the lookup CPU may miss both of
> > them; if sk cannot be found, an RST may be sent.
> >
> > The call trace map is drawn as follows:
> > CPU 0                               CPU 1
> > -----                               -----
> >                                     spin_lock()
> >                                     sk_nulls_del_node_init_rcu(osk)
> > __inet_lookup_established()
> >                                     __sk_nulls_add_node_rcu(sk, list)
> >                                     spin_unlock()
>
> This usually does not happen except for local communication, and
> retrying on the client side is much better than penalising all lookups
> for SYN.
>
> >
> > We can try using spin_lock()/spin_unlock() to wait for ehash updates
> > (ensuring all deletions and insertions are completed) after a failed
> > lookup in ehash, then lookup sk again after the update. Since the sk
> > expected to be found is unlikely to encounter the aforementioned scenario
> > multiple times consecutively, we only need one update.
> >
> > Similarly, an issue occurs in tw hashdance. Try adjusting the order in
> > which it operates on ehash: remove sk first, then add tw. If sk is missed
> > during lookup, it will likewise wait for the update to find tw, without
> > worrying about the skc_refcnt issue that would arise if tw were found
> > first.
> >
> > Fixes: 3ab5aee7fe84 ("net: Convert TCP & DCCP hash tables to use RCU / hlist_nulls")
> > Signed-off-by: Xuanqiang Luo <luoxuanqiang@...inos.cn>
> > ---
> > net/ipv4/inet_hashtables.c | 12 ++++++++++++
> > net/ipv4/inet_timewait_sock.c | 9 ++++-----
> > 2 files changed, 16 insertions(+), 5 deletions(-)
> >
> > diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
> > index ceeeec9b7290..4eb3a55b855b 100644
> > --- a/net/ipv4/inet_hashtables.c
> > +++ b/net/ipv4/inet_hashtables.c
> > @@ -505,6 +505,7 @@ struct sock *__inet_lookup_established(const struct net *net,
> > unsigned int hash = inet_ehashfn(net, daddr, hnum, saddr, sport);
> > unsigned int slot = hash & hashinfo->ehash_mask;
> > struct inet_ehash_bucket *head = &hashinfo->ehash[slot];
> > + bool try_lock = true;
> >
> > begin:
> > sk_nulls_for_each_rcu(sk, node, &head->chain) {
> > @@ -528,6 +529,17 @@ struct sock *__inet_lookup_established(const struct net *net,
> > */
> > if (get_nulls_value(node) != slot)
> > goto begin;
> > +
> > + if (try_lock) {
> > + spinlock_t *lock = inet_ehash_lockp(hashinfo, hash);
> > +
> > + try_lock = false;
> > + spin_lock(lock);
> > + /* Ensure ehash ops under spinlock complete. */
> > + spin_unlock(lock);
> > + goto begin;
> > + }
> > +
> > out:
> > sk = NULL;
> > found:
> > diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
> > index 875ff923a8ed..a91e02e19c53 100644
> > --- a/net/ipv4/inet_timewait_sock.c
> > +++ b/net/ipv4/inet_timewait_sock.c
> > @@ -139,14 +139,10 @@ void inet_twsk_hashdance_schedule(struct inet_timewait_sock *tw,
> >
> > spin_lock(lock);
> >
> > - /* Step 2: Hash TW into tcp ehash chain */
> > - inet_twsk_add_node_rcu(tw, &ehead->chain);
>
> You are adding a new RST scenario where the corresponding
> established socket is not found, and instead a listener or
> no socket is found.
>
> The try_lock part is not guaranteed to happen after twsk
> insertion below.
Oh no, the spin_lock() dance synchronises the threads, but I still
think this is rather harmful for normal cases; now sending an
unmatched packet can trigger the lock dance, which is easily
abused for DDoS.
>
>
> > -
> > - /* Step 3: Remove SK from hash chain */
> > + /* Step 2: Remove SK from hash chain */
> > if (__sk_nulls_del_node_init_rcu(sk))
> > sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1);
> >
> > -
> > /* Ensure above writes are committed into memory before updating the
> > * refcount.
> > * Provides ordering vs later refcount_inc().
> > @@ -161,6 +157,9 @@ void inet_twsk_hashdance_schedule(struct inet_timewait_sock *tw,
> > */
> > refcount_set(&tw->tw_refcnt, 3);
> >
> > + /* Step 3: Hash TW into tcp ehash chain */
> > + inet_twsk_add_node_rcu(tw, &ehead->chain);
> > +
> > inet_twsk_schedule(tw, timeo);
> >
> > spin_unlock(lock);
> > --
> > 2.25.1
> >