[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3ca7ad52-bfd6-4619-abef-93bf5be3969c@linux.dev>
Date: Wed, 3 Sep 2025 15:54:41 +0800
From: luoxuanqiang <xuanqiang.luo@...ux.dev>
To: Kuniyuki Iwashima <kuniyu@...gle.com>
Cc: edumazet@...gle.com, davem@...emloft.net, kuba@...nel.org,
kernelxing@...cent.com, netdev@...r.kernel.org,
Xuanqiang Luo <luoxuanqiang@...inos.cn>
Subject: Re: [PATCH net] inet: Avoid established lookup missing active sk
在 2025/9/3 13:48, Kuniyuki Iwashima 写道:
> On Tue, Sep 2, 2025 at 10:16 PM Kuniyuki Iwashima <kuniyu@...gle.com> wrote:
>> On Tue, Sep 2, 2025 at 7:45 PM Xuanqiang Luo <xuanqiang.luo@...ux.dev> wrote:
>>> From: Xuanqiang Luo <luoxuanqiang@...inos.cn>
>>>
>>> Since the lookup of sk in ehash is lockless, when one CPU is performing a
>>> lookup while another CPU is executing delete and insert operations
>>> (deleting reqsk and inserting sk), the lookup CPU may miss either of
>>> them, if sk cannot be found, an RST may be sent.
>>>
>>> The call trace map is drawn as follows:
>>> CPU 0 CPU 1
>>> ----- -----
>>> spin_lock()
>>> sk_nulls_del_node_init_rcu(osk)
>>> __inet_lookup_established()
>>> __sk_nulls_add_node_rcu(sk, list)
>>> spin_unlock()
>> This usually does not happen except for local communication, and
>> retrying on the client side is much better than penalising all lookups
>> for SYN.
>>
>>> We can try using spin_lock()/spin_unlock() to wait for ehash updates
>>> (ensuring all deletions and insertions are completed) after a failed
>>> lookup in ehash, then lookup sk again after the update. Since the sk
>>> expected to be found is unlikely to encounter the aforementioned scenario
>>> multiple times consecutively, we only need one update.
>>>
>>> Similarly, an issue occurs in tw hashdance. Try adjusting the order in
>>> which it operates on ehash: remove sk first, then add tw. If sk is missed
>>> during lookup, it will likewise wait for the update to find tw, without
>>> worrying about the skc_refcnt issue that would arise if tw were found
>>> first.
>>>
>>> Fixes: 3ab5aee7fe84 ("net: Convert TCP & DCCP hash tables to use RCU / hlist_nulls")
>>> Signed-off-by: Xuanqiang Luo <luoxuanqiang@...inos.cn>
>>> ---
>>> net/ipv4/inet_hashtables.c | 12 ++++++++++++
>>> net/ipv4/inet_timewait_sock.c | 9 ++++-----
>>> 2 files changed, 16 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
>>> index ceeeec9b7290..4eb3a55b855b 100644
>>> --- a/net/ipv4/inet_hashtables.c
>>> +++ b/net/ipv4/inet_hashtables.c
>>> @@ -505,6 +505,7 @@ struct sock *__inet_lookup_established(const struct net *net,
>>> unsigned int hash = inet_ehashfn(net, daddr, hnum, saddr, sport);
>>> unsigned int slot = hash & hashinfo->ehash_mask;
>>> struct inet_ehash_bucket *head = &hashinfo->ehash[slot];
>>> + bool try_lock = true;
>>>
>>> begin:
>>> sk_nulls_for_each_rcu(sk, node, &head->chain) {
>>> @@ -528,6 +529,17 @@ struct sock *__inet_lookup_established(const struct net *net,
>>> */
>>> if (get_nulls_value(node) != slot)
>>> goto begin;
>>> +
>>> + if (try_lock) {
>>> + spinlock_t *lock = inet_ehash_lockp(hashinfo, hash);
>>> +
>>> + try_lock = false;
>>> + spin_lock(lock);
>>> + /* Ensure ehash ops under spinlock complete. */
>>> + spin_unlock(lock);
>>> + goto begin;
>>> + }
>>> +
>>> out:
>>> sk = NULL;
>>> found:
>>> diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
>>> index 875ff923a8ed..a91e02e19c53 100644
>>> --- a/net/ipv4/inet_timewait_sock.c
>>> +++ b/net/ipv4/inet_timewait_sock.c
>>> @@ -139,14 +139,10 @@ void inet_twsk_hashdance_schedule(struct inet_timewait_sock *tw,
>>>
>>> spin_lock(lock);
>>>
>>> - /* Step 2: Hash TW into tcp ehash chain */
>>> - inet_twsk_add_node_rcu(tw, &ehead->chain);
>> You are adding a new RST scenario where the corresponding
>> socket is not found and a listener or no socket is found.
>>
>> The try_lock part is not guaranteed to happen after twsk
>> insertion below.
I'm sorry, I didn't get it. If try_lock can search again, other places
should have left the spinlock critical section. That means twsk should
have been inserted already. Or maybe I missed some details. Could you
please explain more clearly? Your guidance would be really helpful. :)
> Oh no, spin_lock() dance sychronises the threads but I still
> think this is rather harmful for normal cases; now sending
> an unmatched packet can trigger lock dance, which is easily
> abused for DDoS.
>
I agree this scenario is possible. It does make DDoS attacks more
impactful. Thanks for pointing that out. However, this is the best
approach I can think of right now. Thanks Xuanqiang
>>
>>> -
>>> - /* Step 3: Remove SK from hash chain */
>>> + /* Step 2: Remove SK from hash chain */
>>> if (__sk_nulls_del_node_init_rcu(sk))
>>> sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1);
>>>
>>> -
>>> /* Ensure above writes are committed into memory before updating the
>>> * refcount.
>>> * Provides ordering vs later refcount_inc().
>>> @@ -161,6 +157,9 @@ void inet_twsk_hashdance_schedule(struct inet_timewait_sock *tw,
>>> */
>>> refcount_set(&tw->tw_refcnt, 3);
>>>
>>> + /* Step 3: Hash TW into tcp ehash chain */
>>> + inet_twsk_add_node_rcu(tw, &ehead->chain);
>>> +
>>> inet_twsk_schedule(tw, timeo);
>>>
>>> spin_unlock(lock);
>>> --
>>> 2.25.1
>>>
Powered by blists - more mailing lists