[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b6a40fe0-09da-4b84-aa8d-0dcebf15d822@linux.dev>
Date: Wed, 3 Sep 2025 19:51:07 +0800
From: luoxuanqiang <xuanqiang.luo@...ux.dev>
To: Eric Dumazet <edumazet@...gle.com>
Cc: kuniyu@...gle.com, davem@...emloft.net, kuba@...nel.org,
kernelxing@...cent.com, netdev@...r.kernel.org,
Xuanqiang Luo <luoxuanqiang@...inos.cn>
Subject: Re: [PATCH net] inet: Avoid established lookup missing active sk
在 2025/9/3 14:40, Eric Dumazet 写道:
> On Tue, Sep 2, 2025 at 7:46 PM Xuanqiang Luo <xuanqiang.luo@...ux.dev> wrote:
>> From: Xuanqiang Luo <luoxuanqiang@...inos.cn>
>>
>> Since the lookup of sk in ehash is lockless, when one CPU is performing a
>> lookup while another CPU is executing delete and insert operations
>> (deleting reqsk and inserting sk), the lookup CPU may miss either of
>> them, if sk cannot be found, an RST may be sent.
>>
>> The call trace map is drawn as follows:
>> CPU 0 CPU 1
>> ----- -----
>> spin_lock()
>> sk_nulls_del_node_init_rcu(osk)
>> __inet_lookup_established()
>> __sk_nulls_add_node_rcu(sk, list)
>> spin_unlock()
>>
>> We can try using spin_lock()/spin_unlock() to wait for ehash updates
>> (ensuring all deletions and insertions are completed) after a failed
>> lookup in ehash, then lookup sk again after the update. Since the sk
>> expected to be found is unlikely to encounter the aforementioned scenario
>> multiple times consecutively, we only need one update.
> No need for a lock really...
> - add the new node (with a temporary 'wrong' nulls value),
> - delete the old node
> - replace the nulls value by the expected one.
>
Hi Eric, May I ask if your main purpose is to add a temporary 'wrong'
nulls to trigger a re-lookup in the lookup, until the real new sk is
successfully replaced, like the following code (rough code)?
Thanks
Xuanqiang
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -526,6 +526,8 @@ struct sock *__inet_lookup_established(const struct net *net,
* not the expected one, we must restart lookup.
* We probably met an item that was moved to another chain.
*/
+
+ /* Here, temporary NULL values cause a re-lookup. */
if (get_nulls_value(node) != slot)
goto begin;
out:
@@ -684,7 +686,34 @@ bool inet_ehash_insert(struct sock *sk, struct sock *osk, bool *found_dup_sk)
spin_lock(lock);
if (osk) {
+ struct hlist_nulls_node *i, *last = NULL, *n = &sk->sk_nulls_node;
WARN_ON_ONCE(sk->sk_hash != osk->sk_hash);
+
+ /* find the last node in the list */
+ for (i = list->first; !is_a_nulls(i); i = i->next)
+ last = i;
+
+ if (last) {
+ /* save the original nulls values */
+ i = last->next;
+
+ /* 1. add the temporary 'wrong' nulls value */
+ rcu_assign_pointer(hlist_nulls_next_rcu(last),
+ (struct hlist_nulls_node *)NULLS_MARKER(NULL));
+
+ /* 2. delete the old node */
+ ret = sk_nulls_del_node_init_rcu(osk);
+
+ /* 3. add the new node */
+ if (ret) {
+ WRITE_ONCE(n->next, i);
+ n->pprev = &last->next;
+ rcu_assign_pointer(hlist_nulls_next_rcu(last), n);
+ } else {
+ rcu_assign_pointer(hlist_nulls_next_rcu(last), i);
+ }
+ goto unlock;
+ }
ret = sk_nulls_del_node_init_rcu(osk);
} else if (found_dup_sk) {
*found_dup_sk = inet_ehash_lookup_by_sk(sk, list);
@@ -695,6 +724,7 @@ bool inet_ehash_insert(struct sock *sk, struct sock *osk, bool *found_dup_sk)
if (ret)
__sk_nulls_add_node_rcu(sk, list);
+unlock:
spin_unlock(lock);
return ret;
Powered by blists - more mailing lists