netdev - Re: [PATCH net] inet: Avoid established lookup missing active sk


Message-ID: <CANn89i+XH95h4UANWpR-39LSRkvM3LL=_pRL0+6fp6dwTZxn_g@mail.gmail.com>
Date: Tue, 2 Sep 2025 23:40:10 -0700
From: Eric Dumazet <edumazet@...gle.com>
To: Xuanqiang Luo <xuanqiang.luo@...ux.dev>
Cc: kuniyu@...gle.com, davem@...emloft.net, kuba@...nel.org, 
	kernelxing@...cent.com, netdev@...r.kernel.org, 
	Xuanqiang Luo <luoxuanqiang@...inos.cn>
Subject: Re: [PATCH net] inet: Avoid established lookup missing active sk

On Tue, Sep 2, 2025 at 7:46 PM Xuanqiang Luo <xuanqiang.luo@...ux.dev> wrote:
>
> From: Xuanqiang Luo <luoxuanqiang@...inos.cn>
>
> Since the lookup of sk in ehash is lockless, when one CPU is performing a
> lookup while another CPU is executing delete and insert operations
> (deleting reqsk and inserting sk), the lookup CPU may miss both of
> them. If sk cannot be found, an RST may be sent.
>
> The call trace map is drawn as follows:
>    CPU 0                           CPU 1
>    -----                           -----
>                                 spin_lock()
>                                 sk_nulls_del_node_init_rcu(osk)
> __inet_lookup_established()
>                                 __sk_nulls_add_node_rcu(sk, list)
>                                 spin_unlock()
>
> We can try using spin_lock()/spin_unlock() to wait for ehash updates
> (ensuring all deletions and insertions are completed) after a failed
> lookup in ehash, then look up sk again after the update. Since the sk
> expected to be found is unlikely to encounter the aforementioned scenario
> multiple times consecutively, we only need one update.

No need for a lock really...
- add the new node (with a temporary 'wrong' nulls value),
- delete the old node
- replace the nulls value by the expected one.
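
For illustration only, here is a minimal single-threaded user-space sketch of
one possible reading of the three steps above. It is not kernel code and not a
proposed patch: struct node, struct bucket, lookup_once(), add_tail_tmp(),
unlink_node() and fix_nulls() are hypothetical names, the new node is assumed
to be appended at the tail of the chain, and the RCU publication and memory
ordering that a real lockless list needs are left out entirely. The point is
only the ordering: a lookup that reaches the end of the chain while the
replacement is in flight sees an unexpected nulls value and knows it has to
retry instead of reporting a miss.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy nulls-terminated chain: the final "next" is not NULL but an odd
 * value that encodes which bucket the chain belongs to. */
struct node {
    int key;
    uintptr_t next;          /* pointer to a node, or an odd nulls marker */
};

struct bucket {
    uintptr_t first;         /* pointer to first node, or nulls marker */
    unsigned long slot;      /* nulls value a lookup expects at the end */
};

#define MAKE_NULLS(v)  ((((uintptr_t)(v)) << 1) | 1)
#define IS_NULLS(p)    (((p) & 1) != 0)
#define NULLS_VALUE(p) ((p) >> 1)

enum result { FOUND, MISS, RETRY };

/* One pass over the chain. RETRY means the chain ended on an unexpected
 * nulls value, so the caller cannot trust a "not found" answer. */
static enum result lookup_once(const struct bucket *b, int key)
{
    uintptr_t p = b->first;

    while (!IS_NULLS(p)) {
        const struct node *n = (const struct node *)p;

        if (n->key == key)
            return FOUND;
        p = n->next;
    }
    return NULLS_VALUE(p) == b->slot ? MISS : RETRY;
}

/* Step 1: append the new node, terminated by a deliberately wrong
 * nulls value (assumption: tail insertion). */
static void add_tail_tmp(struct bucket *b, struct node *new_node)
{
    uintptr_t *link = &b->first;

    new_node->next = MAKE_NULLS(b->slot + 1);
    while (!IS_NULLS(*link))
        link = &((struct node *)*link)->next;
    *link = (uintptr_t)new_node;
}

/* Step 2: unlink the old node. Its own next pointer is left intact so a
 * reader currently standing on it can still walk on to the tail. */
static void unlink_node(struct bucket *b, struct node *old_node)
{
    uintptr_t *link = &b->first;

    while (!IS_NULLS(*link) && *link != (uintptr_t)old_node)
        link = &((struct node *)*link)->next;
    assert(!IS_NULLS(*link));    /* old_node must be on the chain */
    *link = old_node->next;
}

/* Step 3: put the expected nulls value back on the tail node. */
static void fix_nulls(struct bucket *b, struct node *new_node)
{
    assert(IS_NULLS(new_node->next));
    new_node->next = MAKE_NULLS(b->slot);
}
```

In this model a lookup for the replaced key finds either the old or the new
node at every intermediate state, and a lookup for any other key gets RETRY
rather than MISS until step 3 has run.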