Message-ID: <CAL+tcoCLFZTsfUxLogZmmVsR3YAabpErYWVsGv3XzGNR9iuEbg@mail.gmail.com>
Date: Wed, 3 Sep 2025 17:05:45 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: Xuanqiang Luo <xuanqiang.luo@...ux.dev>, kuniyu@...gle.com, davem@...emloft.net,
kuba@...nel.org, kernelxing@...cent.com, netdev@...r.kernel.org,
Xuanqiang Luo <luoxuanqiang@...inos.cn>
Subject: Re: [PATCH net] inet: Avoid established lookup missing active sk

On Wed, Sep 3, 2025 at 4:35 PM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Tue, Sep 2, 2025 at 11:53 PM Jason Xing <kerneljasonxing@...il.com> wrote:
> >
> > On Wed, Sep 3, 2025 at 2:40 PM Eric Dumazet <edumazet@...gle.com> wrote:
> > >
> > > On Tue, Sep 2, 2025 at 7:46 PM Xuanqiang Luo <xuanqiang.luo@...ux.dev> wrote:
> > > >
> > > > From: Xuanqiang Luo <luoxuanqiang@...inos.cn>
> > > >
> > > > Since the lookup of sk in ehash is lockless, when one CPU is performing a
> > > > lookup while another CPU is executing delete and insert operations
> > > > (deleting reqsk and inserting sk), the lookup CPU may miss either of
> > > > them. If sk cannot be found, an RST may be sent.
> > > >
> > > > The call trace map is drawn as follows:
> > > >    CPU 0                           CPU 1
> > > >    -----                           -----
> > > >                                 spin_lock()
> > > >                                 sk_nulls_del_node_init_rcu(osk)
> > > > __inet_lookup_established()
> > > >                                 __sk_nulls_add_node_rcu(sk, list)
> > > >                                 spin_unlock()
> > > >
> > > > We can try using spin_lock()/spin_unlock() to wait for ehash updates
> > > > (ensuring all deletions and insertions are completed) after a failed
> > > > lookup in ehash, then lookup sk again after the update. Since the sk
> > > > expected to be found is unlikely to encounter the aforementioned scenario
> > > > multiple times consecutively, we only need one update.
> > >
> > > No need for a lock really...
> > > - add the new node (with a temporary 'wrong' nulls value),
> > > - delete the old node
> > > - replace the nulls value by the expected one.
> >
> > Yes. The plan is simple enough to fix this particular issue and I
> > verified in production long ago. Sadly the following patch got
> > reverted...
> > https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=3f4ca5fafc08881d7a57daa20449d171f2887043
>
> Please read again what I wrote, and compare it to your old patch.
>
> - add the new node (with a temporary 'wrong' nulls value),
> - delete the old node
> - replace the nulls value by the expected one.
>
> Can you see a difference ?

IIUC, you use a temporary value. It would avoid two sockets appearing
in the list at the same time.
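
To make the three quoted steps concrete, here is a minimal single-threaded
userspace sketch. It is only one possible reading of them, not kernel code:
struct node, lookup_from(), old_order_midwalk(), three_steps_midwalk(), SLOT
and WRONG_SLOT are hypothetical names, there is no RCU or memory ordering, and
adding the new node at the tail of the chain is an assumption (the quoted
steps do not say where it is added). A reader is modelled as one pass that
starts at a node it has already reached.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct node {
    int key;
    struct node *next;      /* next node, or an odd-valued nulls marker */
};

enum result { FOUND, MISS, RESTART };

#define SLOT       3u       /* assumed bucket index */
#define WRONG_SLOT 4u       /* any value other than SLOT */

/* Encode a nulls value as an odd "pointer"; it is never dereferenced. */
static struct node *make_nulls(uintptr_t v)
{
    return (struct node *)((v << 1) | 1u);
}

static int is_nulls(const struct node *n)
{
    return (int)((uintptr_t)n & 1u);
}

/* One pass of a reader that is currently positioned at 'pos'. */
static enum result lookup_from(const struct node *pos, int key)
{
    while (!is_nulls(pos)) {
        if (pos->key == key)
            return FOUND;
        pos = pos->next;
    }
    /* A marker that does not match our bucket means "walk again". */
    return ((uintptr_t)pos >> 1) == SLOT ? MISS : RESTART;
}

/* Delete-then-insert-at-head, seen by a reader already standing on 'a'. */
static enum result old_order_midwalk(void)
{
    struct node old = { 2, make_nulls(SLOT) };
    struct node a   = { 1, &old };
    struct node sk  = { 2, 0 };

    a.next = old.next;          /* delete old */
    sk.next = &a;               /* sk becomes the head, behind the reader */
    return lookup_from(&a, 2);  /* reader sees neither old nor sk */
}

/*
 * The three quoted steps, with the reader again standing on 'a'.
 * 'stage' is how many steps have completed; 'key' is what the reader wants.
 */
static enum result three_steps_midwalk(int stage, int key)
{
    struct node old = { 2, make_nulls(SLOT) };
    struct node a   = { 1, &old };
    struct node sk  = { 2, make_nulls(WRONG_SLOT) };

    assert(stage >= 0 && stage <= 3);
    if (stage >= 1)
        old.next = &sk;             /* 1: add sk at the tail, wrong nulls */
    if (stage >= 2)
        a.next = &sk;               /* 2: delete old (old.next is kept) */
    if (stage >= 3)
        sk.next = make_nulls(SLOT); /* 3: restore the expected nulls */
    return lookup_from(&a, key);
}
```

In this model old_order_midwalk() ends in MISS, which corresponds to the case
where an RST could be sent. With the three steps, a pass for the connection's
key returns FOUND at every stage, and a pass that runs off the end of the
chain while the temporary value is in place gets RESTART instead of MISS.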
Thanks,
Jason