Message-ID: <CANn89i+JBhj+g564rfVd9gK7OH48v3N+Ln0vAgJehM5xJh32-g@mail.gmail.com>
Date: Thu, 8 Jun 2023 06:13:04 +0200
From: Eric Dumazet <edumazet@...gle.com>
To: "Duan,Muquan" <duanmuquan@...du.com>
Cc: "davem@...emloft.net" <davem@...emloft.net>, "dsahern@...nel.org" <dsahern@...nel.org>,
"kuba@...nel.org" <kuba@...nel.org>, "pabeni@...hat.com" <pabeni@...hat.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2] tcp: fix connection reset due to tw hashdance race.
On Thu, Jun 8, 2023 at 5:59 AM Duan,Muquan <duanmuquan@...du.com> wrote:
>
> Hi, Eric,
>
> Thanks a lot for your explanation!
>
> Even if we add a reader lock, if the refcnt is set outside spin_lock()/spin_unlock(), then during the interval between spin_unlock() and refcnt_set(), other CPUs will see the tw sock with refcnt 0, and the refcnt validation will fail.
>
> A suggestion: before the tw sock is added to the ehash table, it is already used by the tw timer and the bhash chain, so we can first set the refcnt to 2 before adding the tw sock to the ehash table, or increment the refcnt one by one for the timer, bhash, and ehash. This avoids the refcnt validation failure on other CPUs.
>
> This reduces the frequency of the connection reset issue from once every 20 min to once every 180 min for our product. We may wait quite a long time before the best solution is ready; if this obvious defect is fixed, userland applications can benefit from it.
>
> Looking forward to your opinions!
Again, my opinion is that we need a proper fix, not workarounds.
I will work on this a bit later.
In the meantime you can apply your patch locally if you feel this is
what you want.
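
For readers following the thread, the ordering issue described in the quoted
message can be illustrated with a small userspace model. This is only a
sketch under stated assumptions, not kernel code: struct tw_model, slot,
lookup() and the publish helpers are hypothetical names, C11 atomics stand in
for the kernel refcount, and a single global pointer stands in for the ehash
chain. It shows that a lockless reader using an inc-not-zero check fails if
the entry is published before its count is set, and succeeds if the count is
nonzero before publication.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a timewait socket; not the kernel's struct. */
struct tw_model {
    atomic_int refcnt;
};

/* Hypothetical one-slot "hash table" that readers consult without a lock. */
static _Atomic(struct tw_model *) slot;

/* Take a reference only if the count is currently nonzero. */
static bool get_unless_zero(struct tw_model *tw)
{
    int old = atomic_load(&tw->refcnt);

    while (old != 0) {
        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&tw->refcnt, &old, old + 1))
            return true;
    }
    return false;
}

/* Reader: true if the entry was found and a reference was obtained. */
static bool lookup(void)
{
    struct tw_model *tw = atomic_load(&slot);

    return tw != NULL && get_unless_zero(tw);
}

/* Order A, step 1: publish while the count is still 0. */
static void publish_then_count(struct tw_model *tw)
{
    atomic_init(&tw->refcnt, 0);
    atomic_store(&slot, tw);
    /* A reader running now sees refcnt == 0 and fails validation. */
}

/* Order A, step 2: the count is set only after publication. */
static void finish_count(struct tw_model *tw)
{
    atomic_store(&tw->refcnt, 3);
}

/* Order B, step 1: count the timer and bhash users, then publish. */
static void count_before_publish(struct tw_model *tw)
{
    atomic_init(&tw->refcnt, 2);
    atomic_store(&slot, tw);
}

/* Order B, step 2: add the reference held by the table itself. */
static void add_ehash_ref(struct tw_model *tw)
{
    atomic_fetch_add(&tw->refcnt, 1);
}
```

Calling publish_then_count() followed by lookup() models the failing window;
calling count_before_publish() followed by lookup() models the suggested
ordering. The model says nothing about whether this ordering is the proper
fix for the hashdance race, only about what a reader observes in each case.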