Message-ID: <f9b85a06113e2c9a7a91f3486efc06edbce4e461.camel@nvidia.com>
Date: Tue, 11 Mar 2025 21:08:49 +0000
From: Cosmin Ratiu <cratiu@...dia.com>
To: "kuba@...nel.org" <kuba@...nel.org>, "razor@...ckwall.org"
<razor@...ckwall.org>, "liuhangbin@...il.com" <liuhangbin@...il.com>
CC: "andrew+netdev@...n.ch" <andrew+netdev@...n.ch>, "jarod@...hat.com"
<jarod@...hat.com>, "davem@...emloft.net" <davem@...emloft.net>, Tariq Toukan
<tariqt@...dia.com>, Petr Machata <petrm@...dia.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"shuah@...nel.org" <shuah@...nel.org>, "steffen.klassert@...unet.com"
<steffen.klassert@...unet.com>, "jv@...sburgh.net" <jv@...sburgh.net>,
"pabeni@...hat.com" <pabeni@...hat.com>, "horms@...nel.org"
<horms@...nel.org>, "edumazet@...gle.com" <edumazet@...gle.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"linux-kselftest@...r.kernel.org" <linux-kselftest@...r.kernel.org>, Jianbo
Liu <jianbol@...dia.com>
Subject: Re: [PATCHv5 net 1/3] bonding: fix calling sleeping function in spin
lock and some race conditions
On Fri, 2025-03-07 at 09:03 -0800, Jakub Kicinski wrote:
> On Fri, 7 Mar 2025 09:42:49 +0200 Nikolay Aleksandrov wrote:
> > TBH, keeping buggy code with a comment doesn't sound good to me.
> > I'd rather remove this
> > support than tell people "good luck, it might crash". It's better
> > to be safe until a
> > correct design is in place which takes care of these issues.
>
> That's my feeling too, FWIW. I think we knew about this issue
> for a while now, the longer we wait the more users we may disrupt
> with the revert.
These are pre-existing races between bond link failover and the user
removing xfrm states. Unless a user deliberately tries to trigger these
bugs, chances are nobody has ever hit them in the wild during normal
operation. In steady state, bond link failover works, and adding and
removing states works. Only the combination of these two control-plane
events, when they race with each other, can double-free or leak
states.
I would not pull everything out just yet.
Today I managed to find what I think is a solution for these races. It
is based on a patch series I am preparing against ipsec-next, which
includes other changes related to real_dev.
Hangbin, do you mind if I take over fixing the locking issue as part of
my series? I plan to send it upstream in the coming days.
Cosmin.