[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z7cPdyDZ9OOIgU7c@fedora>
Date: Thu, 20 Feb 2025 11:18:15 +0000
From: Hangbin Liu <liuhangbin@...il.com>
To: Cosmin Ratiu <cratiu@...dia.com>
Cc: "shuah@...nel.org" <shuah@...nel.org>,
"andrew+netdev@...n.ch" <andrew+netdev@...n.ch>,
"davem@...emloft.net" <davem@...emloft.net>,
"jv@...sburgh.net" <jv@...sburgh.net>,
"herbert@...dor.apana.org.au" <herbert@...dor.apana.org.au>,
"andy@...yhouse.net" <andy@...yhouse.net>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"pabeni@...hat.com" <pabeni@...hat.com>,
"edumazet@...gle.com" <edumazet@...gle.com>,
"sd@...asysnail.net" <sd@...asysnail.net>,
Jianbo Liu <jianbol@...dia.com>,
"horms@...nel.org" <horms@...nel.org>,
"kuba@...nel.org" <kuba@...nel.org>,
Tariq Toukan <tariqt@...dia.com>,
"razor@...ckwall.org" <razor@...ckwall.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"steffen.klassert@...unet.com" <steffen.klassert@...unet.com>,
"linux-kselftest@...r.kernel.org" <linux-kselftest@...r.kernel.org>
Subject: Re: [PATCH net 0/2] bond: fix xfrm offload feature during init
On Thu, Feb 20, 2025 at 10:48:43AM +0000, Cosmin Ratiu wrote:
> On Mon, 2025-01-20 at 23:59 +0000, Hangbin Liu wrote:
> > >
> > > I am not sure this can be fixed in bonding given that the
> > > xdo_dev_state_delete op could, in the general case, sleep while
> > > talking
> > > to the hardware. I don't think it's reasonable to expect devices to
> > > offload xfrm while the kernel holds a spinlock.
> > > Bonding just exposed this assumption mismatch because of the mutex
> > > that
> > > was added to replace a spinlock which exhibited the same problem we
> > > are
> > > talking about here.
> > >
> > > Do the dev offload operations need to be synchronous? Couldn't
> > > __xfrm_state_delete instead schedule a wq to do the dev offload? I
> > > saw
> > > there's already an xfrm_state_gc_task that's invoked to call
> > > xfrm_dev_state_free, perhaps that could be used to do the delete as
> > > well?
> >
> > Yes, I have tried to move the bonding gc work in bond_ipsec_del_sa()
> > to a wq
> > in https://lore.kernel.org/netdev/Z33nEKg4PxwReUu_@fedora/. i.e. move
> > the
> > following part out of spin lock via wq.
> >
> > mutex_lock(&bond->ipsec_lock);
> > list_for_each_entry(ipsec, &bond->ipsec_list, list) {
> > if (ipsec->xs == xs) {
> > list_del(&ipsec->list);
> > kfree(ipsec);
> > break;
> > }
> > }
> > mutex_unlock(&bond->ipsec_lock);
> >
> > But we can see there is an (ipsec->xs == xs). So we still need to
> > make
> > sure the xs is not released. Can we add a xs reference in
> > bond_ipsec_del_sa()
> > to achieve this?
>
> Hello,
>
> After staring at the issue a while longer, I am also converging on just
> moving that mutex part from bond_ipsec_del_sa out to a wq. I browsed
> through all driver implementations of .xdo_dev_state_delete() and found
> none that sleeps or allocates memory with GFP_KERNEL. So if we only fix
> bond_ipsec_del_sa, that would be enough to make it all work again.
>
> So it should be perfectly safe to add a ref to xs in bond_ipsec_del_sa
> before firing up a wq to do the mutex lock + list traversal, before
> releasing the ref.
> xfrm_state is already unlinked from everything by __xfrm_state_delete
> before xfrm_dev_state_delete is called and the xfrm_state_alloc
> reference is dropped by the end of xfrm_dev_state_delete, so the only
> thing keeping it alive would be the reference added in
> bond_ipsec_del_sa. When that is put after the list traversal,
> __xfrm_state_destroy gets called with sync == false, which passes on
> the baton to another wq to do the gc for xs.
>
> This all sounds reasonable.
> Will you chase this down or do you prefer me to send the proposed fix?
Thanks for the feedback and confirmation. Let me try it first. Hope
unregistering bond doesn't affect the gc works.
Thanks
Hangbin
Powered by blists - more mailing lists