[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <40e2170b52f1f80fd72405df6901c4323f903e67.camel@nvidia.com>
Date: Thu, 20 Feb 2025 10:48:43 +0000
From: Cosmin Ratiu <cratiu@...dia.com>
To: "liuhangbin@...il.com" <liuhangbin@...il.com>
CC: "shuah@...nel.org" <shuah@...nel.org>, "andrew+netdev@...n.ch"
<andrew+netdev@...n.ch>, "davem@...emloft.net" <davem@...emloft.net>,
"jv@...sburgh.net" <jv@...sburgh.net>, "herbert@...dor.apana.org.au"
<herbert@...dor.apana.org.au>, "andy@...yhouse.net" <andy@...yhouse.net>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"pabeni@...hat.com" <pabeni@...hat.com>, "edumazet@...gle.com"
<edumazet@...gle.com>, "sd@...asysnail.net" <sd@...asysnail.net>, Jianbo Liu
<jianbol@...dia.com>, "horms@...nel.org" <horms@...nel.org>,
"kuba@...nel.org" <kuba@...nel.org>, Tariq Toukan <tariqt@...dia.com>,
"razor@...ckwall.org" <razor@...ckwall.org>, "netdev@...r.kernel.org"
<netdev@...r.kernel.org>, "steffen.klassert@...unet.com"
<steffen.klassert@...unet.com>, "linux-kselftest@...r.kernel.org"
<linux-kselftest@...r.kernel.org>
Subject: Re: [PATCH net 0/2] bond: fix xfrm offload feature during init
On Mon, 2025-01-20 at 23:59 +0000, Hangbin Liu wrote:
> >
> > I am not sure this can be fixed in bonding given that the
> > xdo_dev_state_delete op could, in the general case, sleep while
> > talking
> > to the hardware. I don't think it's reasonable to expect devices to
> > offload xfrm while the kernel holds a spinlock.
> > Bonding just exposed this assumption mismatch because of the mutex
> > that
> > was added to replace a spinlock which exhibited the same problem we
> > are
> > talking about here.
> >
> > Do the dev offload operations need to be synchronous? Couldn't
> > __xfrm_state_delete instead schedule a wq to do the dev offload? I
> > saw
> > there's already an xfrm_state_gc_task that's invoked to call
> > xfrm_dev_state_free, perhaps that could be used to do the delete as
> > well?
>
> Yes, I have tried to move the bonding gc work in bond_ipsec_del_sa()
> to a wq
> in https://lore.kernel.org/netdev/Z33nEKg4PxwReUu_@fedora/. i.e. move
> the
> following part out of spin lock via wq.
>
> mutex_lock(&bond->ipsec_lock);
> list_for_each_entry(ipsec, &bond->ipsec_list, list) {
> if (ipsec->xs == xs) {
> list_del(&ipsec->list);
> kfree(ipsec);
> break;
> }
> }
> mutex_unlock(&bond->ipsec_lock);
>
> But we can see there is an (ipsec->xs == xs). So we still need to
> make
> sure the xs is not released. Can we add a xs reference in
> bond_ipsec_del_sa()
> to achieve this?
Hello,
After staring at the issue a while longer, I am also converging on just
moving that mutex part from bond_ipsec_del_sa out to a wq. I browsed
through all driver implementations of .xdo_dev_state_delete() and found
none that sleeps or allocates memory with GFP_KERNEL. So if we only fix
bond_ipsec_del_sa, that would be enough to make it all work again.
So it should be perfectly safe to add a ref to xs in bond_ipsec_del_sa
before firing up a wq to do the mutex lock + list traversal, before
releasing the ref.
xfrm_state is already unlinked from everything by __xfrm_state_delete
before xfrm_dev_state_delete is called and the xfrm_state_alloc
reference is dropped by the end of xfrm_dev_state_delete, so the only
thing keeping it alive would be the reference added in
bond_ipsec_del_sa. When that is put after the list traversal,
__xfrm_state_destroy gets called with sync == false, which passes on
the baton to another wq to do the gc for xs.
This all sounds reasonable.
Will you chase this down or do you prefer me to send the proposed fix?
Cosmin.
Powered by blists - more mailing lists