lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <40e2170b52f1f80fd72405df6901c4323f903e67.camel@nvidia.com>
Date: Thu, 20 Feb 2025 10:48:43 +0000
From: Cosmin Ratiu <cratiu@...dia.com>
To: "liuhangbin@...il.com" <liuhangbin@...il.com>
CC: "shuah@...nel.org" <shuah@...nel.org>, "andrew+netdev@...n.ch"
	<andrew+netdev@...n.ch>, "davem@...emloft.net" <davem@...emloft.net>,
	"jv@...sburgh.net" <jv@...sburgh.net>, "herbert@...dor.apana.org.au"
	<herbert@...dor.apana.org.au>, "andy@...yhouse.net" <andy@...yhouse.net>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"pabeni@...hat.com" <pabeni@...hat.com>, "edumazet@...gle.com"
	<edumazet@...gle.com>, "sd@...asysnail.net" <sd@...asysnail.net>, Jianbo Liu
	<jianbol@...dia.com>, "horms@...nel.org" <horms@...nel.org>,
	"kuba@...nel.org" <kuba@...nel.org>, Tariq Toukan <tariqt@...dia.com>,
	"razor@...ckwall.org" <razor@...ckwall.org>, "netdev@...r.kernel.org"
	<netdev@...r.kernel.org>, "steffen.klassert@...unet.com"
	<steffen.klassert@...unet.com>, "linux-kselftest@...r.kernel.org"
	<linux-kselftest@...r.kernel.org>
Subject: Re: [PATCH net 0/2] bond: fix xfrm offload feature during init

On Mon, 2025-01-20 at 23:59 +0000, Hangbin Liu wrote:
> > 
> > I am not sure this can be fixed in bonding given that the
> > xdo_dev_state_delete op could, in the general case, sleep while
> > talking
> > to the hardware. I don't think it's reasonable to expect devices to
> > offload xfrm while the kernel holds a spinlock.
> > Bonding just exposed this assumption mismatch because of the mutex
> > that
> > was added to replace a spinlock which exhibited the same problem we
> > are
> > talking about here.
> > 
> > Do the dev offload operations need to be synchronous? Couldn't
> > __xfrm_state_delete instead schedule a wq to do the dev offload? I
> > saw
> > there's already an xfrm_state_gc_task that's invoked to call
> > xfrm_dev_state_free, perhaps that could be used to do the delete as
> > well?
> 
> Yes, I have tried to move the bonding gc work in bond_ipsec_del_sa()
> to a wq
> in https://lore.kernel.org/netdev/Z33nEKg4PxwReUu_@fedora/. i.e. move
> the
> following part out of spin lock via wq.
> 
>         mutex_lock(&bond->ipsec_lock);
>         list_for_each_entry(ipsec, &bond->ipsec_list, list) {
>                 if (ipsec->xs == xs) {
>                         list_del(&ipsec->list);
>                         kfree(ipsec);
>                         break;
>                 }
>         }
>         mutex_unlock(&bond->ipsec_lock);
> 
> But we can see there is an (ipsec->xs == xs). So we still need to
> make
> sure the xs is not released. Can we add a xs reference in
> bond_ipsec_del_sa()
> to achieve this?

Hello,

After staring at the issue a while longer, I am also converging on just
moving that mutex part from bond_ipsec_del_sa out to a wq. I browsed
through all driver implementations of .xdo_dev_state_delete() and found
none that sleeps or allocates memory with GFP_KERNEL. So if we only fix
bond_ipsec_del_sa, that would be enough to make it all work again.

So it should be perfectly safe to add a ref to xs in bond_ipsec_del_sa
before firing up a wq to do the mutex lock + list traversal, before
releasing the ref.
xfrm_state is already unlinked from everything by __xfrm_state_delete
before xfrm_dev_state_delete is called and the xfrm_state_alloc
reference is dropped by the end of xfrm_dev_state_delete, so the only
thing keeping it alive would be the reference added in
bond_ipsec_del_sa. When that is put after the list traversal,
__xfrm_state_destroy gets called with sync == false, which passes on
the baton to another wq to do the gc for xs.

This all sounds reasonable.
Will you chase this down or do you prefer me to send the proposed fix?

Cosmin.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ