lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Fri, 17 Nov 2023 15:45:05 +0100
From: Uwe Kleine-König <u.kleine-koenig@...gutronix.de>
To: Alex Elder <elder@...e.org>
Cc: Alex Elder <elder@...nel.org>, "David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
	netdev@...r.kernel.org, kernel@...gutronix.de
Subject: Re: [PATCH net-next 01/10] net: ipa: Don't error out in .remove()

Hello Alex,

On Fri, Nov 17, 2023 at 08:16:02AM -0600, Alex Elder wrote:
> On 11/17/23 3:59 AM, Uwe Kleine-König wrote:
> > Returning early from .remove() with an error code still results in the
> > driver unbinding the device. So the driver core ignores the returned error
> > code and the resources that were not freed are never catched up. In
> > combination with devm this also often results in use-after-free bugs.
> > 
> > Here even if the modem cannot be stopped, resources must be freed. So
> > replace the early error return by an error message an continue to clean up.
> > 
> > This prepares changing ipa_remove() to return void.
> > 
> > Fixes: cdf2e9419dd9 ("soc: qcom: ipa: main code")
> 
> Is this really a bug fix?  This code was doing the right
> thing even if the caller was not.

Yes, since cdf2e9419dd9 the driver is leaking resources if
ipa_modem_stop() fails. I call that a bug.

> > Signed-off-by: Uwe Kleine-König <u.kleine-koenig@...gutronix.de>
> > ---
> >   drivers/net/ipa/ipa_main.c | 3 ++-
> >   1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/ipa/ipa_main.c b/drivers/net/ipa/ipa_main.c
> > index da853353a5c7..60e4f590f5de 100644
> > --- a/drivers/net/ipa/ipa_main.c
> > +++ b/drivers/net/ipa/ipa_main.c
> > @@ -960,7 +960,8 @@ static int ipa_remove(struct platform_device *pdev)
> >   			ret = ipa_modem_stop(ipa);
> >   		}
> >   		if (ret)
> > -			return ret;
> > +			dev_err(dev, "Failed to stop modem (%pe)\n",
> > +				ERR_PTR(ret));
> 
> I think this is not correct, or rather, I think it is less
> correct than returning early.
> 
> What's happening here is we're trying to stop the modem.
> It is an external entity that might have some in-flight
> activity that could include "owning" some buffers provided
> by Linux, to be filled with received data.  There's a
> chance that cleaning up (with the call to ipa_teardown())
> can do the right thing, but I'm not going to sign off on
> this until I've looked at that in closer detail.
> 
> This is something that *could* happen but is not *expected*
> to happen.  We expect stopping the modem to succeed so if
> it doesn't, something's wrong and it's not 100% clear how
> to properly handle it.

Returning early is wrong for sure. You skip for example the free_irq
step, so the irq stays active, it might trigger and use the unbound
device. And as the device is unbound, .remove() is never retried and the
irq (and all the other resources) are never freed.

Take the time you need to review the changes. If you don't want to
accept the change now, I'd like to apply the following change:

diff --git a/drivers/net/ipa/ipa_main.c b/drivers/net/ipa/ipa_main.c
index da853353a5c7..f77decf0b1e4 100644
--- a/drivers/net/ipa/ipa_main.c
+++ b/drivers/net/ipa/ipa_main.c
@@ -959,8 +959,11 @@ static int ipa_remove(struct platform_device *pdev)
 			usleep_range(USEC_PER_MSEC, 2 * USEC_PER_MSEC);
 			ret = ipa_modem_stop(ipa);
 		}
-		if (ret)
-			return ret;
+		if (ret) {
+			dev_err(dev, "Failed to stop modem (%pe)\n",
+				ERR_PTR(ret));
+			return 0;
+		}
 
 		ipa_teardown(ipa);
 	}

instead. This introduces no change in behaviour apart from improving the
error message and allows me to continue with my quest to make .remove
return void.

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | https://www.pengutronix.de/ |

Download attachment "signature.asc" of type "application/pgp-signature" (489 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ