lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <ZeAzOUN-8QEl4U2n@ninjato>
Date: Thu, 29 Feb 2024 08:33:13 +0100
From: Wolfram Sang <wsa+renesas@...g-engineering.com>
To: Dirk Behme <dirk.behme@...bosch.com>
Cc: linux-renesas-soc@...r.kernel.org, Ulf Hansson <ulf.hansson@...aro.org>,
	linux-mmc@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFT] mmc: tmio: avoid concurrent runs of
 mmc_request_done()

Hi Dirk,

> > With the to-be-fixed commit, the reset_work handler cleared 'host->mrq'
> > outside of the spinlock protected critical section. That leaves a small
> > race window during execution of 'tmio_mmc_reset()' where the done_work
> > handler could grab a pointer to the now invalid 'host->mrq'. Both would
> > use it to call mmc_request_done() causing problems (see Link).
> > 
> > However, 'host->mrq' cannot simply be cleared earlier inside the
> > critical section. That would allow new mrqs to come in asynchronously
> > while the actual reset of the controller still needs to be done. So,
> > like 'tmio_mmc_set_ios()', an ERR_PTR is used to prevent new mrqs from
> > coming in but still avoiding concurrency between work handlers.
> > 
> > Reported-by: Dirk Behme <dirk.behme@...bosch.com>
> > Closes: https://lore.kernel.org/all/20240220061356.3001761-1-dirk.behme@de.bosch.com/
> > Signed-off-by: Wolfram Sang <wsa+renesas@...g-engineering.com>
> > Fixes: df3ef2d3c92c ("mmc: protect the tmio_mmc driver against a theoretical race")
> 
> Tested-by: Dirk Behme <dirk.behme@...bosch.com>
> Reviewed-by: Dirk Behme <dirk.behme@...bosch.com>

Awesome! Thanks for the super-fast tags!

> At least the issues we observed before are not seen any more. As we are not
> exactly sure on the root cause, of course this is not a 100% proof. But as
> the change looks good, looks like it won't break something and the system
> behaves good with it I would say we are good to go.

I agree. We don't know if it is all you need. But there definitely was a
race window and closing it removes some observed anomalies. Let's hope
all of them :) I looked many times at the code and, to the best of my
knowledge, don't see side effects. 'host->mrq' stays non-NULL, so new
mrqs won't be added like before. Changing it to an ERR_PTR will only
affect the check in the done_work handler which is what we want. But, of
course, more eyes are always welcome.

> I think we could add anything like
> 
> Cc: stable@...r.kernel.org # 3.0+

Yes, we should definitely have that. I would have added it once your
testing got good results. This affects every Renesas SDHI or Uniphier SD
instance since 3.0 (12 years). Wow! So, thanks a ton for your report and
assistance in debugging it. Very much appreciated! And, phew, I am happy
that this solution does not make the locking more complex \o/

All the best,

   Wolfram


Download attachment "signature.asc" of type "application/pgp-signature" (834 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ