[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <ZeAzOUN-8QEl4U2n@ninjato>
Date: Thu, 29 Feb 2024 08:33:13 +0100
From: Wolfram Sang <wsa+renesas@...g-engineering.com>
To: Dirk Behme <dirk.behme@...bosch.com>
Cc: linux-renesas-soc@...r.kernel.org, Ulf Hansson <ulf.hansson@...aro.org>,
linux-mmc@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFT] mmc: tmio: avoid concurrent runs of
mmc_request_done()
Hi Dirk,
> > With the to-be-fixed commit, the reset_work handler cleared 'host->mrq'
> > outside of the spinlock protected critical section. That leaves a small
> > race window during execution of 'tmio_mmc_reset()' where the done_work
> > handler could grab a pointer to the now invalid 'host->mrq'. Both would
> > use it to call mmc_request_done() causing problems (see Link).
> >
> > However, 'host->mrq' cannot simply be cleared earlier inside the
> > critical section. That would allow new mrqs to come in asynchronously
> > while the actual reset of the controller still needs to be done. So,
> > like 'tmio_mmc_set_ios()', an ERR_PTR is used to prevent new mrqs from
> > coming in but still avoiding concurrency between work handlers.
> >
> > Reported-by: Dirk Behme <dirk.behme@...bosch.com>
> > Closes: https://lore.kernel.org/all/20240220061356.3001761-1-dirk.behme@de.bosch.com/
> > Signed-off-by: Wolfram Sang <wsa+renesas@...g-engineering.com>
> > Fixes: df3ef2d3c92c ("mmc: protect the tmio_mmc driver against a theoretical race")
>
> Tested-by: Dirk Behme <dirk.behme@...bosch.com>
> Reviewed-by: Dirk Behme <dirk.behme@...bosch.com>
Awesome! Thanks for the super-fast tags!
> At least the issues we observed before are not seen any more. As we are not
> exactly sure on the root cause, of course this is not a 100% proof. But as
> the change looks good, looks like it won't break something and the system
> behaves good with it I would say we are good to go.
I agree. We don't know if it is all you need. But there definitely was a
race window and closing it removes some observed anomalies. Let's hope
all of them :) I looked many times at the code and, to the best of my
knowledge, don't see side effects. 'host->mrq' stays non-NULL, so new
mrqs won't be added like before. Changing it to an ERR_PTR will only
affect the check in the done_work handler which is what we want. But, of
course, more eyes are always welcome.
> I think we could add anything like
>
> Cc: stable@...r.kernel.org # 3.0+
Yes, we should definitely have that. I would have added it once your
testing got good results. This affects every Renesas SDHI or Uniphier SD
instance since 3.0 (12 years). Wow! So, thanks a ton for your report and
assistance in debugging it. Very much appreciated! And, phew, I am happy
that this solution does not make the locking more complex \o/
All the best,
Wolfram
Download attachment "signature.asc" of type "application/pgp-signature" (834 bytes)
Powered by blists - more mailing lists