[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20240205103851.rnptahgykf3pgyss@dhruva>
Date: Mon, 5 Feb 2024 16:08:51 +0530
From: Dhruva Gole <d-gole@...com>
To: Miquel Raynal <miquel.raynal@...tlin.com>
CC: Théo Lebrun <theo.lebrun@...tlin.com>,
Mark Brown
<broonie@...nel.org>, Apurva Nandan <a-nandan@...com>,
<linux-spi@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
Gregory CLEMENT
<gregory.clement@...tlin.com>,
Vladimir Kondratiev
<vladimir.kondratiev@...ileye.com>,
Thomas Petazzoni
<thomas.petazzoni@...tlin.com>,
Tawfik Bayouk <tawfik.bayouk@...ileye.com>
Subject: Re: [PATCH] spi: cadence-qspi: stop calling system-wide PM helpers
for runtime PM
Hello,
On Feb 05, 2024 at 11:12:54 +0100, Miquel Raynal wrote:
> Hi Théo,
>
> > > > The fatal conclusion of this is a deadlock: we acquire a lock on each
> > > > operation but while running the operation, we might want to runtime
> > > > resume and acquire the same lock.
> > > >
> > > > Anyway, those helpers (spi_controller_{suspend,resume}) are aimed at
> > > > system-wide suspend and resume and should NOT be called at runtime
> > > > suspend & resume.
> > > >
> > > > Side note: the previous implementation had a second issue. It acquired a
> > > > pointer to both `struct cqspi_st` and `struct spi_controller` using
> > > > dev_get_drvdata(). Neither embed the other. This lead to memory
Oops, I seem to have overlooked this. I think it should've been
spi_controller_get_devdata()
> > > > corruption that was being hidden inside the big cqspi->f_pdata array on
> > > > my setup. It was working until I tried changing the array side to its
> > > > theorical max of 4, which lead to the discovery of this gnarly bug.
> > > >
> > > > Fixes: 0578a6dbfe75 ("spi: spi-cadence-quadspi: add runtime pm support")
> > > > Fixes: 2087e85bb66e ("spi: cadence-quadspi: fix suspend-resume implementations")
Thanks for the fixes.
> > >
> > > Your commit log makes total sense but I believe the diff is gonna break
> > > again the suspend to RAM operation. This is only my understanding
> > > right after quickly going through the whole story, so maybe I'm
> > > totally off topic.
> >
> > The current ->runtime_suspend() implementation would indeed (probably)
> > work for suspend-to-RAM if it wasn't for the wrong pointers to cqspi
> > and spi_controller (see side note from commit message).
>
> Yeah, this probably needs to be fixed aside.
>
> > I've not found a moment where `struct cqspi_st` embed `struct
> > spi_controller` at its start, so I do not believe this has ever worked.
I don't know how it worked either, but I had definitely tested and
provided logs at the time of posting the series,
https://lore.kernel.org/all/20230417091027.966146-1-d-gole@ti.com/
> > It might be the result of a mistake while porting a patch from a branch
> > that included other changes.
Hmm, could be, not entirely sure now. But I did test it and now don't
know how it had worked with that wrong pointer now that I see that
mistake.
> >
> > > What happened if I understand the two commits blamed above:
> > >
> > > - There were PM hooks.
> > > - Someone turned them into runtime PM hooks (breaking regular
> > > suspend/resume).
> > > - Someone else added the "missing" suspend/resume logic inside the
> > > runtime PM hooks to fix suspend and resume.
> > > - You are removing this logic because it leads to deadlocks.
> > >
> > > There was likely a misconception of what is expected in both cases
> > > (quick and small power savings vs. full power cycle/loosing the whole
> > > configuration).
The context was as follows,
The upstream cqspi driver prior to this:
https://lore.kernel.org/all/20230417091027.966146-1-d-gole@ti.com/
series had buggy suspend resume. That needed fixing hence I added the
first patch that introduced the buggy pointer but somehow still ended up
working after suspend resume.
After that, I also wanted the driver to support runtime_pm. I thought
that both system suspend and runtime pm would have similar requirements
from a driver POV since the IP essentially would turn off and from it's
view would need system suspend like suspend resume calls.
> > >
> > > I would propose instead to create two distinct set of functions:
> > > - One for runtime PM
> > > - One for suspend/resume
> > > This way the runtime PM no longer deadlocks and people using
> > > suspend/resume won't get affected? I don't know if your runtime hooks
> > > *will* always be called during a suspend/resume. I hope so, which would
> > > make the split quite easy and without any code duplication.
> >
> > That does indeed sound like the right approach. Runtime hooks can be
> > called from suspend/resume if needs be. Runtime PM then gets disabled
> > at the late stage.
>
> Would make sense indeed.
Now that I look at it, perhaps it is best to have 2 seperate calls for
runtime and system pm.
>
> > I do not believe currently system-wide suspend can be working.
> > spi_controller_{suspend,resume} are being called with a bogus pointer.
> > This makes me ask: should the system-wide suspend/resume part be
> > addressed with this patch or a follow-up? It feels like a separate
> > concern to me.
>
> Probably two patches, yes.
Yes, I think it best that we add a proper system suspend and runtime pm
support for this driver.
Again, thanks for catching this bug and reporting a fix. I also have an
SK-AM62 handy which uses this ospi controller so let me see if I can
help test your patches with system and runtime pm as well whenever you
do post them.
--
Best regards,
Dhruva Gole <d-gole@...com>
Powered by blists - more mailing lists