lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 28 May 2019 15:49:28 -0700
From:   Doug Anderson <dianders@...omium.org>
To:     Ulf Hansson <ulf.hansson@...aro.org>
Cc:     Jaehoon Chung <jh80.chung@...sung.com>,
        Shawn Lin <shawn.lin@...k-chips.com>,
        Kalle Valo <kvalo@...eaurora.org>,
        Heiko Stuebner <heiko@...ech.de>,
        "open list:ARM/Rockchip SoC..." <linux-rockchip@...ts.infradead.org>,
        Guenter Roeck <groeck@...omium.org>,
        Brian Norris <briannorris@...omium.org>,
        linux-wireless <linux-wireless@...r.kernel.org>,
        Sonny Rao <sonnyrao@...omium.org>,
        Emil Renner Berthing <emil.renner.berthing@...il.com>,
        Matthias Kaehlcke <mka@...omium.org>,
        Ryan Case <ryandcase@...omium.org>,
        "# 4.0+" <stable@...r.kernel.org>,
        "linux-mmc@...r.kernel.org" <linux-mmc@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2] mmc: dw_mmc: Disable SDIO interrupts while suspended
 to fix suspend/resume

Hi,

On Tue, May 28, 2019 at 12:22 PM Ulf Hansson <ulf.hansson@...aro.org> wrote:
>
> On Mon, 29 Apr 2019 at 22:41, Douglas Anderson <dianders@...omium.org> wrote:
> >
> > Processing SDIO interrupts while dw_mmc is suspended (or partly
> > suspended) seems like a bad idea.  We really don't want to be
> > processing them until we've gotten ourselves fully powered up.
>
> I fully agree.
>
> Although, this is important not only from the host driver/controller
> perspective, but also from the SDIO card (managed by mmc core) point
> of view.
>
> In $subject patch you mange the driver/controller issue, but only for
> one specific host driver (dw_mmc). I am thinking that this problem may
> be a rather common problem, so perhaps we should try to address this
> from the core in a way that it affects all host drivers. Did you
> consider that?

I did wonder that.  See below, but in general I don't have massive
experience with all the host controllers out there.  Looking at sdhci
code, though, it might not have the same problems?  At least in some
cases it fully turns off the interrupts.  In other cases it seems like
the controller itself keeps power and so maybe getting the SDIO
interrupts early is OK?


> The other problem I refer to, is in principle a way to prevent
> sdio_run_irqs() from being executed before the SDIO card has been
> resumed, via mmc_sdio_resume(). It's a separate problem, but certainly
> related. This may need some more thinking to address properly, let's
> just keep this in mind and discuss this in a separate thread.

Actually, I think if we could figure out how to do this well it might
solve my particular problem.  Specifically I don't believe that
running dw_mmc's interrupt handler itself is causing the problem
(though it does seem pretty odd to run it while we're in the middle of
initting the host), I think it's the SDIO driver calling back into us
that's causing the problems.


> > You might be wondering how it's even possible to become suspended when
> > an SDIO interrupt is active.  As can be seen in
> > dw_mci_enable_sdio_irq(), we explicitly keep dw_mmc out of runtime
> > suspend when the SDIO interrupt is enabled.  ...but even though we
> > stop normal runtime suspend transitions when SDIO interrupts are
> > enabled, the dw_mci_runtime_suspend() can still get called for a full
> > system suspend.
> >
> > Let's handle all this by explicitly masking SDIO interrupts in the
> > suspend call and unmasking them later in the resume call.  To do this
> > cleanly I'll keep track of whether the client requested that SDIO
> > interrupts be enabled so that we can reliably restore them regardless
> > of whether we're masking them for one reason or another.
> >
> > It should be noted that if dw_mci_enable_sdio_irq() is never called
> > (for instance, if we don't have an SDIO card plugged in) that
> > "client_sdio_enb" will always be false.  In those cases this patch
> > adds a tiny bit of overhead to suspend/resume (a spinlock and a
> > read/write of INTMASK) but other than that is a no-op.  The
> > SDMMC_INT_SDIO bit should always be clear and clearing it again won't
> > hurt.
>
> Thanks for the detailed problem description. In general your approach
> sounds okay to me, but I have a few questions.
>
> 1) As kind of stated above, did you consider a solution where the core
> simply disables the SDIO IRQ in case it isn't enabled for system
> wakeup? In this way all host drivers would benefit.

I can give it a shot if you can give me a bunch of specific advice,
but I only have access to a few devices doing anything with SDIO and
they are all using Marvell or Broadcom on dw_mmc.

In general I have no idea how SDIO wakeup (plumbed through the SD
controller) would work.  As per below the only way I've seen it done
is totally out-of-band.  ...and actually, I'm not sure I've actually
ever seen even the out of band stuff truly work on a system myself.
It's always been one of those "we should support wake on WiFi" but
never made it all the way to implementation.  In any case, if there
are examples of people plumbing wakeup through the SD controller I'd
need to figure out how to not break them.  Just doing a solution for
dw_mmc means I don't have to worry about this since dw_mmc definitely
doesn't support SDIO wakeup.

Maybe one way to get a more generic solution is if you had an idea for
a patch that would work for many host controllers then you could post
it and I could test to confirm that it's happy on dw_mmc?  ...similar
to when you switched dw_mmc away from the old kthread-based SDIO
stuff?


> 2) dw_mmc isn't calling device_init_wakeup() during ->probe(), hence I
> assume it doesn't support the SDIO IRQ being configured as system
> wakeup. Correct? I understand this is platform specific, but still it
> would be good to know your view.

Right, currently dw_mmc doesn't support the SDIO IRQ being configured
as a system wakeup.  I'm kinda curious how this works in general.
Don't you need a clock running to get an SDIO interrupt?  How does
that work for suspend?  ...I do know that I've seen some dw_mmc host
controllers have an "SDIO IRQ" line that could be pinned out and that
line would also assert the same SDIO interrupt, but the mainline
driver has never supported it.  Whenever I asked Marvell about it in
the past they were always confused about what to do with that line and
(if I remember correctly) we never hooked it up.  I always though it
would be super interesting because it seemed like it would let us
disable the card clock when the slot was idle.  ...as far as I was
ever able to discern the pin was officially "non-standard".

As far as I know SDIO cards that want to be able to wakeup the device
end up using some sort of out of band mechanism.  For instance
"marvell,wakeup-pin" or Broadcom's "host-wake" interrupt.  As per
above, I don't have tons of experience here.  I see that
"rk3399-gru-kevin" has "marvell,wakeup-pin" defined, but that's PCIe
not SDIO.  Downstream in our Chrome OS kernel it seems like
"mt8173-oak.dtsi" and "mt8176-rowan.dtsi" have this for SDIO but they
are are devices I didn't work on and don't have much familiarity with.


> 3) Because of 2) The below code in dw_mci_runtime_suspend(), puzzles me:
> "if (host->slot->mmc->pm_flags & MMC_PM_KEEP_POWER)"
>        dw_mci_set_ios(host->slot->mmc, &host->slot->mmc->ios);
>
> Why is 3) needed at all in case system wakeup isn't supported?

That code has been in there for a really long time, dating back to
commit ab269128a2cf ("mmc: dw_mmc: Add sdio power bindings").  ...but,
in general, I know that we always keep power to the SDIO card in
suspend time.  This doesn't take a whole lot of power and speeds up
WiFi acquisition after resume by a whole lot (because otherwise we'd
have to fully re-load the WiFi firmware).  In fact, I believe that the
Marvell WiFi driver requires it.  Ah yes, search for
"MMC_PM_KEEP_POWER" in "marvell/mwifiex/sdio.c" and you'll see that it
gets yells if power isn't kept.

If we are keeping power during suspend/resume then presumably the card
will expect that communication resumes as normal upon resume.  AKA:
the clock should come up at the expected rate / voltage level and we
should resume as normal.


> A note; the current support in the mmc core for the SDIO IRQ being
> used as system wakeup, really needs some re-work. For example, we
> should convert to use common wakeup interfaces, as to allow the PM
> core to behave correctly during system suspend/resume. These are
> changes that have been scheduled on my TODO list since long time ago,
> I hope I can get some time to look into them soon.

Personally my knowledge of SD / SDIO is mostly acquired by trying to
make the dw_mmc driver do what we've needed it to over the years, so I
don't actually have a ton of broad understanding of the SDIO spec and
what is generally done for host controllers / SDIO cards.  If you're
aware of standard ways to get an SDIO IRQ to work in suspend time then
that'd be interesting.


> > Without this fix it can be seen that rk3288-veyron Chromebooks with
> > Marvell WiFi would sometimes fail to resume WiFi even after picking my
> > recent mwifiex patch [1].  Specifically you'd see messages like this:
> >   mwifiex_sdio mmc1:0001:1: Firmware wakeup failed
> >   mwifiex_sdio mmc1:0001:1: PREP_CMD: FW in reset state
> >
> > ...and tracing through the resume code in the failing cases showed
> > that we were processing a SDIO interrupt really early in the resume
> > call.
> >
> > NOTE: downstream in Chrome OS 3.14 and 3.18 kernels (both of which
> > support the Marvell SDIO WiFi card) we had a patch ("CHROMIUM: sdio:
> > Defer SDIO interrupt handling until after resume") [2].  Presumably
> > this is the same problem that was solved by that patch.
>
> Seems reasonable.

Anyway, let me know what you think the next set of steps ought to be.
It would be sorta nice to get suspend/resume working reliably and land
this patch, but if you think that will impede our ability to come up
with a more generic solution then I guess we don't need to land it...


-Doug

Powered by blists - more mailing lists