Message-ID: <CACMGZgZY4Zb+3vHUDAS0+3r55K4_J40dtbsTPTFZMd6duBikpQ@mail.gmail.com>
Date:   Mon, 9 Jan 2023 12:52:08 +0100
From:   Peter Suti <peter.suti@...eamunlimited.com>
To:     Heiner Kallweit <hkallweit1@...il.com>
Cc:     Ulf Hansson <ulf.hansson@...aro.org>,
        Neil Armstrong <neil.armstrong@...aro.org>,
        Kevin Hilman <khilman@...libre.com>,
        Jerome Brunet <jbrunet@...libre.com>,
        Martin Blumenstingl <martin.blumenstingl@...glemail.com>,
        Matthias Brugger <matthias.bgg@...il.com>,
        linux-mmc@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
        linux-amlogic@...ts.infradead.org, linux-kernel@...r.kernel.org,
        linux-mediatek@...ts.infradead.org
Subject: Re: [PATCH v3] mmc: meson-gx: fix SDIO interrupt handling

On Wed, Dec 14, 2022 at 10:33 PM Heiner Kallweit <hkallweit1@...il.com> wrote:
>
> On 14.12.2022 14:46, Peter Suti wrote:
> > With the interrupt support introduced in commit 066ecde sometimes the
> > Marvell-8987 wifi chip got stuck using the marvell-sd-uapsta-8987
> > vendor driver. The cause seems to be that after sending ack to all interrupts
> > the IRQ_SDIO still happens, but it is ignored.
> >
> > To work around this, recheck the IRQ_SDIO after meson_mmc_request_done().
> >
> > Inspired by 9e2582e ("mmc: mediatek: fix SDIO irq issue") which used a
> > similar fix to handle lost interrupts.
> >
> The commit description of the referenced fix isn't clear with regard to
> whose fault it is that an interrupt can be lost. I'd interpret it as being
> a silicon bug rather than a kernel/driver bug.
Unfortunately, I can't confirm that the bug addressed by the referenced
commit is in the silicon either.
However, a similar workaround works in this case too, which is why I
referenced that commit.

> Not sure whether it's the case, but it's possible that both vendors use
> at least parts of the same IP in the MMC block, and therefore the issue
> pops up here too.
>
> > Fixes: 066ecde ("mmc: meson-gx: add SDIO interrupt support")
> >
> > Signed-off-by: Peter Suti <peter.suti@...eamunlimited.com>
> > ---
> > Changes in v2:
> >       - use spin_lock instead of spin_lock_irqsave
> >       - only reenable interrupts if they were enabled already
> >
> > Changes in v3:
> >       - Rework the patch based on feedback from Heiner Kallweit.
> >               The IRQ does not happen on 2 CPUs and the hard IRQ is not re-entrant.
> >               But still one SDIO IRQ is lost without this change.
> >               After the ack, reading SD_EMMC_STATUS shows BIT(15) set, but
> >               meson_mmc_irq() is never called again.
> >
> >               The fix is similar to Mediatek msdc_recheck_sdio_irq().
> >               That platform also loses an IRQ in some cases it seems.
> >
> >  drivers/mmc/host/meson-gx-mmc.c | 16 ++++++++++++++++
> >  1 file changed, 16 insertions(+)
> >
> > diff --git a/drivers/mmc/host/meson-gx-mmc.c b/drivers/mmc/host/meson-gx-mmc.c
> > index 6e5ea0213b47..7d3ee2f9a7f6 100644
> > --- a/drivers/mmc/host/meson-gx-mmc.c
> > +++ b/drivers/mmc/host/meson-gx-mmc.c
> > @@ -1023,6 +1023,22 @@ static irqreturn_t meson_mmc_irq(int irq, void *dev_id)
> >       if (ret == IRQ_HANDLED)
> >               meson_mmc_request_done(host->mmc, cmd->mrq);
> >
> > +     /*
> > +     * Sometimes after we ack all raised interrupts,
> > +     * an IRQ_SDIO can still be pending, which can get lost.
> > +     *
>
> A reader may scratch his head here and wonder how the interrupt can get lost,
> and why a workaround is added instead of eliminating the root cause for losing
> the interrupt. If you can't provide an explanation why the root cause for
> losing the interrupt can't be fixed, presumably you would have to say that
> you're adding a workaround for a suspected silicon bug.
After talking to the manufacturer, we got the following explanation
for this situation:
"wifi may have dat1 interrupt coming in, without this the dat1
interrupt would be missed"
Supposedly this is fixed in their codebase.
Unfortunately, we were not able to find out more and can't prepare a
patch with a proper explanation.
Thank you for reviewing.
>
> > +     * To prevent this, recheck the IRQ_SDIO here and schedule
> > +     * it to be processed.
> > +     */
> > +     raw_status = readl(host->regs + SD_EMMC_STATUS);
> > +     status = raw_status & (IRQ_EN_MASK | IRQ_SDIO);
>
> This isn't needed here. Why not simply:
>
> status = readl(host->regs + SD_EMMC_STATUS);
> if (status & IRQ_SDIO)
>   ...
>
>
> > +     if (status & IRQ_SDIO) {
> > +             spin_lock(&host->lock);
> > +             __meson_mmc_enable_sdio_irq(host->mmc, 0);
> > +             sdio_signal_irq(host->mmc);
> > +             spin_unlock(&host->lock);
> > +     }
> > +
> >       return ret;
> >  }
> >
>
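
For readers following the thread, the recheck step discussed above can be
sketched in user space, using the simplified status test suggested in the
review. This is only an illustration, not the driver code: struct fake_host,
fake_readl(), fake_signal_sdio() and recheck_sdio_irq() are hypothetical
stand-ins, the register is a plain variable, and IRQ_SDIO is assumed to be
BIT(15) of SD_EMMC_STATUS based on the v3 changelog. The spin_lock() around
the disable/signal pair in the patch is left out because the sketch is
single-threaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the SDIO interrupt flag is BIT(15) of SD_EMMC_STATUS. */
#define IRQ_SDIO (1u << 15)

/* Hypothetical stand-in for the host state; not the driver's struct. */
struct fake_host {
	uint32_t status_reg;    /* simulated SD_EMMC_STATUS contents */
	bool sdio_irq_enabled;  /* simulated SDIO interrupt enable state */
	int sdio_signals;       /* how often the SDIO IRQ was handed on */
};

/* Simulated register read; the driver uses readl() on host->regs. */
static uint32_t fake_readl(const struct fake_host *h)
{
	assert(h != NULL);
	return h->status_reg;
}

/*
 * Stands in for the pair of calls in the patch:
 * __meson_mmc_enable_sdio_irq(host->mmc, 0) followed by
 * sdio_signal_irq(host->mmc).
 */
static void fake_signal_sdio(struct fake_host *h)
{
	assert(h != NULL);
	h->sdio_irq_enabled = false;
	h->sdio_signals++;
}

/*
 * Recheck step run at the end of the interrupt handler: if the SDIO
 * bit is still set after all raised interrupts were acked, hand the
 * interrupt on instead of waiting for another hard IRQ.
 * Returns true if a pending SDIO interrupt was picked up.
 */
static bool recheck_sdio_irq(struct fake_host *h)
{
	uint32_t status = fake_readl(h);

	if (status & IRQ_SDIO) {
		fake_signal_sdio(h);
		return true;
	}
	return false;
}
```

Calling recheck_sdio_irq() on a host whose status_reg has IRQ_SDIO set
disables the simulated SDIO interrupt and bumps sdio_signals once; with the
bit clear it does nothing. Whether the real hardware needs this because of a
silicon issue is not something the sketch can answer.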
