linux-kernel - Re: [PATCH v3] mmc: meson-gx: fix SDIO interrupt handling

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7c4aa0d2-d8e9-416b-b2ad-f5c3c8ea33de@gmail.com>
Date:   Thu, 12 Jan 2023 22:27:44 +0100
From:   Heiner Kallweit <hkallweit1@...il.com>
To:     Peter Suti <peter.suti@...eamunlimited.com>
Cc:     Ulf Hansson <ulf.hansson@...aro.org>,
        Neil Armstrong <neil.armstrong@...aro.org>,
        Kevin Hilman <khilman@...libre.com>,
        Jerome Brunet <jbrunet@...libre.com>,
        Martin Blumenstingl <martin.blumenstingl@...glemail.com>,
        Matthias Brugger <matthias.bgg@...il.com>,
        linux-mmc@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
        linux-amlogic@...ts.infradead.org, linux-kernel@...r.kernel.org,
        linux-mediatek@...ts.infradead.org
Subject: Re: [PATCH v3] mmc: meson-gx: fix SDIO interrupt handling

On 09.01.2023 12:52, Peter Suti wrote:
> On Wed, Dec 14, 2022 at 10:33 PM Heiner Kallweit <hkallweit1@...il.com> wrote:
>>
>> On 14.12.2022 14:46, Peter Suti wrote:
>>> With the interrupt support introduced in commit 066ecde sometimes the
>>> Marvell-8987 wifi chip got stuck using the marvell-sd-uapsta-8987
>>> vendor driver. The cause seems to be that after sending ack to all interrupts
>>> the IRQ_SDIO still happens, but it is ignored.
>>>
>>> To work around this, recheck the IRQ_SDIO after meson_mmc_request_done().
>>>
>>> Inspired by 9e2582e ("mmc: mediatek: fix SDIO irq issue") which used a
>>> similar fix to handle lost interrupts.
>>>
>> The commit description of the referenced fix isn't clear with regard to
>> who's fault it is that an interrupt can be lost. I'd interpret it being
>> a silicon bug rather than a kernel/driver bug.
> Unfortunately I can't confirm that the referenced bug is in the
> silicon for the original commit either.
> However a similar workaround works in this case too which is why I
> referenced that commit.
> 
>> Not sure whether it's the case, but it's possible that both vendors use
>> at least parts of the same IP in the MMC block, and therefore the issue
>> pops up here too.
>>
>>> Fixes: 066ecde ("mmc: meson-gx: add SDIO interrupt support")
>>>
>>> Signed-off-by: Peter Suti <peter.suti@...eamunlimited.com>
>>> ---
>>> Changes in v2:
>>>       - use spin_lock instead of spin_lock_irqsave
>>>       - only reenable interrupts if they were enabled already
>>>
>>> Changes in v3:
>>>       - Rework the patch based on feedback from Heiner Kallweit.
>>>               The IRQ does not happen on 2 CPUs and the hard IRQ is not re-entrant.
>>>               But still one SDIO IRQ is lost without this change.
>>>               After the ack, reading the SD_EMMC_STATUS BIT(15) is set, but
>>>               meson_mmc_irq() is never called again.
>>>
>>>               The fix is similar to Mediatek msdc_recheck_sdio_irq().
>>>               That platform also loses an IRQ in some cases it seems.
>>>
>>>  drivers/mmc/host/meson-gx-mmc.c | 16 ++++++++++++++++
>>>  1 file changed, 16 insertions(+)
>>>
>>> diff --git a/drivers/mmc/host/meson-gx-mmc.c b/drivers/mmc/host/meson-gx-mmc.c
>>> index 6e5ea0213b47..7d3ee2f9a7f6 100644
>>> --- a/drivers/mmc/host/meson-gx-mmc.c
>>> +++ b/drivers/mmc/host/meson-gx-mmc.c
>>> @@ -1023,6 +1023,22 @@ static irqreturn_t meson_mmc_irq(int irq, void *dev_id)
>>>       if (ret == IRQ_HANDLED)
>>>               meson_mmc_request_done(host->mmc, cmd->mrq);
>>>
>>> +     /*
>>> +     * Sometimes after we ack all raised interrupts,
>>> +     * an IRQ_SDIO can still be pending, which can get lost.
>>> +     *
>>
>> A reader may scratch his head here and wonder how the interrupt can get lost,
>> and why adding a workaround instead of eliminating the root cause for losing
>> the interrupt. If you can't provide an explanation why the root cause for
>> losing the interrupt can't be fixed, presumably you would have to say that
>> you're adding a workaround for a suspected silicon bug.
> After talking to the manufacturer, we got the following explanation
> for this situation:

To which manufacturer did you talk, Marvell or Amlogic?

> "wifi may have dat1 interrupt coming in, without this the dat1
> interrupt would be missed"

I don't understand this sentence. W/o the interrupt coming in
the interrupt would be missed? Can you explain it?

> Supposedly this is fixed in their codebase.

Which codebase do you mean? A specific vendor driver? Or firmware?

> Unfortunately we were not able to find out more and can't prepare a
> patch with a proper explanation.

The workaround is rather simple, so we should add it.
It's just unfortunate that we have no idea about the root cause of the issue.

> Thank you for reviewing.
>>
>>> +     * To prevent this, recheck the IRQ_SDIO here and schedule
>>> +     * it to be processed.
>>> +     */
>>> +     raw_status = readl(host->regs + SD_EMMC_STATUS);
>>> +     status = raw_status & (IRQ_EN_MASK | IRQ_SDIO);
>>
>> This isn't needed here. Why not simply:
>>
>> status = readl(host->regs + SD_EMMC_STATUS);
>> if (status & IRQ_SDIO)
>>   ...
>>
>>
>>> +     if (status & IRQ_SDIO) {
>>> +             spin_lock(&host->lock);
>>> +             __meson_mmc_enable_sdio_irq(host->mmc, 0);
>>> +             sdio_signal_irq(host->mmc);
>>> +             spin_unlock(&host->lock);
>>> +     }
>>> +
>>>       return ret;
>>>  }
>>>
>>