[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7c4aa0d2-d8e9-416b-b2ad-f5c3c8ea33de@gmail.com>
Date: Thu, 12 Jan 2023 22:27:44 +0100
From: Heiner Kallweit <hkallweit1@...il.com>
To: Peter Suti <peter.suti@...eamunlimited.com>
Cc: Ulf Hansson <ulf.hansson@...aro.org>,
Neil Armstrong <neil.armstrong@...aro.org>,
Kevin Hilman <khilman@...libre.com>,
Jerome Brunet <jbrunet@...libre.com>,
Martin Blumenstingl <martin.blumenstingl@...glemail.com>,
Matthias Brugger <matthias.bgg@...il.com>,
linux-mmc@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
linux-amlogic@...ts.infradead.org, linux-kernel@...r.kernel.org,
linux-mediatek@...ts.infradead.org
Subject: Re: [PATCH v3] mmc: meson-gx: fix SDIO interrupt handling
On 09.01.2023 12:52, Peter Suti wrote:
> On Wed, Dec 14, 2022 at 10:33 PM Heiner Kallweit <hkallweit1@...il.com> wrote:
>>
>> On 14.12.2022 14:46, Peter Suti wrote:
>>> With the interrupt support introduced in commit 066ecde sometimes the
>>> Marvell-8987 wifi chip got stuck using the marvell-sd-uapsta-8987
>>> vendor driver. The cause seems to be that after sending ack to all interrupts
>>> the IRQ_SDIO still happens, but it is ignored.
>>>
>>> To work around this, recheck the IRQ_SDIO after meson_mmc_request_done().
>>>
>>> Inspired by 9e2582e ("mmc: mediatek: fix SDIO irq issue") which used a
>>> similar fix to handle lost interrupts.
>>>
>> The commit description of the referenced fix isn't clear with regard to
>> who's fault it is that an interrupt can be lost. I'd interpret it being
>> a silicon bug rather than a kernel/driver bug.
> Unfortunately I can't confirm that the referenced bug is in the
> silicon for the original commit either.
> However a similar workaround works in this case too which is why I
> referenced that commit.
>
>> Not sure whether it's the case, but it's possible that both vendors use
>> at least parts of the same IP in the MMC block, and therefore the issue
>> pops up here too.
>>
>>> Fixes: 066ecde ("mmc: meson-gx: add SDIO interrupt support")
>>>
>>> Signed-off-by: Peter Suti <peter.suti@...eamunlimited.com>
>>> ---
>>> Changes in v2:
>>> - use spin_lock instead of spin_lock_irqsave
>>> - only reenable interrupts if they were enabled already
>>>
>>> Changes in v3:
>>> - Rework the patch based on feedback from Heiner Kallweit.
>>> The IRQ does not happen on 2 CPUs and the hard IRQ is not re-entrant.
>>> But still one SDIO IRQ is lost without this change.
>>> After the ack, reading the SD_EMMC_STATUS BIT(15) is set, but
>>> meson_mmc_irq() is never called again.
>>>
>>> The fix is similar to Mediatek msdc_recheck_sdio_irq().
>>> That platform also loses an IRQ in some cases it seems.
>>>
>>> drivers/mmc/host/meson-gx-mmc.c | 16 ++++++++++++++++
>>> 1 file changed, 16 insertions(+)
>>>
>>> diff --git a/drivers/mmc/host/meson-gx-mmc.c b/drivers/mmc/host/meson-gx-mmc.c
>>> index 6e5ea0213b47..7d3ee2f9a7f6 100644
>>> --- a/drivers/mmc/host/meson-gx-mmc.c
>>> +++ b/drivers/mmc/host/meson-gx-mmc.c
>>> @@ -1023,6 +1023,22 @@ static irqreturn_t meson_mmc_irq(int irq, void *dev_id)
>>> if (ret == IRQ_HANDLED)
>>> meson_mmc_request_done(host->mmc, cmd->mrq);
>>>
>>> + /*
>>> + * Sometimes after we ack all raised interrupts,
>>> + * an IRQ_SDIO can still be pending, which can get lost.
>>> + *
>>
>> A reader may scratch his head here and wonder how the interrupt can get lost,
>> and why adding a workaround instead of eliminating the root cause for losing
>> the interrupt. If you can't provide an explanation why the root cause for
>> losing the interrupt can't be fixed, presumably you would have to say that
>> you're adding a workaround for a suspected silicon bug.
> After talking to the manufacturer, we got the following explanation
> for this situation:
To which manufacturer did you talk, Marvell or Amlogic?
> "wifi may have dat1 interrupt coming in, without this the dat1
> interrupt would be missed"
I don't understand this sentence. W/o the interrupt coming in
the interrupt would be missed? Can you explain it?
> Supposedly this is fixed in their codebase.
Which codebase do you mean? A specific vendor driver? Or firmware?
> Unfortunately we were not able to find out more and can't prepare a
> patch with a proper explanation.
The workaround is rather simple, so we should add it.
It's just unfortunate that we have no idea about the root cause of the issue.
> Thank you for reviewing.
>>
>>> + * To prevent this, recheck the IRQ_SDIO here and schedule
>>> + * it to be processed.
>>> + */
>>> + raw_status = readl(host->regs + SD_EMMC_STATUS);
>>> + status = raw_status & (IRQ_EN_MASK | IRQ_SDIO);
>>
>> This isn't needed here. Why not simply:
>>
>> status = readl(host->regs + SD_EMMC_STATUS);
>> if (status & IRQ_SDIO)
>> ...
>>
>>
>>> + if (status & IRQ_SDIO) {
>>> + spin_lock(&host->lock);
>>> + __meson_mmc_enable_sdio_irq(host->mmc, 0);
>>> + sdio_signal_irq(host->mmc);
>>> + spin_unlock(&host->lock);
>>> + }
>>> +
>>> return ret;
>>> }
>>>
>>
Powered by blists - more mailing lists