linux-kernel - Re: [PATCH 1/2] mmc: mmci: enable MMC_CAP_NEED_RSP

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <fc4a9030-ba15-3236-7d81-52eb166a998a@foss.st.com>
Date:   Mon, 8 Feb 2021 14:31:05 +0100
From:   Yann GAUTIER <yann.gautier@...s.st.com>
To:     Ulf Hansson <ulf.hansson@...aro.org>
CC:     Russell King <linux@...linux.org.uk>,
        Linus Walleij <linus.walleij@...aro.org>,
        <ludovic.barre@...s.st.com>,
        Marek Vašut <marex@...x.de>,
        "linux-mmc@...r.kernel.org" <linux-mmc@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/2] mmc: mmci: enable MMC_CAP_NEED_RSP_BUSY

On 2/8/21 1:16 PM, Yann GAUTIER wrote:
> On 2/5/21 1:19 PM, Yann GAUTIER wrote:
>> On 2/5/21 10:53 AM, Ulf Hansson wrote:
>>> - trimmed cc-list
>>>
>>> On Thu, 4 Feb 2021 at 13:08, <yann.gautier@...s.st.com> wrote:
>>>>
>>>> From: Yann Gautier <yann.gautier@...s.st.com>
>>>>
>>>> To properly manage commands awaiting R1B responses, the capability
>>>> MMC_CAP_NEED_RSP_BUSY is enabled in mmci driver, for variants that
>>>> manage busy detection.
>>>> This R1B management needs both the flags MMC_CAP_NEED_RSP_BUSY and
>>>> MMC_CAP_WAIT_WHILE_BUSY to be enabled together.
>>>
>>> Would it be possible for you to share a little bit more about the
>>> problem? Like under what circumstances does things screw up?
>>>
>>> Is the issue only occurring when the cmd->busy_timeout becomes larger
>>> than host->max_busy_timeout. Or even in other cases?
>>>
>>>>
>>>> Signed-off-by: Yann Gautier <yann.gautier@...s.st.com>
>>>> ---
>>>>   drivers/mmc/host/mmci.c | 2 +-
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/mmc/host/mmci.c b/drivers/mmc/host/mmci.c
>>>> index 1bc674577ff9..bf6971fdd1a6 100644
>>>> --- a/drivers/mmc/host/mmci.c
>>>> +++ b/drivers/mmc/host/mmci.c
>>>> @@ -2148,7 +2148,7 @@ static int mmci_probe(struct amba_device *dev,
>>>>                  if (variant->busy_dpsm_flag)
>>>>                          mmci_write_datactrlreg(host,
>>>> host->variant->busy_dpsm_flag);
>>>> -               mmc->caps |= MMC_CAP_WAIT_WHILE_BUSY;
>>>> +               mmc->caps |= MMC_CAP_WAIT_WHILE_BUSY | 
>>>> MMC_CAP_NEED_RSP_BUSY;
>>>
>>> This isn't correct as the ux500 (and likely also other legacy
>>> variants) don't need this. I have tried it in the past and it works
>>> fine for ux500 without MMC_CAP_NEED_RSP_BUSY.
>>>
>>> The difference is rather that the busy detection for stm32 variants
>>> needs a corresponding HW busy timeout to be set (its
>>> variant->busy_timeout flag is set). Perhaps we can use that
>>> information instead?
>>>
>>> Note that, MMC_CAP_NEED_RSP_BUSY, means that cmd->busy_timeout will
>>> not be set by the core for erase commands, CMD5 and CMD6.
>>>
>>> By looking at the code in mmci_start_command(), it looks like we will
>>> default to a timeout of 10s, when cmd->busy_timeout isn't set. At
>>> least for some erase requests, that won't be sufficient. Would it be
>>> possible to disable the HW busy timeout in some way - and maybe use a
>>> software timeout instead? Maybe I already asked Ludovic about this?
>>> :-)
>>>
>>> BTW, did you check that the MMCIDATATIMER does get the correct value
>>> set for the timer in mmci_start_command() and if
>>> host->max_busy_timeout gets correctly set in
>>> mmci_set_max_busy_timeout()?
>>>
>>> [...]
>>>
>>> Kind regards
>>> Uffe
>>>
>>
>> Hi Ulf,
>>
>> Thanks for the hints.
>> I'll check all of that and get back with updated patches.
>>
>> As I tried to explain in the cover letter and in reply to Adrian, I saw
>> a freeze (BUSYD0) in test 37 during MMC_ERASE command  with 
>> SECURE_ERASE_ARG, when running this test just after test 36 (or any 
>> other write test). But maybe, as you said that's mostly a incorrect 
>> timeout issue.
>>
>> Regards,
>> Yann
> 
> Hi,
> 
> I made some extra tests, and the timeout value set in MMCIDATATIMER 
> correspond to the one computed:
> card->ext_csd.erase_group_def is set to 1 in mmc_init_card()
> In mmc_mmc_erase_timeout(), we have:
> erase_timeout = card->ext_csd.hc_erase_timeout; // 300ms * 0x07 (for the 
> eMMC card I have: THGBMDG5D1LBAIL
> erase_timeout *= card->ext_csd.sec_erase_mult; // 0xDC
> erase_timeout *= qty; // 32 (from = 0x1d0000, to = 0x20ffff)
> 
> This leads to a timeout of 14784000ms (~4 hours).
> The max_busy_timeout is 86767ms.
> 
> After those 4 hours, I get this message:
> mmc1: Card stuck being busy! __mmc_poll_for_busy
> 
> The second erase with MMC_ERASE_ARG finds an erase timeout of 67200ms, 
> and uses R1B command.
> But as the first erase failed, the DPSMACT is still enabled, the busy 
> timeout doesn't seem to happen. Something may be missing in the error path.
> 
> Anyway, I'll push an update of the second patch of the series, and just 
> drop this first one.
> 
> 
> Regards,
> Yann

I've discussed with Ludovic, and it is somewhat related to this patch set:
https://patchwork.kernel.org/project/linux-mmc/list/?series=186219&state=%2A&archive=both

The STM32 SDMMC IP needs a specific reset procedure when a data timeout 
occurs. If it is hardware, this is managed with the threaded IRQ. But if 
it is a SW polling (if R1B is replaced with R1), there is nothing in 
frameworks that could call this "unstuck" procedure for STM32 variant.
I don't know how this should be handled.


Regards,
Yann