linux-kernel - Re: [BUG] mmc: core: adjust polling interval for CMD1

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAPDyKFrz_2Vp64SUzB8CiHJLTjO8Hx8m3QEhY1VU2ksZhVEx7A@mail.gmail.com>
Date:   Tue, 1 Mar 2022 14:38:35 +0100
From:   Ulf Hansson <ulf.hansson@...aro.org>
To:     "H. Nikolaus Schaller" <hns@...delico.com>
Cc:     Huijin Park <huijin.park@...sung.com>,
        Jean Rene Dawin <jdawin@...h.uni-bielefeld.de>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Discussions about the Letux Kernel 
        <letux-kernel@...nphoenux.org>,
        Linux-OMAP <linux-omap@...r.kernel.org>,
        linux-mmc@...r.kernel.org, Tony Lindgren <tony@...mide.com>
Subject: Re: [BUG] mmc: core: adjust polling interval for CMD1

On Thu, 17 Feb 2022 at 21:12, H. Nikolaus Schaller <hns@...delico.com> wrote:
>
> Hi,
> Jean Rene Dawin did report to me a problem on the Beagle Bone Black starting
> with our disto kernel based on v5.17-rc1:
>
> >> since kernel 5.17-rc1 I noticed slower emmc performance on Beaglebone
> >> Black, but didn't check the logs.
> >> When I tried to run 5.17.0-rc3-letux+ it booted fine, but during IO
> >> traffic there were messages like
> >>
> >> [  662.529584] mmc1: error -110 doing runtime resume
> >> [  669.293590] mmc1: Card stuck being busy! __mmc_poll_for_busy
> >>
> >> [  739.076072] mmc1: Timeout waiting for hardware interrupt.
> >> [  739.145676] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
> >> [  739.231053] mmc1: sdhci: Sys addr:  0x00000000 | Version:  0x00003101
> >> [  739.316472] mmc1: sdhci: Blk size:  0x00000200 | Blk cnt:  0x00000400
> >> [  739.401937] mmc1: sdhci: Argument:  0x00342d30 | Trn mode: 0x00000023
> >> [  739.487439] mmc1: sdhci: Present:   0x01f70000 | Host ctl: 0x00000000
> >> [  739.573007] mmc1: sdhci: Power:     0x0000000f | Blk gap:  0x00000000
> >> [  739.658609] mmc1: sdhci: Wake-up:   0x00000000 | Clock:    0x00003c07
> >> [  739.744224] mmc1: sdhci: Timeout:   0x00000007 | Int stat: 0x00000002
> >> [  739.829896] mmc1: sdhci: Int enab:  0x027f000b | Sig enab: 0x027f000b
> >> [  739.915623] mmc1: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000001
> >> [  740.001394] mmc1: sdhci: Caps:      0x07e10080 | Caps_1:   0x00000000
> >> [  740.087208] mmc1: sdhci: Cmd:       0x0000193a | Max curr: 0x00000000
> >> [  740.173051] mmc1: sdhci: Resp[0]:   0x00000900 | Resp[1]:  0x00000000
> >> [  740.258928] mmc1: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000000
> >> [  740.344854] mmc1: sdhci: Host ctl2: 0x00000000
> >> [  740.402796] mmc1: sdhci: ============================================
> >>
> >> and finally IO errors and a corrupted filesystem.
> >>
> >> 5.17.0-rc4-letux+ shows the same behaviour.
>
> I checked with my BeagleBoard Black (am3358) and can confirm this observation.
> It happens only with the integrated eMMC but not with the µSD connected to
> the other mmc interface.
>
> A git bisect found:
>
> 76bfc7ccc2fa9d382576f6013b57a0ef93d5a722 is the first bad commit
> commit 76bfc7ccc2fa9d382576f6013b57a0ef93d5a722
> Author: Huijin Park <huijin.park@...sung.com>
> Date:   Thu Nov 4 15:32:31 2021 +0900
>
>   mmc: core: adjust polling interval for CMD1
>
>   In mmc_send_op_cond(), loops are continuously performed at the same
>   interval of 10 ms.  However the behaviour is not good for some eMMC
>   which can be out from a busy state earlier than 10 ms if normal.
>
>   Rather than fixing about the interval time in mmc_send_op_cond(),
>   let's instead convert into using the common __mmc_poll_for_busy().
>
>   The reason for adjusting the interval time is that it is important
>   to reduce the eMMC initialization time, especially in devices that
>   use eMMC as rootfs.
>
>   Test log(eMMC:KLM8G1GETF-B041):
>
>   before: 12 ms (0.311016 - 0.298729)
>   [    0.295823] mmc0: starting CMD0 arg 00000000 flags 000000c0
>   [    0.298729] mmc0: starting CMD1 arg 40000080 flags 000000e1<-start
>   [    0.311016] mmc0: starting CMD1 arg 40000080 flags 000000e1<-finish
>   [    0.311336] mmc0: starting CMD2 arg 00000000 flags 00000007
>
>   after: 2 ms (0.301270 - 0.298762)
>   [    0.295862] mmc0: starting CMD0 arg 00000000 flags 000000c0
>   [    0.298762] mmc0: starting CMD1 arg 40000080 flags 000000e1<-start
>   [    0.299067] mmc0: starting CMD1 arg 40000080 flags 000000e1
>   [    0.299441] mmc0: starting CMD1 arg 40000080 flags 000000e1
>   [    0.299879] mmc0: starting CMD1 arg 40000080 flags 000000e1
>   [    0.300446] mmc0: starting CMD1 arg 40000080 flags 000000e1
>   [    0.301270] mmc0: starting CMD1 arg 40000080 flags 000000e1<-finish
>   [    0.301572] mmc0: starting CMD2 arg 00000000 flags 00000007
>
>   Signed-off-by: Huijin Park <huijin.park@...sung.com>
>   Link: https://lore.kernel.org/r/20211104063231.2115-3-huijin.park@samsung.com
>   Signed-off-by: Ulf Hansson <ulf.hansson@...aro.org>
>
> Reverting this makes v5.17-rc[1-4] work.
>
> Any suggestions or fixes?
>
> BR and thanks,
> Nikolaus Schaller
>
> Reported-by: jdawin@...h.uni-bielefeld.de
>

Thanks for reporting and bisecting!

What changed with the offending commit is two things:

1) We are sending the CMD1 more frequently, initially in the loop in
the __mmc_poll_for_busy. Step by step, we increase the polling period.
2) We may end up using a slightly shorter total timeout for polling
time, compared to what we used before the offending commit.

Hopefully the problem is related to 2), in which case I think the
below patch should help. Can you please give it a try?

Kind regards
Uffe

From: Ulf Hansson <ulf.hansson@...aro.org>
Date: Tue, 1 Mar 2022 14:24:21 +0100
Subject: [PATCH] mmc: core: Extend timeout to 2s for MMC_SEND_OP_COND

It looks like the timeout for the MMC_SEND_OP_COND (CMD1) might have become
a bit too small due to recent changes. Therefore, let's extend it to 2s,
which is probably more inline with its previous value, to fix the reported
timeout problems.

While at it, let's add a define for the timeout value, rather than using
a hard-coded value for it.

Reported-by: Jean Rene Dawin <jdawin@...h.uni-bielefeld.de>
Reported-by: H. Nikolaus Schaller <hns@...delico.com>
Cc: Huijin Park <huijin.park@...sung.com>
Fixes: 76bfc7ccc2fa ("mmc: core: adjust polling interval for CMD1")
Signed-off-by: Ulf Hansson <ulf.hansson@...aro.org>
---
 drivers/mmc/core/mmc_ops.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
index d63d1c735335..1f57174b3cf3 100644
--- a/drivers/mmc/core/mmc_ops.c
+++ b/drivers/mmc/core/mmc_ops.c
@@ -21,6 +21,7 @@

 #define MMC_BKOPS_TIMEOUT_MS           (120 * 1000) /* 120s */
 #define MMC_SANITIZE_TIMEOUT_MS                (240 * 1000) /* 240s */
+#define MMC_OP_COND_TIMEOUT_MS         2000 /* 2s */

 static const u8 tuning_blk_pattern_4bit[] = {
        0xff, 0x0f, 0xff, 0x00, 0xff, 0xcc, 0xc3, 0xcc,
@@ -232,7 +233,8 @@ int mmc_send_op_cond(struct mmc_host *host, u32
ocr, u32 *rocr)
        cmd.arg = mmc_host_is_spi(host) ? 0 : ocr;
        cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R3 | MMC_CMD_BCR;

-       err = __mmc_poll_for_busy(host, 1000, &__mmc_send_op_cond_cb, &cb_data);
+       err = __mmc_poll_for_busy(host, MMC_OP_COND_TIMEOUT_MS,
+                                 &__mmc_send_op_cond_cb, &cb_data);
        if (err)
                return err;

-- 
2.25.1