linux-kernel - Re: eMMC boot problem: switch to bus width 8 ddr failed

Message-ID: <99a8c8a0-1875-a4db-0c09-384dfed23234@rock-chips.com>
Date:   Mon, 9 Jan 2017 15:33:27 +0800
From:   Shawn Lin <shawn.lin@...k-chips.com>
To:     Clemens Gruber <clemens.gruber@...ruber.com>,
        linux-mmc@...r.kernel.org
Cc:     Ulf Hansson <ulf.hansson@...aro.org>,
        Linus Walleij <linus.walleij@...aro.org>,
        Adrian Hunter <adrian.hunter@...el.com>,
        Dong Aisheng <aisheng.dong@....com>,
        linux-kernel@...r.kernel.org
Subject: Re: eMMC boot problem: switch to bus width 8 ddr failed

On 2017/1/7 0:07, Clemens Gruber wrote:
> On Fri, Jan 06, 2017 at 10:54:35AM +0800, Shawn Lin wrote:
>> On 2017/1/6 8:41, Clemens Gruber wrote:
>>> Hi,
>>>
>>> with the current mainline 4.10-rc2 kernel, I can no longer boot from
>>> the eMMC on my i.MX6Q board.
>>>
>>> Details:
>>> The eMMC is a Micron MTFC4GACAJCN-1M WT but as the i.MX6Q only supports
>>> eMMC 4.41 features and we did not implement voltage switching from 3.3V
>>> to 1.8V or lower, I did add no-1-8-v; (but none of the mmc-ddr or mmc-hs
>>> options) to the device tree. The bus-width is 8.
>>>
>>> With 4.9 the board booted fine, now with the current mainline 4.10 tree,
>>> I get the following (repeating) errors at boot:
>>>
>>> [    4.326834] Waiting for root device /dev/mmcblk0p2...
>>> [   14.563861] mmc0: Timeout waiting for hardware cmd interrupt.
>>> [   14.569619] sdhci: =========== REGISTER DUMP (mmc0)===========
>>> [   14.575461] sdhci: Sys addr: 0x4e726000 | Version:  0x00000002
>>> [   14.581300] sdhci: Blk size: 0x00000200 | Blk cnt:  0x00000001
>>> [   14.587140] sdhci: Argument: 0x00010000 | Trn mode: 0x00000013
>>> [   14.592979] sdhci: Present:  0x01fd8009 | Host ctl: 0x00000031
>>> [   14.598816] sdhci: Power:    0x00000002 | Blk gap:  0x00000080
>>> [   14.604654] sdhci: Wake-up:  0x00000008 | Clock:    0x0000001f
>>> [   14.610493] sdhci: Timeout:  0x0000008f | Int stat: 0x00000000
>>> [   14.616332] sdhci: Int enab: 0x107f100b | Sig enab: 0x107f100b
>>> [   14.622168] sdhci: AC12 err: 0x00000000 | Slot int: 0x00000003
>>> [   14.628007] sdhci: Caps:     0x07eb0000 | Caps_1:   0x0000a007
>>> [   14.633845] sdhci: Cmd:      0x00000d1a | Max curr: 0x00ffffff
>>> [   14.639682] sdhci: Host ctl2: 0x00000000
>>> [   14.643611] sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x4e6f7208
>>> [   14.649447] sdhci: ===========================================
>>>
>>> This repeats a few times, then more information is shown at the bottom:
>>>
>>> [   86.893859] mmc0: Timeout waiting for hardware cmd interrupt.
>>> [   86.899615] sdhci: =========== REGISTER DUMP (mmc0)===========
>>> [   86.905453] sdhci: Sys addr: 0x00000000 | Version:  0x00000002
>>> [   86.911291] sdhci: Blk size: 0x00000200 | Blk cnt:  0x00000001
>>> [   86.917129] sdhci: Argument: 0x00010000 | Trn mode: 0x00000013
>>> [   86.922967] sdhci: Present:  0x01fd8009 | Host ctl: 0x00000031
>>> [   86.928804] sdhci: Power:    0x00000002 | Blk gap:  0x00000080
>>> [   86.934642] sdhci: Wake-up:  0x00000008 | Clock:    0x0000001f
>>> [   86.940479] sdhci: Timeout:  0x0000008f | Int stat: 0x00000000
>>> [   86.946316] sdhci: Int enab: 0x107f100b | Sig enab: 0x107f100b
>>> [   86.952154] sdhci: AC12 err: 0x00000000 | Slot int: 0x00000003
>>> [   86.957992] sdhci: Caps:     0x07eb0000 | Caps_1:   0x0000a007
>>> [   86.963830] sdhci: Cmd:      0x00000d1a | Max curr: 0x00ffffff
>>> [   86.969668] sdhci: Host ctl2: 0x00000000
>>> [   86.973596] sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x00000000
>>> [   86.979433] sdhci: ===========================================
>>> [   86.986356] mmc0: switch to bus width 8 ddr failed
>>> [   86.991163] mmc0: error -110 whilst initialising MMC card
>>> [   97.773859] mmc0: Timeout waiting for hardware cmd interrupt.
>>>
>>> --
>>>
>>> After looking through the latest commits to mmc/core, I found the
>>> culprit:
>>> Commit e173f8911f091fa50ccf8cc1fa316dd5569bc470 ("mmc: core: Update
>>> CMD13 polling policy when switch to HS DDR mode")
>>>
>>> Reverting it fixes the problem. But I am unsure if that's the right
>>> course of action?
>>>
>>> Feel free to send me patches for testing!
>>>
>>
>> I just look into both of sdhci and sdhci-esdhc-imx again, and seems the
>> code miss a bit, so could you also try this one?
>>
>> drivers/mmc/core/mmc_ops.c
>> @@ -486,7 +486,8 @@ static int mmc_poll_for_busy(struct mmc_card *card, unsigned int timeout_ms,
>>                         busy = host->ops->card_busy(host);
>>                 } else {
>>                         err = mmc_send_status(card, &status);
>> -                       if (retry_crc_err && err == -EILSEQ) {
>> +                       if (retry_crc_err && (err == -EILSEQ ||
>> +                                               err == -ETIMEDOUT)) {
>>                                 busy = true;
>>                         } else if (err) {
>>                                 return err;
>>
>
> Hi,
>
> this patch (alone) does not solve the problem. The error message is the
> same as before.
>
> But applying both your first patch and this one does work. Is this one
> beneficial anyway, even if it does not fix my problem?

I think so. The code has always assumed that if the card is not ready
after the mode switch finishes, the host reports a CRC error, namely
-EILSEQ. But if the host is already in the higher speed mode while the
eMMC hasn't finished the switch, the host could fail to sample the
response of CMD13 because of the timing mismatch between them. Couldn't
a response timeout be generated instead of -EILSEQ in that case? That is
quite IP-specific, so I don't think we should take the risk of relying
on it. In other words, we shouldn't bail out early on any error the host
returns while polling the status, not just on an explicit CRC error.
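
To illustrate the point, here is a rough user-space sketch, not the
kernel code itself. It assumes a hypothetical send_status callback
standing in for CMD13 and a scripted fake host that replays a list of
results. The names poll_for_busy, scripted_host and retry_on_timeout are
made up for this example, and the sketch counts attempts instead of
using a real timeout. It also treats a successful CMD13 as "card ready",
which skips the status check the real code does.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one CMD13 (SEND_STATUS) attempt:
 * returns 0 on success or a negative errno value. */
typedef int (*send_status_fn)(void *ctx);

/* Fake host that replays a fixed list of results, then reports success. */
struct scripted_host {
	const int *results;
	size_t count;
	size_t next;
};

static int scripted_send_status(void *ctx)
{
	struct scripted_host *h = ctx;

	if (h->next < h->count)
		return h->results[h->next++];
	return 0;
}

/*
 * Poll until the status command succeeds, a non-retryable error occurs,
 * or max_attempts is used up.
 *
 * retry_on_timeout == false: only -EILSEQ means "still busy" (old policy).
 * retry_on_timeout == true:  -ETIMEDOUT also means "still busy" (the
 *                            policy proposed in the patch above).
 */
static int poll_for_busy(send_status_fn send_status, void *ctx,
			 unsigned int max_attempts, bool retry_on_timeout)
{
	unsigned int i;

	assert(send_status != NULL);

	for (i = 0; i < max_attempts; i++) {
		int err = send_status(ctx);
		bool busy = (err == -EILSEQ) ||
			    (retry_on_timeout && err == -ETIMEDOUT);

		if (busy)
			continue;	/* card may still be switching modes */
		return err;		/* 0 on success, or a fatal error */
	}
	return -ETIMEDOUT;		/* never became ready */
}
```

With a host that answers -ETIMEDOUT, then -EILSEQ, then success, the old
policy gives up on the first response, while the new one keeps polling
and returns 0 on the third attempt.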




>
> Regards,
> Clemens
>
>
>


-- 
Best Regards
Shawn Lin