linux-kernel - Re: [PATCH v2 2/2] mmc: core: fall back host->f_init if failing to init mmc card after resume

Message-ID: <73b58bd5-a5a2-a436-58a9-30fa8db3225f@rock-chips.com>
Date:	Wed, 3 Aug 2016 09:35:55 +0800
From:	Shawn Lin <shawn.lin@...k-chips.com>
To:	Jaehoon Chung <jh80.chung@...sung.com>,
	Ulf Hansson <ulf.hansson@...aro.org>
Cc:	shawn.lin@...k-chips.com, Adrian Hunter <adrian.hunter@...el.com>,
	linux-mmc@...r.kernel.org, linux-kernel@...r.kernel.org,
	Doug Anderson <dianders@...omium.org>,
	linux-rockchip@...ts.infradead.org
Subject: Re: [PATCH v2 2/2] mmc: core: fall back host->f_init if failing to
 init mmc card after resume

Hi Jaehoon,

On 2016/8/2 18:47, Jaehoon Chung wrote:
> Hi Shawn,
>
> On 08/02/2016 06:07 PM, Shawn Lin wrote:
>> Hi Ulf,
>>
>> On 2016/7/20 9:57, Shawn Lin wrote:
>>> We occasionally observed failures to initialize the card after resume.
>>> It's hard to reproduce, but we did get reports from the suspend/resume
>>> tests on our RK3399 MP test farm. Unfortunately, we still haven't
>>> figured out what goes wrong at that point. Simply retrying at the same
>>> host->f_init, without lowering it, doesn't help either. This patch does
>>> solve the problem: with some added logging we can see that the MMC card
>>> resumes successfully after host->f_init is lowered. We found no obvious
>>> side effects, so this patch seems to improve stability.
>>>
>>> [   93.405085] mmc1: unexpected status 0x800900 after switch
>>> [   93.408474] mmc1: switch to bus width 1 failed
>>> [   93.408482] mmc1: mmc_select_hs200 failed, error -110
>>> [   93.408492] mmc1: error -110 during resume (card was removed?)
>>> [   93.408705] PM: resume of devices complete after 213.453 msecs
>>>
>
> Status 0x800900 is COM_CRC_ERROR... it seems the CRC check failed.
> But I don't see how that relates to "fall back host->f_init".

Yup, it also looks strange to me that we would need to downgrade
host->f_init when resuming. A CRC error shouldn't occur, since 400K
works at boot time, and we can also see HS400 working normally later.
That makes me believe this isn't a signal problem, but we still need
to figure out why the controller reports a CRC error.

The best approach is to make it easier to reproduce so that we can check
the PCB signals; I'm still trying to do that. Alternatively, there may be
a HW/chip condition that occasionally makes my eMMC PHY misbehave. Either
way, I need more evidence before I can land a patch that fixes or avoids
the root cause. I'm working on it.

>
> I don't have any knowledge of Rockchip...
> but in my experience, some of these cases are not MMC core problems.
>
> 1. Exynos uses GPIOs as the clk/cmd/data lines, and each GPIO has a drive strength value.
> If the drive strength changes after resume, this error can occur.

Yes, but the related PHY settings and configuration didn't change.

>
> 2. A glitch on the I/O lines. This loop introduces a delay... is it just the delay that helps?

We have tried retrying at 400K after a failed resume, repeatedly and
without breaking out while it kept failing, but it doesn't help.


>
> So you may want to check for other problems... :)
>
> At boot time, f_init can use 400K, but after resume f_init needs to use 100K... hmm, strange.
>

Agreed.

So let's come back to the topic: should we support downgrading f_init
after a failed resume, just as we do at boot time? It's possible that
environmental changes (noise, temperature, static) lead to the failure
after resume. Shouldn't the mechanism be robust enough to handle such
unexpected cases?  :)


> Best Regards,
> Jaehoon Chung
>
>>
>> Any comments on this patch? :)
>>
>>> Signed-off-by: Shawn Lin <shawn.lin@...k-chips.com>
>>>
>>> ---
>>>
>>> Changes in v2:
>>> - remove mmc_power_off
>>> - take f_min into consideration
>>>
>>>  drivers/mmc/core/mmc.c | 19 +++++++++++++++++--
>>>  1 file changed, 17 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c
>>> index 403b97b..a2891c1 100644
>>> --- a/drivers/mmc/core/mmc.c
>>> +++ b/drivers/mmc/core/mmc.c
>>> @@ -1945,6 +1945,7 @@ static int mmc_suspend(struct mmc_host *host)
>>>  static int _mmc_resume(struct mmc_host *host)
>>>  {
>>>      int err = 0;
>>> +    int i;
>>>
>>>      BUG_ON(!host);
>>>      BUG_ON(!host->card);
>>> @@ -1954,8 +1955,22 @@ static int _mmc_resume(struct mmc_host *host)
>>>      if (!mmc_card_suspended(host->card))
>>>          goto out;
>>>
>>> -    mmc_power_up(host, host->card->ocr);
>>> -    err = mmc_init_card(host, host->card->ocr, host->card);
>>> +    /*
>>> +     * Let's try to fallback the host->f_init
>>> +     * if failing to init mmc card after resume.
>>> +     */
>>> +    for (i = 0; i < ARRAY_SIZE(freqs); i++) {
>>> +        if (host->f_init < max(freqs[i], host->f_min))
>>> +            continue;
>>> +        else
>>> +            host->f_init = max(freqs[i], host->f_min);
>>> +
>>> +        mmc_power_up(host, host->card->ocr);
>>> +        err = mmc_init_card(host, host->card->ocr, host->card);
>>> +        if (!err)
>>> +            break;
>>> +    }
>>> +
>>>      mmc_card_clr_suspended(host->card);
>>>
>>>  out:
>>>
>>
>>
>
>
>
>


-- 
Best Regards
Shawn Lin