linux-kernel - Re: [PATCH] mmc: dw_mmc: Wait for data transfer after response errors

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAFqH_51=a7qsLFv_v3uDTsQzz8YD=GiAo3SUcR6rW_MObm=M7Q@mail.gmail.com>
Date:	Thu, 24 Mar 2016 12:26:43 +0100
From:	Enric Balletbo Serra <eballetbo@...il.com>
To:	Doug Anderson <dianders@...omium.org>
Cc:	"linux-mmc@...r.kernel.org" <linux-mmc@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Alim Akhtar <alim.akhtar@...il.com>,
	Jaehoon Chung <jh80.chung@...sung.com>,
	Ulf Hansson <ulf.hansson@...aro.org>,
	Alim Akhtar <alim.akhtar@...sung.com>,
	Sonny Rao <sonnyrao@...omium.org>,
	Andrew Bresticker <abrestic@...omium.org>,
	Heiko Stuebner <heiko@...ech.de>,
	Addy Ke <addy.ke@...k-chips.com>,
	Alexandru Stan <amstan@...omium.org>,
	Chris Zhong <zyw@...k-chips.com>,
	Caesar Wang <wxt@...k-chips.com>,
	Javier Martinez Canillas <javier@....samsung.com>,
	Russell King <linux@....linux.org.uk>
Subject: Re: [PATCH] mmc: dw_mmc: Wait for data transfer after response errors

I fixed Javier Martinez email and removed tgih.jun@...sung.com (delivery fail)
Also cc'ing Russell King as I think might help (see my comment below)


2016-03-21 23:38 GMT+01:00 Doug Anderson <dianders@...omium.org>:
> Enric,
>
> On Thu, Mar 17, 2016 at 5:12 AM, Enric Balletbo Serra
> <eballetbo@...il.com> wrote:
>> Dear all,
>>
>> Seems the following thread[1] didn't go anywhere. I'd like to continue
>> the discussion and share some tests that I did regarding the issue
>> that the patch is trying to fix.
>>
>> First I reproduced the issue on my rockchip board and I tested the
>> patch intensively, I can confirm that the patch made by Doug fixes the
>> issue.But, as reported by Alim, seems that this patch has the side
>> effect that breaks mmc on peach-pi board [2], specially on
>> suspend/resume, I ran lots of tests on peach-pi and, although is a bit
>> random, I can also confirm the breakage.
>>
>> Looks like that on peach-pi, when the patch is applied the controller
>> moves into a data transfer and the interrupt does not come, then the
>> task blocks. The reason why I think the dw_mmc-rockchip driver works
>> is because it has the DW_MCI_QUIRK_BROKEN_DTO quirk [3].
>>
>> So I did lots of tests on peach-pi with dto quirk, suspend/resume
>> started to work again. But I guess this is not the proper solution or
>> it is? Thoughts?
>>
>> [1] https://lkml.org/lkml/2015/5/18/495
>> [2] https://lava.collabora.co.uk/scheduler/job/169384/log_file#L_195_5
>> [3] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/mmc/host/dw_mmc-rockchip.c?id=57e104864bc4874a36796fd222d8d084dbf90b9b
>
> Ah, that would make some sense why things work OK on Rockchip.  Adding
> DW_MCI_QUIRK_BROKEN_DTO to peach probably doesn't make sense, then.
> Hrm...
>
> Since my original debugging of the issue was over a year ago, I think
> I've almost totally lost context of any debugging I did on the issue,
> so I'm not sure I'm going to be too much help in giving any details
> other than what I put in the original commit message.  From the
> original message it appears that I thought we could solve this other
> ways but just that my patch was easier than the alternative of
> handling every error case.  Maybe we just need to go back to the
> drawing board and handle the error directly?
>

I just saw that Russell introduced a patch [1] that will land on 4.6.
I think that patch solves the same issue that we're trying to fix, but
for sdhci controller.

The problem that we have on peach-pi, with our patch applied, is that
when we get a response CRC error on a command and we move to start
sending data, the transfer doesn't receives a timeout interrupt (I
don't know why). As I told, on rockchip works due the DTO quirk.
exynos is not using this quirk. Also, please correct me if I'm wrong,
looks like the sdhci controller has a timer to signal the command
timed out.

ooi, anyone knows what was the test case that caused the necessity of
the DTO quirk?

> Also: my original commit message says "response error or response CRC
> error".  Do you happen to know which of these two we're hitting on
> rk3288?  If we limit the workaround to just one of these two cases
> does peach pi still break?
>

Yes, the peach pi still break. The one that is hitting is the response
CRC error, so limit the workaround doesn't help.


> Also: I'd be curious, with the same SD card can you reproduce any
> failures on peach pi?  ...or does peach-pi work fine in this case?
>

I can't test this now because I don't have physical access to the
peach-pi. But yeah, this is something to test.

> Hmm, also I think my last suggestion was to see how things looked with
> <https://chromium-review.googlesource.com/#/c/244347/> picked to get
> extra debug info...
>
>
> -Doug

[1] https://git.linaro.org/people/ulf.hansson/mmc.git/commit/71fcbda0fcddd0896c4982a484f6c8aa802d28b1

Enric