[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <56FC83C2.3000703@rock-chips.com>
Date: Thu, 31 Mar 2016 09:56:18 +0800
From: Shawn Lin <shawn.lin@...k-chips.com>
To: Russell King - ARM Linux <linux@....linux.org.uk>,
Enric Balletbo Serra <eballetbo@...il.com>
Cc: shawn.lin@...k-chips.com, Doug Anderson <dianders@...omium.org>,
"linux-mmc@...r.kernel.org" <linux-mmc@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Alim Akhtar <alim.akhtar@...il.com>,
Jaehoon Chung <jh80.chung@...sung.com>,
Ulf Hansson <ulf.hansson@...aro.org>,
Alim Akhtar <alim.akhtar@...sung.com>,
Sonny Rao <sonnyrao@...omium.org>,
Andrew Bresticker <abrestic@...omium.org>,
Heiko Stuebner <heiko@...ech.de>,
Addy Ke <addy.ke@...k-chips.com>,
Alexandru Stan <amstan@...omium.org>,
Chris Zhong <zyw@...k-chips.com>,
Caesar Wang <wxt@...k-chips.com>,
Javier Martinez Canillas <javier@....samsung.com>
Subject: Re: [PATCH] mmc: dw_mmc: Wait for data transfer after response errors
在 2016/3/31 1:26, Russell King - ARM Linux 写道:
> On Wed, Mar 30, 2016 at 07:16:18PM +0200, Enric Balletbo Serra wrote:
>> 2016-03-24 17:22 GMT+01:00 Russell King - ARM Linux <linux@....linux.org.uk>:
>>> On Thu, Mar 24, 2016 at 09:06:45AM -0700, Doug Anderson wrote:
>>>> Russell,
>>> ...
>>>> Presumably this is similar to what you saw: the host saw the CRC error
>>>> but the card knew nothing about it. Sending the stop command during
>>>> this time confused the card. Presumably the card was in transfer
>>>> state during this time?
>>>
>>> If the card was in transfer state for a command which expects a stop
>>> command, and that stop command was issued after the card entered
>>> the transfer state, then I'd expect the card to handle it... though
>>> there's always the firmware bug issue.
>>>
>>> If the card hadn't entered transfer state at the time the stop command
>>> was issued.. I think that's more likely to hit card firmware issues.
>>>
>>> With the tuning commands, there's another case you can hit though:
>>> the data transfer may have completed before you get around to sending
>>> the stop command.
>>>
>>> That's why, for sdhci, I came to the conclusion that waiting for the
>>> data transfer to complete or timeout was the best solution for SDHCI.
>>>
>>
>> In fact I only saw the problem with dw_mmc-exynos, on dw_mmc-rockchip
>> it doesn't happen because it enables the DW_MCI_QUIRK_BROKEN_DTO
>> behaviour. What does this is use a kernel timer to signal when DTO
>> interrupt does NOT come. Note that if I disable this quirk I can also
>> saw the problem on rockchip.
>>
>>> Maybe, if sending a STOP command does cause card firmware issues, then:
>>>
>>> 1) it provides evidence that trying to send a stop command on response
>>> CRC error is the wrong thing to do (it was talked about making SDHCI
>>> do this.)
>>>
>>
>> Seems the same here, so guess is the wrong thing to do.
>>
>>> 2) it suggests that the solution I came up with for SDHCI is the better
>>> solution, rather than trying to immediately recover the situation by
>>> sending a STOP command.
>>>
>>
>> I'm wondering if just enable this quirk on exynos too is the proper
>> solution. Unfortunately I don't have enough documentation to check
>> differences between those controllers.
>> Also will really help have access to some hardware that uses
>> dw_mmc-pltfm to check if, like on exynos, same issue is triggered.
>> Anyone with the hardware who can do some tests?
>
> I'd really suggest that the dw-mmc folk place a moritorium on quirk
> flags, and instead deal with situations like this without resorting
> to this kind of thing.
>
Some quirks and some callbacks have been cleaned in Jaehoon's repo,and
still some are going to removed. Finally we do plan to turn dw_mmc core
into a pure library..
> sdhci is a good example why the quirk flag approach is totally wrong,
> and shows that it leads to an unmaintainable mess. If dw-mmc people
> don't want the driver to decend into the same state that sdhci is,
> then things like this should not be quirks. sdhci already has a
> long-term moritorium on quirk flags until the resulting mess has been
> cleaned up.
>
> The danger that quirk flags cause is also highlighted in your mail:
> it's very likely that this _isn't_ a host controller issue at all,
Two issues found by dw_mmc-rockchip part,
(1) need reset idma when switching between fifo-transfer and
idma-transfer. When biu:ciu > 1:6, idma internal fsm take a risk of
a race condition to maintain its fifo lookup pointer. It can be very
easy reproduce by seting biu:ciu > 1:6.. Common bug for dw_mmc! But
unfortunately these details was missing in the commit msg.
(2) Missing DTO/DRTO; I missed the thread for this topic, so I need to
reproduce it by setting a simple C model code. I can't say more
currently until we can find a way to easily reproduce it. But I guess
it's NOT a host issue....since I slightly glance at the TMOUT reg at
dw_mmc databook and find a software timer requirement:
31:8 data_timeout 0xffffff
Value for card Data Read Timeout; same value also used for Data
Starvation by Host timeout. The timeout counter is started only after
thecard clock is stopped. Value is in number of card output clocks –
cclk_out of selected card.
Note: The software timer should be used if the timeout value is in the
order of 100 ms. In this case, read data timeout interrupt needs to be
disabled.
> but a MMC protocol issue or a card issue - and the behaviour required
> here is not specific to any particular host controller. The problem
> with having a quirk flag for it is that you end up with some hosts
> enabling it, and other hosts having it disabled only because they
> haven't yet tripped over the issue.
>
--
Best Regards
Shawn Lin
Powered by blists - more mailing lists