linux-kernel - Re: dw_mmc: HLE errors

Message-id: <5653C47E.9030801@samsung.com>
Date:	Tue, 24 Nov 2015 10:59:26 +0900
From:	Jaehoon Chung <jh80.chung@...sung.com>
To:	Jorge Ramirez-Ortiz <jorge.ramirez-ortiz@...aro.org>,
	Doug Anderson <dianders@...omium.org>
Cc:	Ulf Hansson <ulf.hansson@...aro.org>,
	Alim Akhtar <alim.akhtar@...sung.com>,
	Sonny Rao <sonnyrao@...omium.org>,
	Andrew Bresticker <abrestic@...omium.org>,
	Heiko Stübner <heiko@...ech.de>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-mmc@...r.kernel.org" <linux-mmc@...r.kernel.org>,
	Guodong Xu <guodong.xu@...aro.org>
Subject: Re: dw_mmc: HLE errors

On 11/24/2015 10:55 AM, Jorge Ramirez-Ortiz wrote:
> On 11/23/2015 07:11 PM, Jaehoon Chung wrote:
>> Dear Jorge,
>>
>> On 11/24/2015 02:29 AM, Jorge Ramirez-Ortiz wrote:
>>> On 11/23/2015 11:57 AM, Doug Anderson wrote:
>>>> Jorge,
>>>>
>>>> On Mon, Nov 23, 2015 at 6:10 AM, Jorge Ramirez-Ortiz
>>>> <jorge.ramirez-ortiz@...aro.org> wrote:
>>>>> Doug/Jaehoon,
>>>>>
>>>>> Were there any follow ups to this thread [1] from March 30, 2015?
>>>>> We are seeing HLE errors on 3.18 and we are trying to determine if a solution
>>>>> was ever delivered.
>>>>> On inspection, I can't find anything specific in recent kernels that addresses
>>>>> this particular issue (was the actual root cause identified?).
>>>>>
>>>>> I put together a possible work-around that prevents the HLE storm from occurring
>>>>> on this specific SoC [2].
>>>>> However, we'd rather not merge this, or any other similar fix, if there is
>>>>> already a generic solution that we can pick up from mainline.
>>>> Nothing landed that I'm aware of.  Are you on SDIO, SD or eMMC?
>>>> Trying to do UHS?
>>> SD, even without UHS (not yet; that is coming now)
>> If you want to use a mode higher than UHS-DDR50 for the SD card, you need to apply the patch below.
> 
> ACK
> 
>>
>> https://patchwork.kernel.org/patch/7456121/
>>
>> Actually, this is not related to the HLE error.
>>
>> When an SD card is inserted/removed quickly, the dwmmc controller sometimes raises the HLE error.
>> (Now, I can't see the HLE error.)
>> So I applied some reset processing in my official repository. (It's not a generic solution.)
> 
> Thanks, I'll have a look now.
> 
> I believe this to be your official repo:
> https://github.com/jh80chung/dw-mmc
> 
> Please let me know if it is not.

Sorry, that is not the official (Samsung) repo, and I can't share the URL of the official one. :(
It's just my personal git repository. I will work on that repository. :)

Best Regards,
Jaehoon Chung

> 
> 
>>
>>>> I know that this patch mattered for me for UHS:
>>>>
>>>>   7c5209c315ea mmc: core: Increase delay for voltage to stabilize from
>>>> 3.3V to 1.8V
>>>>
>>>>
>>>> Also important for UHS (for at least some folks) were patches like:
>>>>
>>>>   9c85f37a2984 mmc: core: Add mmc_regulator_set_vqmmc()
>>>>
>>>> ...that attempted to get the voltages set more correctly...
>>> ack
>>>
>>>>
>>>> In the ChromeOS tree we did just land treating HLE errors as data and
>>>> cmd errors <https://patchwork.kernel.org/patch/5978711/>.  It's not
>>>> wonderful but it's better than letting an interrupt go off forever...
>>> Yes, I did try this patch on 3.18, but it didn't seem to be enough for us.
>>> Even though it prevented the interrupt storm from flooding the kernel, once
>>> the event had triggered and the interrupt had been handled, no more card
>>> insertions/ejections were detected.
>> If the HLE error can be reproduced with a generic sequence, I think we can find a generic solution.
>> So could you explain it to me in more detail? If I can reproduce it with v3.18, I will try to test it.
>> Your case will help me solve the HLE error.
> 
> 
> Yes, the issue is relatively easy to reproduce.
> 
> On this platform:
> https://www.96boards.org/products/ce/hikey/
> 
> Using either debian [1] or android [2] releases and the latest UEFI [3]
> [1] https://builds.96boards.org/snapshots/hikey/linaro/debian/379/
> [2] https://builds.96boards.org/snapshots/hikey/linaro/aosp/197/
> [3] https://builds.96boards.org/snapshots/hikey/linaro/uefi/89/
> 
> The kernel tree between android and debian is shared [4].
> We are using the "hikey" branch (v3.18)
> [4] https://github.com/96boards/linux
> 
> For my tests, and to be able to handle the interrupt storm and monitor the
> registers while it happens, I patched the kernel with a Xenomai [5] co-kernel.
> This is my kernel tree [6].
> [5] http://xenomai.org/
> [6] http://git.xenomai.org/ipipe-jro.git/log/?h=hikey
> 
> To reproduce the problem, all that was required was to insert/remove the SD card
> rapidly until it triggered this condition:
> [  229.974525] dwmmc_k3 f723e000.dwmmc1: Busy; trying anyway
>  
> When it triggered, and after patching the interrupt handler with some debug info
> to show the time between interrupts and the content of the MINTSTS register,
> I could see the following:
> mci_isr: 0x1000,  3333 ns
> mci_isr: 0x1000,  3334 ns
> mci_isr: 0x1000,  3333 ns
> mci_isr: 0x1000,  3334 ns
> mci_isr: 0x1000,  3333 ns
> mci_isr: 0x1000,  2500 ns
> mci_isr: 0x1000,  3334 ns
> mci_isr: 0x1000,  2500 ns
> mci_isr: 0x1000,  3333 ns
> mci_isr: 0x1000,  3334 ns
> mci_isr: 0x1000,  3334 ns
> mci_isr: 0x1000,  3333 ns
> mci_isr: 0x1000,  3334 ns
> mci_isr: 0x1000,  2500 ns
> mci_isr: 0x1000,  3334 ns
> [...]
> 
> Notice that since the Xenomai co-kernel runs with a higher priority than the
> Linux kernel, I was able to output this information to the console.
> 
> I put together a fix based on this commit from Doug:
> mmc: dw_mmc: Don't start commands while busy
> https://lkml.org/lkml/2015/2/20/508
> 
> In Doug's commit, we delay sending a command until SDMMC_STATUS_BUSY
> clears.
> However, if it never clears, we go ahead and submit the command anyway.
> 
> I believe this is what was causing the HLE to be raised.
> In order to prevent that from happening, I think we should abort the operation
> completely.
> My "extension" for the Hikey platform looks like this:
> https://github.com/96boards/linux/commit/fe8d7f714d420121cec460e69f6529044a2cb6d
> 
> It could be made generic, or the fix could take some other form, of course.
> I was only targeting the Hikey platform when I wrote this, hoping that the
> problem would already have been fixed upstream.
> 
> Having said all of this, I am not sure what would cause the host status to
> remain busy for so long (which is Ulf's biggest concern).
> I also tried increasing some of the timers that wait for the voltages to ramp up
> after power on, but it didn't make any difference.
> 
> I captured most of the information above under this bug for reference.
> https://bugs.96boards.org/show_bug.cgi?id=175
> 
> 
> 
> 
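For reference, the mci_isr trace quoted above can be read as follows. In the
dw_mmc interrupt status registers, bit 12 is the hardware locked write error
(HLE), which the kernel driver names SDMMC_INT_HLE, so a MINTSTS value of 0x1000
means HLE is the only masked interrupt pending. The sketch below is not driver
code; the helper names has_hle() and irq_rate_hz() are hypothetical and exist
only to illustrate the decoding and the interrupt rate implied by the spacing.

```c
#include <stdint.h>

/* Bit 12 of the interrupt status: hardware locked write error (HLE).
 * The dw_mmc driver calls this SDMMC_INT_HLE; redefined here so the
 * sketch builds outside the kernel tree. */
#define INT_HLE (1u << 12)

/* Hypothetical helper: nonzero if the masked status has HLE pending. */
static int has_hle(uint32_t mintsts)
{
    return (mintsts & INT_HLE) != 0;
}

/* Hypothetical helper: interrupts per second for a given spacing in ns. */
static double irq_rate_hz(double interval_ns)
{
    return 1e9 / interval_ns;
}
```

With the logged values, has_hle(0x1000) is true, and a spacing of 3333 ns
corresponds to roughly 300,000 interrupts per second (2500 ns to 400,000),
which is consistent with the handler being re-entered almost immediately.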
> 
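The policy Jorge describes above (wait for the busy bit to clear, and abort
instead of issuing the command if it never does) can be sketched off-target as
below. This is only an illustration under stated assumptions, not the code in
the Hikey commit: the read_status_fn callback, the fake_host type, and the
max_polls bound are hypothetical, and the real driver waits on a time-based
timeout rather than a poll count. The busy bit position (bit 9 of the status
register) matches SDMMC_STATUS_BUSY in the kernel driver.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Data-busy bit of the status register (SDMMC_STATUS_BUSY in the driver). */
#define STATUS_BUSY (1u << 9)

/* Hypothetical register accessor, so the sketch can run without hardware. */
typedef uint32_t (*read_status_fn)(void *ctx);

/* Poll until busy clears or max_polls reads have been made.
 * Returns 0 once the controller is idle, -EBUSY if it stayed busy. */
static int wait_while_busy(read_status_fn read_status, void *ctx,
                           unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; i++) {
        if (!(read_status(ctx) & STATUS_BUSY))
            return 0;
    }
    return -EBUSY;
}

/* "Abort" policy: only mark the command as issued if busy cleared.
 * The "trying anyway" behaviour would set *issued = true unconditionally. */
static int start_command(read_status_fn read_status, void *ctx,
                         unsigned max_polls, bool *issued)
{
    int ret = wait_while_busy(read_status, ctx, max_polls);

    *issued = (ret == 0);
    return ret;
}

/* Fake controller for experiments: reports busy for busy_reads reads. */
struct fake_host {
    unsigned busy_reads;
};

static uint32_t fake_read_status(void *ctx)
{
    struct fake_host *h = ctx;

    if (h->busy_reads > 0) {
        h->busy_reads--;
        return STATUS_BUSY;
    }
    return 0;
}
```

A fake host that stays busy for 3 reads lets start_command() succeed with a
bound of 10 polls, while one that stays busy for 1000 reads gets -EBUSY and no
command is issued. Whether the caller should then reset the controller or fail
the request is the open question in this thread.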
