linux-kernel - Re: [PATCH V8 00/14] mmc: Add Command Queue support

Message-ID: <b594f4e0-06ea-c9fc-fc2d-dd93bc401c6f@intel.com>
Date:   Thu, 19 Oct 2017 14:44:03 +0300
From:   Adrian Hunter <adrian.hunter@...el.com>
To:     Ulf Hansson <ulf.hansson@...aro.org>
Cc:     linux-mmc <linux-mmc@...r.kernel.org>,
        linux-block <linux-block@...r.kernel.org>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Bough Chen <haibo.chen@....com>,
        Alex Lemberg <alex.lemberg@...disk.com>,
        Mateusz Nowak <mateusz.nowak@...el.com>,
        Yuliy Izrailov <Yuliy.Izrailov@...disk.com>,
        Jaehoon Chung <jh80.chung@...sung.com>,
        Dong Aisheng <dongas86@...il.com>,
        Das Asutosh <asutoshd@...eaurora.org>,
        Zhangfei Gao <zhangfei.gao@...il.com>,
        Sahitya Tummala <stummala@...eaurora.org>,
        Harjani Ritesh <riteshh@...eaurora.org>,
        Venu Byravarasu <vbyravarasu@...dia.com>,
        Linus Walleij <linus.walleij@...aro.org>,
        Shawn Lin <shawn.lin@...k-chips.com>,
        Christoph Hellwig <hch@....de>
Subject: Re: [PATCH V8 00/14] mmc: Add Command Queue support

On 18/10/17 09:16, Adrian Hunter wrote:
> On 11/10/17 16:58, Ulf Hansson wrote:
>> On 11 October 2017 at 14:58, Adrian Hunter <adrian.hunter@...el.com> wrote:
>>> On 11/10/17 15:13, Ulf Hansson wrote:
>>>> On 10 October 2017 at 15:31, Adrian Hunter <adrian.hunter@...el.com> wrote:
>>>>> On 10/10/17 16:08, Ulf Hansson wrote:
>>>>>> [...]
>>>>>>
>>>>>>>>>>
>>>>>>>>>> I have also run some tests on my ux500 board with the blkmq
>>>>>>>>>> path enabled via the new MMC Kconfig option. My idea was to run some iozone
>>>>>>>>>> comparisons between the legacy path and the new blkmq path, but I just
>>>>>>>>>> couldn't get to that point because of the following errors.
>>>>>>>>>>
>>>>>>>>>> I am using a Kingston 4GB SDHC card, which is detected and mounted
>>>>>>>>>> nicely. However, when I decide to do some writes to the card I get the
>>>>>>>>>> following errors.
>>>>>>>>>>
>>>>>>>>>> root@ME:/mnt/sdcard dd if=/dev/zero of=testfile bs=8192 count=5000 conv=fsync
>>>>>>>>>> [  463.714294] mmci-pl18x 80126000.sdi0_per1: error during DMA transfer!
>>>>>>>>>> [  464.722656] mmci-pl18x 80126000.sdi0_per1: error during DMA transfer!
>>>>>>>>>> [  466.081481] mmci-pl18x 80126000.sdi0_per1: error during DMA transfer!
>>>>>>>>>> [  467.111236] mmci-pl18x 80126000.sdi0_per1: error during DMA transfer!
>>>>>>>>>> [  468.669647] mmci-pl18x 80126000.sdi0_per1: error during DMA transfer!
>>>>>>>>>> [  469.685699] mmci-pl18x 80126000.sdi0_per1: error during DMA transfer!
>>>>>>>>>> [  471.043334] mmci-pl18x 80126000.sdi0_per1: error during DMA transfer!
>>>>>>>>>> [  472.052337] mmci-pl18x 80126000.sdi0_per1: error during DMA transfer!
>>>>>>>>>> [  473.342651] mmci-pl18x 80126000.sdi0_per1: error during DMA transfer!
>>>>>>>>>> [  474.323760] mmci-pl18x 80126000.sdi0_per1: error during DMA transfer!
>>>>>>>>>> [  475.544769] mmci-pl18x 80126000.sdi0_per1: error during DMA transfer!
>>>>>>>>>> [  476.539031] mmci-pl18x 80126000.sdi0_per1: error during DMA transfer!
>>>>>>>>>> [  477.748474] mmci-pl18x 80126000.sdi0_per1: error during DMA transfer!
>>>>>>>>>> [  478.724182] mmci-pl18x 80126000.sdi0_per1: error during DMA transfer!
>>>>>>>>>>
>>>>>>>>>> I haven't yet got to the point of investigating this any further, and
>>>>>>>>>> unfortunately I have a busy schedule with traveling next week. I will do
>>>>>>>>>> my best to look into this as soon as I can.
>>>>>>>>>>
>>>>>>>>>> Perhaps you have some ideas?
>>>>>>>>>
>>>>>>>>> The behaviour depends on whether you have MMC_CAP_WAIT_WHILE_BUSY. Try
>>>>>>>>> changing that and see if it makes a difference.
>>>>>>>>
>>>>>>>> Yes, it does! I disabled MMC_CAP_WAIT_WHILE_BUSY (and its
>>>>>>>> corresponding code in mmci.c) and the errors go away.
>>>>>>>>
>>>>>>>> When I use MMC_CAP_WAIT_WHILE_BUSY I get these problems:
>>>>>>>>
>>>>>>>> [  223.820983] mmci-pl18x 80126000.sdi0_per1: error during DMA transfer!
>>>>>>>> [  224.815795] mmci-pl18x 80126000.sdi0_per1: error during DMA transfer!
>>>>>>>> [  226.034881] mmci-pl18x 80126000.sdi0_per1: error during DMA transfer!
>>>>>>>> [  227.112884] mmci-pl18x 80126000.sdi0_per1: error during DMA transfer!
>>>>>>>> [  227.220275] mmc0: Card stuck in wrong state! mmcblk0 mmc_blk_card_stuck
>>>>>>>> [  228.686798] mmci-pl18x 80126000.sdi0_per1: error during DMA transfer!
>>>>>>>> [  229.892150] mmci-pl18x 80126000.sdi0_per1: error during DMA transfer!
>>>>>>>> [  231.031890] mmci-pl18x 80126000.sdi0_per1: error during DMA transfer!
>>>>>>>> [  232.239013] mmci-pl18x 80126000.sdi0_per1: error during DMA transfer!
>>>>>>>> 5000+0 records in
>>>>>>>> 5000+0 records out
>>>>>>>> root@ME:/mnt/sdcard
>>>>>>>>
>>>>>>>> I looked at the new blkmq code from patch v10 13/15. It seems like
>>>>>>>> MMC_CAP_WAIT_WHILE_BUSY is used to determine whether the async request
>>>>>>>> mechanism should be used or not. Perhaps I didn't look closely enough,
>>>>>>>> but could you elaborate on why this seems to be the case?
>>>>>>>
>>>>>>> MMC_CAP_WAIT_WHILE_BUSY is necessary because it means that a data transfer
>>>>>>> request has finished when the host controller calls mmc_request_done(),
>>>>>>> i.e. polling the card is not necessary.
>>>>>>
>>>>>> Well, that is a rather big change on its own. Earlier we polled with
>>>>>> CMD13 to verify that the card has moved back to the transfer state, in
>>>>>> case it was a write. And that was regardless of whether
>>>>>> MMC_CAP_WAIT_WHILE_BUSY was set or not. Right!?
>>>>>
>>>>> Yes
>>>>>
>>>>>>
>>>>>> I am not sure it's a good idea to bypass that validation; it seems
>>>>>> fragile to rely only on busy detection on the DAT line for writes.
>>>>>
>>>>> Can you cite something from the specifications that backs that up, because I
>>>>> couldn't find anything to suggest that CMD13 polling was expected.
>>>>
>>>> No I can't, but I don't see why that matters.
>>>>
>>>> My point is, if we want to go down that road by avoiding the CMD13
>>>> polling, that needs to be a separate change, which we can test and
>>>> confirm on its own.
>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Have you tried V9 or V10? There was a fix in V9 related to calling
>>>>>>> ->post_req() which could mess up DMA.
>>>>>>
>>>>>> I have used V10.
>>>>>>
>>>>>>>
>>>>>>> The other thing that could go wrong with DMA is if it cannot accept
>>>>>>> ->post_req() being called from mmc_request_done().
>>>>>>
>>>>>> I don't think mmci has a problem with that. However, why do you want to
>>>>>> do this? Wouldn't that defeat some of the benefits of the async
>>>>>> request mechanism?
>>>>>
>>>>> Perhaps - but it would need to be tested.  If there are more requests
>>>>> waiting, one optimization could be to defer ->post_req() until after the
>>>>> next request is started.
>>>>
>>>> This is already proven, because this is how the existing mmc async
>>>> request mechanism works.
>>>>
>>>> In ->post_req() callbacks, host drivers may do dma_unmap_sg(), which
>>>> can be costly, and therefore it's better to start a new request
>>>> first, so that these things can go on in parallel.
>>>
>>> OK, I will make a patch that takes care of both issues. That will also mean
>>> the request is not completed in the ->done() callback because ->post_req()
>>> must precede block layer completion.
>>
>> Right.
>>
>> Actually, completing the request in the ->done() callback may still be
>> possible, because in principle it only needs to inform the other
>> prepared request that it may start, before it continues to
>> post-process/complete the current one.
>>
>> However, looking at mmci.c, for example, the driver actually holds
>> its spinlock while it calls mmc_request_done(). The same spinlock is
>> taken in the ->request() function, but not in the ->post_req()
>> function. In other words, completing the request in the ->done()
>> callback would make mmci keep the spinlock held throughout the
>> post-processing cycle, which then prevents the next request from being
>> started.
>>
>> So my conclusion is: let's start, as you suggested, by not completing
>> the request in ->done(), so as to maintain the existing behavior. Then we can
>> address optimizations on top, which very likely will involve doing
>> changes to host drivers as well.
> 
> Have you tested the latest version now?
> 

Ping?