Message-ID: <842caafd-1547-1ea6-faf0-27a85a912622@ti.com>
Date: Tue, 12 Mar 2019 23:00:38 +0530
From: "Rizvi, Mohammad Faiz Abbas" <faiz_abbas@...com>
To: Adrian Hunter <adrian.hunter@...el.com>,
<linux-kernel@...r.kernel.org>, <devicetree@...r.kernel.org>,
<linux-mmc@...r.kernel.org>, <linux-omap@...r.kernel.org>
CC: <ulf.hansson@...aro.org>, <robh+dt@...nel.org>,
<mark.rutland@....com>, <kishon@...com>, <zhang.chunyan@...aro.org>
Subject: Re: [PATCH v2 1/8] mmc: sdhci: Get rid of finish_tasklet
Hi Adrian,
On 3/8/2019 7:06 PM, Adrian Hunter wrote:
> On 6/03/19 12:00 PM, Faiz Abbas wrote:
>> Adrian,
>>
>> On 25/02/19 1:47 PM, Adrian Hunter wrote:
>>> On 15/02/19 9:20 PM, Faiz Abbas wrote:
>>>> sdhci.c has two bottom halves implemented. A threaded_irq for handling
>>>> card insert/remove operations and a tasklet for finishing mmc requests.
>>>> With the addition of external dma support, the dmaengine APIs need to
>>>> be called in non-atomic context to terminate transfers before the dma
>>>> buffers are unmapped.
>>>>
>>>> To facilitate this, remove the finish_tasklet and move the call of
>>>> sdhci_request_done() to the threaded_irq() callback.
>>>
>>> The irq thread has a higher latency than the tasklet. The performance drop
>>> is measurable on the system I tried:
>>>
>>> Before:
>>>
>>> # dd if=/dev/mmcblk1 of=/dev/null bs=1G count=1 &
>>> 1+0 records in
>>> 1+0 records out
>>> 1073741824 bytes (1.1 GB) copied, 4.44502 s, 242 MB/s
>>>
>>> After:
>>>
>>> # dd if=/dev/mmcblk1 of=/dev/null bs=1G count=1 &
>>> 1+0 records in
>>> 1+0 records out
>>> 1073741824 bytes (1.1 GB) copied, 4.50898 s, 238 MB/s
>>>
>>> So we only want to resort to the thread for the error case.
>>>
>>
>> Sorry for the late response here, but this is about a 1.6% decrease. I
>> tried the same commands on a dra7xx board here (with about 5
>> consecutive dd runs of 1 GB each) and the average decrease was 0.3%. I
>> believe you will also see a smaller percentage change if you average
>> over multiple dd runs.
>>
>> Is this really so significant that we have to maintain two different
>> bottom halves and keep having difficulty adding APIs that can sleep?
>
> It is a performance drop that can be avoided, so it might as well be.
> Splitting the success path from the failure path is common for I/O drivers
> for similar reasons as here: the success path can be optimized whereas the
> failure path potentially needs to sleep.
Understood. You want to keep the success path as fast as possible.
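If I've understood the suggestion, it would look roughly like this in the
interrupt handler, with only the failure case pushed to the thread
(untested sketch; the helper name and the exact error check are just
illustrative):

/*
 * Untested sketch of the split: successful requests keep the existing
 * low-latency tasklet path, and only failed requests (which may need
 * to sleep, e.g. to terminate external dma) are deferred to the irq
 * thread.
 */
static irqreturn_t sdhci_defer_if_error(struct sdhci_host *host,
					struct mmc_request *mrq)
{
	struct mmc_data *data = mrq->data;
	bool error = mrq->cmd->error || (data && data->error);

	if (!error) {
		/* Success path: finish via the tasklet as before */
		tasklet_schedule(&host->finish_tasklet);
		return IRQ_HANDLED;
	}

	/* Failure path: let the threaded handler do the cleanup */
	return IRQ_WAKE_THREAD;
}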
>
>>
>> Also I am not sure how to implement only the error handling part in the
>> threaded_irq. We need to enter sdhci_request_done() and get the current
>> mrq before we can check for error conditions like I've done in patch 2:
>>
>> 	/* Terminate and synchronize dma in case of an error */
>> 	if (data && (mrq->cmd->error || data->error) &&
>> 	    host->use_external_dma) {
>> 		struct dma_chan *chan = sdhci_external_dma_channel(host, data);
>> 		dmaengine_terminate_sync(chan);
>> 	}
>>
>> On a related note, do we really need to protect everything in
>> sdhci_request_done() with spinlocks? In patch 2 I have only removed the
>> lock around the terminate_sync() parts that I added, but IMO the whole
>> dma_unmap/dma_sync section should also be left unprotected.
>
> As it is written, synchronization is needed to stop the same mrq being
> finished twice.
>
> I suggest doing the dmaengine_terminate_sync() before calling
> sdhci_request_done(). Perhaps you could record the channel that needs to be
> sync'd and then do:
>
> 	struct dma_chan *chan = READ_ONCE(host->sync_chan);
>
> 	if (chan) {
> 		dmaengine_terminate_sync(chan);
> 		host->sync_chan = NULL;
> 	}
>
Will try this out and send the next version.
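For the record, this is roughly what I have in mind for the recording
side (untested; host->sync_chan would be a new field in struct
sdhci_host, and sdhci_external_dma_channel() is the helper from patch 2):

/*
 * Untested: in irq context, when a failed data request used external
 * dma, only record the channel. The dmaengine_terminate_sync() then
 * runs in the threaded handler, as in your snippet above, before
 * sdhci_request_done() is called.
 */
static void sdhci_note_external_dma_error(struct sdhci_host *host,
					  struct mmc_request *mrq)
{
	struct mmc_data *data = mrq->data;

	if (host->use_external_dma && data &&
	    (mrq->cmd->error || data->error))
		host->sync_chan = sdhci_external_dma_channel(host, data);
}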
Thanks,
Faiz