Date:   Wed, 6 Mar 2019 15:30:06 +0530
From:   Faiz Abbas <faiz_abbas@...com>
To:     Adrian Hunter <adrian.hunter@...el.com>,
        <linux-kernel@...r.kernel.org>, <devicetree@...r.kernel.org>,
        <linux-mmc@...r.kernel.org>, <linux-omap@...r.kernel.org>
CC:     <ulf.hansson@...aro.org>, <robh+dt@...nel.org>,
        <mark.rutland@....com>, <kishon@...com>, <zhang.chunyan@...aro.org>
Subject: Re: [PATCH v2 1/8] mmc: sdhci: Get rid of finish_tasklet

Adrian,

On 25/02/19 1:47 PM, Adrian Hunter wrote:
> On 15/02/19 9:20 PM, Faiz Abbas wrote:
>> sdhci.c has two bottom halves implemented. A threaded_irq for handling
>> card insert/remove operations and a tasklet for finishing mmc requests.
>> With the addition of external dma support, dmaengine APIs need to
>> terminate in non-atomic context before unmapping the dma buffers.
>>
>> To facilitate this, remove the finish_tasklet and move the call of
>> sdhci_request_done() to the threaded_irq() callback.
> 
> The irq thread has a higher latency than the tasklet.  The performance drop
> is measurable on the system I tried:
> 
> Before:
> 
> # dd if=/dev/mmcblk1 of=/dev/null bs=1G count=1 &
> 1+0 records in
> 1+0 records out
> 1073741824 bytes (1.1 GB) copied, 4.44502 s, 242 MB/s
> 
> After:
> 
> # dd if=/dev/mmcblk1 of=/dev/null bs=1G count=1 &
> 1+0 records in
> 1+0 records out
> 1073741824 bytes (1.1 GB) copied, 4.50898 s, 238 MB/s
> 
> So we only want to resort to the thread for the error case.
> 

Sorry for the late response here, but this is only about a 1.6% decrease. I
tried the same commands on a dra7xx board (averaging about 5 consecutive
1 GB dd runs) and the average decrease was 0.3%. I suspect you would also
see a smaller percentage change if you averaged over multiple dd runs.

Is this really significant enough that we have to maintain two different
bottom halves and keep running into difficulty adding APIs that can sleep?

Also, I am not sure how to implement only the error-handling part in the
threaded irq. We need to enter sdhci_request_done() and fetch the current
mrq before we can check for error conditions, as I've done in patch 2:

/* Terminate and synchronize dma in case of an error */
if (data && (mrq->cmd->error || data->error) &&
    host->use_external_dma) {
	struct dma_chan *chan = sdhci_external_dma_channel(host, data);
	dmaengine_terminate_sync(chan);
}

On a related note, do we really need to protect everything in
sdhci_request_done() with spinlocks? In patch 2 I only removed the lock
around the terminate_sync() parts that I added, but IMO the whole
dma_unmap/dma_sync section could be left unprotected as well.

Thanks,
Faiz
