Message-ID: <4a2c5b752968496ca72966f80e148d47@hyperstone.com>
Date: Fri, 10 Mar 2023 17:06:32 +0000
From: Christian Löhle <CLoehle@...erstone.com>
To: Ulf Hansson <ulf.hansson@...aro.org>
CC: "linux-mmc@...r.kernel.org" <linux-mmc@...r.kernel.org>,
Jens Axboe <axboe@...nel.dk>,
Wenchao Chen <wenchao.chen666@...il.com>,
Adrian Hunter <adrian.hunter@...el.com>,
Avri Altman <avri.altman@....com>,
"linux-block@...r.kernel.org" <linux-block@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [RFC PATCH] mmc: core: Disable REQ_FUA if the eMMC supports an
internal cache
>>
>> I have benchmarked the FUA/Cache behavior a bit.
>> I don't have an actual filesystem benchmark that does what I wanted and is easy to port to the target so I used:
>>
>> # call with
>> # for loop in {1..3}; do sudo dd if=/dev/urandom bs=1M of=/dev/mmcblk2; done; for loop in {1..5}; do time ./filesystembenchmark.sh; umount /mnt; done
>> mkfs.ext4 -F /dev/mmcblk2
>> mount /dev/mmcblk2 /mnt
>> for i in {1..3}
>> do
>> cp -r linux-6.2.2 /mnt/$i
>> done
>> for i in {1..3}
>> do
>> rm -r /mnt/$i
>> done
>> for i in {1..3}
>> do
>> cp -r linux-6.2.2 /mnt/$i
>> done
>>
>>
>> I found a couple of DUTs that I can link, I also tested one industrial card.
>>
>> DUT1: blue PCB Foresee eMMC
>> https://pine64.com/product/32gb-emmc-module/
>> DUT2: green PCB SiliconGo eMMC
>> Couldn't find that one online anymore unfortunately
>> DUT3: orange hardkernel PCB 8GB
>> https://www.hardkernel.com/shop/8gb-emmc-module-c2-android/
>> DUT4: orange hardkernel PCB white dot
>> https://rlx.sk/en/odroid/3198-16gb-emmc-50-module-xu3-android-for-odroid-xu3.html
>> DUT5: Industrial card
>
> Thanks a lot for helping out with testing! Much appreciated!
No problem, glad to be of help.
>
>>
>>
>> The test issued 461 DO_REL_WR during one of the iterations for DUT5
>>
>> DUT1:
>> Cache, no FUA:
>> 13:04.49
>> 13:13.82
>> 13:30.59
>> 13:28.13
>> 13:20.64
>> FUA:
>> 13:30.32
>> 13:36.26
>> 13:10.86
>> 13:32.52
>> 13:48.59
>>
>> DUT2:
>> FUA:
>> 8:11.24
>> 7:47.73
>> 7:48.00
>> 7:48.18
>> 7:47.38
>> Cache, no FUA:
>> 8:10.30
>> 7:48.97
>> 7:48.47
>> 7:47.93
>> 7:44.18
>>
>> DUT3:
>> Cache, no FUA:
>> 7:02.82
>> 6:58.94
>> 7:03.20
>> 7:00.27
>> 7:00.88
>> FUA:
>> 7:05.43
>> 7:03.44
>> 7:04.82
>> 7:03.26
>> 7:04.74
>>
>> DUT4:
>> FUA:
>> 7:23.92
>> 7:20.15
>> 7:20.52
>> 7:19.10
>> 7:20.71
>> Cache, no FUA:
>> 7:20.23
>> 7:20.48
>> 7:19.94
>> 7:18.90
>> 7:19.88
>
> Without going into the details of the above, it seems like for DUT1, DUT2, DUT3 and DUT4 there are good reasons why we should move forward with $subject patch.
>
> Do you agree?
That is a good question, which is why I just posted the data without further comment from my side.
I was honestly expecting the difference to be much higher, given the original patch.
If this is representative of most cards, you would need quite an unusual workload to actually notice the difference, IMO.
If there are cards where the difference is much more significant, then of course a quirk would be nicer.
On the other hand, I don't see why not, and any improvement is a good one?
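
To put a number on "expecting the difference to be much higher", the run times quoted above for DUT1 to DUT4 can be averaged per configuration. This is a minimal Python sketch, not part of the original benchmark. It assumes the two DUT1 entries written with a second colon (13:28:13 and 13:20:64) are typos for a decimal point, and the helper names to_seconds() and summarize() are purely illustrative:

```python
import re
from statistics import mean


def to_seconds(stamp: str) -> float:
    """Convert 'm:ss.cc' to seconds.

    Assumption: a second ':' (as in '13:28:13') is a typo for '.'.
    """
    m = re.fullmatch(r"(\d+):(\d+)[.:](\d+)", stamp)
    if m is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    minutes, seconds, frac = m.groups()
    return int(minutes) * 60 + int(seconds) + int(frac) / 10 ** len(frac)


# Wall-clock times copied from the results quoted above.
RUNS = {
    "DUT1": {
        "cache": ["13:04.49", "13:13.82", "13:30.59", "13:28:13", "13:20:64"],
        "fua": ["13:30.32", "13:36.26", "13:10.86", "13:32.52", "13:48.59"],
    },
    "DUT2": {
        "cache": ["8:10.30", "7:48.97", "7:48.47", "7:47.93", "7:44.18"],
        "fua": ["8:11.24", "7:47.73", "7:48.00", "7:48.18", "7:47.38"],
    },
    "DUT3": {
        "cache": ["7:02.82", "6:58.94", "7:03.20", "7:00.27", "7:00.88"],
        "fua": ["7:05.43", "7:03.44", "7:04.82", "7:03.26", "7:04.74"],
    },
    "DUT4": {
        "cache": ["7:20.23", "7:20.48", "7:19.94", "7:18.90", "7:19.88"],
        "fua": ["7:23.92", "7:20.15", "7:20.52", "7:19.10", "7:20.71"],
    },
}


def summarize(runs):
    """Return {dut: (mean_cache_s, mean_fua_s, percent_saved_by_cache)}."""
    summary = {}
    for dut, cfg in runs.items():
        cache = mean(to_seconds(t) for t in cfg["cache"])
        fua = mean(to_seconds(t) for t in cfg["fua"])
        # Positive value: "Cache, no FUA" was faster than FUA on average.
        summary[dut] = (cache, fua, 100.0 * (fua - cache) / fua)
    return summary


if __name__ == "__main__":
    for dut, (cache, fua, saved) in summarize(RUNS).items():
        print(f"{dut}: cache {cache:.2f} s, FUA {fua:.2f} s, saved {saved:+.2f} %")
```

For DUT3 and DUT4 the mean difference between the two configurations comes out below 1 %, which is why an unusual workload would be needed to notice it.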
>
>>
>> DUT5:
>> Cache, no FUA:
>> 7:19.36
>> 7:02.11
>> 7:01.53
>> 7:01.35
>> 7:00.37
>> Cache, no FUA CQE:
>> 7:17.55
>> 7:00.73
>> 6:59.25
>> 6:58.44
>> 6:58.60
>> FUA:
>> 7:15.10
>> 6:58.99
>> 6:58.94
>> 6:59.17
>> 6:60.00
>> FUA CQE:
>> 7:11.03
>> 6:58.04
>> 6:56.89
>> 6:56.43
>> 6:56.28
>>
>> If anyone has any comments or disagrees with the benchmark, or has a specific eMMC to test, let me know.
>
> If I understand correctly, for DUT5, it seems like using FUA may be slightly better than just cache-flushing, right?
That is correct. I specifically tested with this card because, under the assumption that reliable write comes without much additional cost, DCMD would be slightly worse for performance and SYNC a bit worse.
>
> For CQE, it seems like FUA could be even slightly better, at least for DUT5. Do you know if REQ_OP_FLUSH translates into MMC_ISSUE_DCMD or MMC_ISSUE_SYNC for your case? See mmc_cqe_issue_type().
It is SYNC (this is sdhci-of-arasan on rk3399, no DCMD), but even SYNC does not seem too bad here. It could of course be worse if the workload were less sequential.
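
For reference, the decision discussed here can be modelled roughly as follows. This is a simplified Python sketch of how I understand mmc_cqe_issue_type() to classify requests, written from memory and not copied from the kernel. The host_can_dcmd parameter is a hypothetical stand-in for the host's DCMD capability check, and the list of operations that always go out as SYNC is an assumption that should be checked against drivers/mmc/core/queue.c:

```python
from enum import Enum


class IssueType(Enum):
    SYNC = "MMC_ISSUE_SYNC"    # issued synchronously, outside the CQE queue
    DCMD = "MMC_ISSUE_DCMD"    # issued as a direct command through the CQE
    ASYNC = "MMC_ISSUE_ASYNC"  # regular queued read/write task


# Assumption: operations that are never queued through the CQE.
ALWAYS_SYNC_OPS = {
    "REQ_OP_DRV_IN",
    "REQ_OP_DRV_OUT",
    "REQ_OP_DISCARD",
    "REQ_OP_SECURE_ERASE",
    "REQ_OP_WRITE_ZEROES",
}


def cqe_issue_type(req_op: str, host_can_dcmd: bool) -> IssueType:
    """Simplified model of how a request is classified when CQE is enabled."""
    if req_op in ALWAYS_SYNC_OPS:
        return IssueType.SYNC
    if req_op == "REQ_OP_FLUSH":
        # A flush only becomes a DCMD if the host controller supports DCMD.
        return IssueType.DCMD if host_can_dcmd else IssueType.SYNC
    return IssueType.ASYNC


# The setup described above (no DCMD support): a flush is issued as SYNC.
flush_here = cqe_issue_type("REQ_OP_FLUSH", host_can_dcmd=False)
```

With host_can_dcmd=True the same flush would be classified as DCMD instead, which is the case I could not measure on this host.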
>
> When it comes to CQE, maybe Adrian has some additional thoughts around this? Perhaps we should keep using REQ_FUA, if we have CQE?
Sure, I'm also interested in Adrian's take on this.
Regards,
Christian
Hyperstone GmbH | Reichenaustr. 39a | 78467 Konstanz
Managing Director: Dr. Jan Peter Berns.
Commercial register of local courts: Freiburg HRB381782