Message-ID: <6e67269f-88a4-466a-ac34-430b82fac4ff@intel.com>
Date: Thu, 21 Mar 2024 08:40:05 +0200
From: Adrian Hunter <adrian.hunter@...el.com>
To: Maxim Kiselev <bigunclemax@...il.com>, serghox@...il.com
Cc: jyanchou@...ltek.com, open list <linux-kernel@...r.kernel.org>,
 linux-mmc@...r.kernel.org, quic_asutoshd@...cinc.com, ritesh.list@...il.com,
 shawn.lin@...k-chips.com, Ulf Hansson <ulf.hansson@...aro.org>
Subject: Re: [PATCH v7 0/2] mmc: sdhci-of-dwcmshc: Add CQE support

On 20/03/24 12:36, Maxim Kiselev wrote:
> Subject: [PATCH v7 0/2] mmc: sdhci-of-dwcmshc: Add CQE support
> 
> Hi Sergey, Adrian!
> 
> First of all, I want to thank Sergey for adding support for the CQE
> feature on the DWC MSHC controller.
> 
> I tested this series on the LicheePi 4A board (TH1520 SoC).
> It has the DWC MSHC IP too and according to the T-Head datasheet
> it also supports the CQE feature.
> 
>> Supports Command Queuing Engine (CQE) and compliant with eMMC CQ HCI.
> 
> So, to enable CQE on the LicheePi 4A we need to set a prop in DT
> and add an IRQ handler to th1520_ops:
>> .irq = dwcmshc_cqe_irq_handler,
> 
> And the CQE will work for th1520 SoC too.
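
The DT side of the change described above could look like the fragment
below (a sketch only; the node label is hypothetical, but `supports-cqe`
is the standard property from the generic mmc-controller binding):

```
/* Hypothetical eMMC node label; adjust to the board's actual DT. */
&emmc {
	supports-cqe;
};
```
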
> 
> But, when I enabled the CQE, I was faced with a strange effect.
> 
> The fio benchmark shows that eMMC works ~2.5x slower with CQE enabled:
> 219 MB/s w/o CQE vs 87.4 MB/s w/ CQE. I'll put the logs below.
> 
> I would be very appreciative if you could point me where to look for
> the bottleneck.

Some things you could try:

 Check for any related kernel messages.

 Have a look at /sys/kernel/debug/mmc*/err_stats

 See if disabling runtime PM for the host controller has any effect.

 Enable mmc dynamic debug messages and see if anything looks different.
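
The checks above can be sketched as a shell checklist. Paths assume
debugfs is mounted at /sys/kernel/debug, and HOST is a placeholder for
the host under test (mmc0, mmc1, ...); each step is guarded so it is
skipped when the file is not present:

```shell
# Hypothetical host name; adjust to the system being debugged.
HOST=${HOST:-mmc0}
DBG=/sys/kernel/debug

# 1. Look for related kernel messages (cqhci is the CQE host driver).
dmesg 2>/dev/null | grep -iE 'cqhci|cqe|mmc' | tail -n 20

# 2. Inspect the per-host error statistics exposed via debugfs.
[ -f "$DBG/$HOST/err_stats" ] && cat "$DBG/$HOST/err_stats"

# 3. Disable runtime PM for the host controller (keep it powered).
PM=/sys/class/mmc_host/$HOST/device/power/control
[ -w "$PM" ] && echo on > "$PM"

# 4. Enable mmc dynamic debug messages, then re-run fio and diff dmesg.
CTL=$DBG/dynamic_debug/control
[ -w "$CTL" ] && echo 'file drivers/mmc/* +p' > "$CTL"

echo "checklist done"
```

Running the fio job again after step 4 and comparing the dmesg output
of the CQE and non-CQE cases should show where the requests diverge.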

> 
> Without CQE:
> 
> # cat /sys/kernel/debug/mmc0/ios
> clock:          198000000 Hz
> actual clock:   198000000 Hz
> vdd:            21 (3.3 ~ 3.4 V)
> bus mode:       2 (push-pull)
> chip select:    0 (don't care)
> power mode:     2 (on)
> bus width:      3 (8 bits)
> timing spec:    10 (mmc HS400 enhanced strobe)
> signal voltage: 1 (1.80 V)
> driver type:    0 (driver type B)
> 
> # fio --filename=/dev/mmcblk0 --direct=1 --rw=randread --bs=1M
> --ioengine=sync --iodepth=256 --size=4G --numjobs=1 --group_reporting
> --name=iops-test-job --eta-newline=1 --readonly
> iops-test-job: (g=0): rw=randread, bs=(R) 1024KiB-1024KiB, (W)
> 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=sync, iodepth=256
> fio-3.34
> Starting 1 process
> note: both iodepth >= 1 and synchronous I/O engine are selected, queue
> depth will be capped at 1
> Jobs: 1 (f=1): [r(1)][15.0%][r=209MiB/s][r=209 IOPS][eta 00m:17s]
> Jobs: 1 (f=1): [r(1)][25.0%][r=208MiB/s][r=208 IOPS][eta 00m:15s]
> Jobs: 1 (f=1): [r(1)][35.0%][r=207MiB/s][r=207 IOPS][eta 00m:13s]
> Jobs: 1 (f=1): [r(1)][47.4%][r=208MiB/s][r=208 IOPS][eta 00m:10s]
> Jobs: 1 (f=1): [r(1)][52.6%][r=209MiB/s][r=208 IOPS][eta 00m:09s]
> Jobs: 1 (f=1): [r(1)][63.2%][r=208MiB/s][r=208 IOPS][eta 00m:07s]
> Jobs: 1 (f=1): [r(1)][68.4%][r=208MiB/s][r=207 IOPS][eta 00m:06s]
> Jobs: 1 (f=1): [r(1)][78.9%][r=207MiB/s][r=207 IOPS][eta 00m:04s]
> Jobs: 1 (f=1): [r(1)][89.5%][r=209MiB/s][r=209 IOPS][eta 00m:02s]
> Jobs: 1 (f=1): [r(1)][100.0%][r=209MiB/s][r=209 IOPS][eta 00m:00s]
> iops-test-job: (groupid=0, jobs=1): err= 0: pid=132: Thu Jan  1 00:03:44 1970
>   read: IOPS=208, BW=208MiB/s (219MB/s)(4096MiB/19652msec)
>     clat (usec): min=3882, max=11557, avg=4778.37, stdev=238.26
>      lat (usec): min=3883, max=11559, avg=4779.93, stdev=238.26
>     clat percentiles (usec):
>      |  1.00th=[ 4359],  5.00th=[ 4555], 10.00th=[ 4555], 20.00th=[ 4621],
>      | 30.00th=[ 4621], 40.00th=[ 4686], 50.00th=[ 4752], 60.00th=[ 4817],
>      | 70.00th=[ 4883], 80.00th=[ 4948], 90.00th=[ 5014], 95.00th=[ 5145],
>      | 99.00th=[ 5473], 99.50th=[ 5538], 99.90th=[ 5932], 99.95th=[ 6915],
>      | 99.99th=[11600]
>    bw (  KiB/s): min=208896, max=219136, per=100.00%, avg=213630.77,
> stdev=1577.33, samples=39
>    iops        : min=  204, max=  214, avg=208.56, stdev= 1.55, samples=39
>   lat (msec)   : 4=0.39%, 10=99.58%, 20=0.02%
>   cpu          : usr=0.38%, sys=13.04%, ctx=4132, majf=0, minf=275
>   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      issued rwts: total=4096,0,0,0 short=0,0,0,0 dropped=0,0,0,0
>      latency   : target=0, window=0, percentile=100.00%, depth=256
> 
> Run status group 0 (all jobs):
>    READ: bw=208MiB/s (219MB/s), 208MiB/s-208MiB/s (219MB/s-219MB/s),
> io=4096MiB (4295MB), run=19652-19652msec
> 
> Disk stats (read/write):
>   mmcblk0: ios=8181/0, merge=0/0, ticks=25682/0, in_queue=25682, util=99.66%
> 
> 
> With CQE:

Was output from "cat /sys/kernel/debug/mmc0/ios" the same?

> 
> fio --filename=/dev/mmcblk1 --direct=1 --rw=randread --bs=1M
> --ioengine=sync --iodepth=256 --size=4G --numjobs=1 --group_reporting
> --name=iops-test-job --eta-newline=1 --readonly
> iops-test-job: (g=0): rw=randread, bs=(R) 1024KiB-1024KiB, (W)
> 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=sync, iodepth=256
> fio-3.34
> Starting 1 process
> note: both iodepth >= 1 and synchronous I/O engine are selected, queue
> depth will be capped at 1
> Jobs: 1 (f=1): [r(1)][5.8%][r=83.1MiB/s][r=83 IOPS][eta 00m:49s]
> Jobs: 1 (f=1): [r(1)][10.0%][r=84.0MiB/s][r=84 IOPS][eta 00m:45s]
> Jobs: 1 (f=1): [r(1)][14.0%][r=83.1MiB/s][r=83 IOPS][eta 00m:43s]
> Jobs: 1 (f=1): [r(1)][18.0%][r=83.1MiB/s][r=83 IOPS][eta 00m:41s]
> Jobs: 1 (f=1): [r(1)][22.4%][r=84.1MiB/s][r=84 IOPS][eta 00m:38s]
> Jobs: 1 (f=1): [r(1)][26.5%][r=83.1MiB/s][r=83 IOPS][eta 00m:36s]
> Jobs: 1 (f=1): [r(1)][30.6%][r=83.1MiB/s][r=83 IOPS][eta 00m:34s]
> Jobs: 1 (f=1): [r(1)][34.7%][r=84.1MiB/s][r=84 IOPS][eta 00m:32s]
> Jobs: 1 (f=1): [r(1)][38.8%][r=83.1MiB/s][r=83 IOPS][eta 00m:30s]
> Jobs: 1 (f=1): [r(1)][42.9%][r=83.1MiB/s][r=83 IOPS][eta 00m:28s]
> Jobs: 1 (f=1): [r(1)][46.9%][r=84.1MiB/s][r=84 IOPS][eta 00m:26s]
> Jobs: 1 (f=1): [r(1)][51.0%][r=83.0MiB/s][r=83 IOPS][eta 00m:24s]
> Jobs: 1 (f=1): [r(1)][55.1%][r=83.0MiB/s][r=83 IOPS][eta 00m:22s]
> Jobs: 1 (f=1): [r(1)][59.2%][r=84.1MiB/s][r=84 IOPS][eta 00m:20s]
> Jobs: 1 (f=1): [r(1)][63.3%][r=83.0MiB/s][r=83 IOPS][eta 00m:18s]
> Jobs: 1 (f=1): [r(1)][67.3%][r=83.1MiB/s][r=83 IOPS][eta 00m:16s]
> Jobs: 1 (f=1): [r(1)][71.4%][r=84.1MiB/s][r=84 IOPS][eta 00m:14s]
> Jobs: 1 (f=1): [r(1)][75.5%][r=83.0MiB/s][r=83 IOPS][eta 00m:12s]
> Jobs: 1 (f=1): [r(1)][79.6%][r=83.0MiB/s][r=83 IOPS][eta 00m:10s]
> Jobs: 1 (f=1): [r(1)][83.7%][r=84.0MiB/s][r=84 IOPS][eta 00m:08s]
> Jobs: 1 (f=1): [r(1)][87.8%][r=83.1MiB/s][r=83 IOPS][eta 00m:06s]
> Jobs: 1 (f=1): [r(1)][91.8%][r=83.0MiB/s][r=83 IOPS][eta 00m:04s]
> Jobs: 1 (f=1): [r(1)][95.9%][r=84.0MiB/s][r=84 IOPS][eta 00m:02s]
> Jobs: 1 (f=1): [r(1)][100.0%][r=83.0MiB/s][r=83 IOPS][eta 00m:00s]
> iops-test-job: (groupid=0, jobs=1): err= 0: pid=134: Thu Jan  1 00:02:19 1970
>   read: IOPS=83, BW=83.3MiB/s (87.4MB/s)(4096MiB/49154msec)
>     clat (usec): min=11885, max=14840, avg=11981.37, stdev=61.89
>      lat (usec): min=11887, max=14843, avg=11983.00, stdev=61.92
>     clat percentiles (usec):
>      |  1.00th=[11863],  5.00th=[11994], 10.00th=[11994], 20.00th=[11994],
>      | 30.00th=[11994], 40.00th=[11994], 50.00th=[11994], 60.00th=[11994],
>      | 70.00th=[11994], 80.00th=[11994], 90.00th=[11994], 95.00th=[11994],
>      | 99.00th=[12125], 99.50th=[12256], 99.90th=[12387], 99.95th=[12387],
>      | 99.99th=[14877]
>    bw (  KiB/s): min=83800, max=86016, per=100.00%, avg=85430.61,
> stdev=894.16, samples=98
>    iops        : min=   81, max=   84, avg=83.22, stdev= 0.89, samples=98
>   lat (msec)   : 20=100.00%
>   cpu          : usr=0.00%, sys=5.44%, ctx=4097, majf=0, minf=274
>   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      issued rwts: total=4096,0,0,0 short=0,0,0,0 dropped=0,0,0,0
>      latency   : target=0, window=0, percentile=100.00%, depth=256
> 
> Run status group 0 (all jobs):
>    READ: bw=83.3MiB/s (87.4MB/s), 83.3MiB/s-83.3MiB/s
> (87.4MB/s-87.4MB/s), io=4096MiB (4295MB), run=49154-49154msec
> 
> Disk stats (read/write):
>   mmcblk1: ios=8181/0, merge=0/0, ticks=69682/0, in_queue=69682, util=99.96%
> 
> 
> Best regards,
> Maksim

