Message-ID: <aff38b98-23ff-4dcd-afab-2a0d8c8ad599@linaro.org>
Date: Thu, 31 Jul 2025 10:50:21 +0200
From: neil.armstrong@...aro.org
To: Ram Kumar Dwivedi <quic_rdwivedi@...cinc.com>, mani@...nel.org,
 alim.akhtar@...sung.com, avri.altman@....com, bvanassche@....org,
 robh@...nel.org, krzk+dt@...nel.org, conor+dt@...nel.org,
 andersson@...nel.org, konradybcio@...nel.org, agross@...nel.org
Cc: linux-arm-msm@...r.kernel.org, linux-scsi@...r.kernel.org,
 devicetree@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH V1 0/3] Enable UFS MCQ support for SM8650 and SM8750

Hi,

On 30/07/2025 10:22, Ram Kumar Dwivedi wrote:
> This patch series enables Multi-Circular Queue (MCQ) support for the UFS
> host controller on Qualcomm SM8650 and SM8750 platforms. MCQ is a modern
> queuing model that improves performance and scalability by allowing
> multiple hardware queues.
> 
> Although MCQ support has been present in the UFS driver for several years,
> this is the first time it is being enabled via Device Tree for these
> platforms.
> 
> Patch 1 updates the device tree bindings to allow the additional register
> regions and reg-names required for MCQ operation.
> 
> Patches 2 and 3 update the device trees for SM8650 and SM8750 respectively
> to enable MCQ by adding the necessary register mappings and MSI parent.
> 
> Tested on internal hardware for both platforms.
> 
> Palash Kambar (1):
>    arm64: dts: qcom: sm8750: Enable MCQ support for UFS controller
> 
> Ram Kumar Dwivedi (2):
>    dt-bindings: ufs: qcom: Add MCQ support to reg and reg-names
>    arm64: dts: qcom: sm8650: Enable MCQ support for UFS controller
> 
>   .../devicetree/bindings/ufs/qcom,ufs.yaml     | 21 ++++++++++++-------
>   arch/arm64/boot/dts/qcom/sm8650.dtsi          |  9 +++++++-
>   arch/arm64/boot/dts/qcom/sm8750.dtsi          | 10 +++++++--
>   3 files changed, 29 insertions(+), 11 deletions(-)
> 
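As context for the numbers below: with MCQ the controller's hardware
queues are exposed to the block layer, so it can be sanity-checked at
runtime. A minimal sketch, assuming a kernel where ufshcd-core exports
its use_mcq_mode parameter and the UFS LU shows up as sda (both are
assumptions, adjust to your setup):

# 'Y' means ufshcd-core is allowed to use MCQ (parameter path assumed)
cat /sys/module/ufshcd_core/parameters/use_mcq_mode
# with MCQ active, blk-mq exposes one directory per hardware context
ls /sys/block/sda/mq/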

I ran some tests on the SM8650-QRD and it works, so please add my:
Tested-by: Neil Armstrong <neil.armstrong@...aro.org> # on SM8650-QRD

I also ran some fio tests comparing v6.15, v6.16 (with threaded IRQs),
and next + MCQ support; here's my analysis of the results:

- Significant performance gains in write operations with multiple jobs:
  the MCQ change shows a substantial improvement in both IOPS and
  bandwidth for writes with 8 jobs.
- Moderate improvement in single-job operations: for single-job reads
  and writes, MCQ generally brings positive, albeit less dramatic,
  improvements in IOPS and bandwidth.
- Slight decrease in read operations with multiple jobs: interestingly,
  for reads with 8 jobs there is a slight decrease in both IOPS and
  bandwidth with the MCQ kernel, relative to the v6.15 baseline.
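
To put rough numbers on those observations, here is a throwaway helper
(defined just for this mail) applied to the avg IOPS rows from the
tables below, comparing next+mcq against the v6.15 baseline:

# pct prints the relative change from an old to a new value
pct() { awk -v o="$1" -v n="$2" 'BEGIN { printf "%+.1f%%\n", (n - o) * 100 / o }'; }
pct 4526.25   4979.81    # read  / 1 job : +10.0%
pct 64314.80  63016.07   # read  / 8 jobs:  -2.0%
pct 6576.91   7459.97    # write / 1 job : +13.4%
pct 97910.86  149071.43  # write / 8 jobs: +52.3%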

The raw results are:
Board: sm8650-qrd

read / 1 job
                v6.15     v6.16  next+mcq
iops (min)  3,996.00  5,921.60  4,661.20
iops (max)  4,772.80  6,491.20  5,027.60
iops (avg)  4,526.25  6,295.31  4,979.81
cpu % usr       4.62      2.96      5.68
cpu % sys      21.45     17.88     25.58
bw (MB/s)      18.54     25.78     20.40

read / 8 jobs
                 v6.15      v6.16   next+mcq
iops (min)  51,867.60  51,575.40  56,818.40
iops (max)  67,513.60  64,456.40  65,379.60
iops (avg)  64,314.80  62,136.76  63,016.07
cpu % usr        3.98       3.72       3.85
cpu % sys       16.70      17.16      14.87
bw (MB/s)      263.60     254.40     258.20

write / 1 job
                v6.15     v6.16  next+mcq
iops (min)  5,654.80  8,060.00  7,117.20
iops (max)  6,720.40  8,852.00  7,706.80
iops (avg)  6,576.91  8,579.81  7,459.97
cpu % usr       7.48      3.79      6.73
cpu % sys      41.09     23.27     30.66
bw (MB/s)      26.96     35.16     30.56

write / 8 jobs
                  v6.15       v6.16    next+mcq
iops (min)   84,687.80   95,043.40  114,054.00
iops (max)  107,620.80  113,572.00  164,526.00
iops (avg)   97,910.86  105,927.38  149,071.43
cpu % usr         5.43        4.38        2.88
cpu % sys        21.73       20.29       16.09
bw (MB/s)       400.80      433.80      610.40

The test suite is:
# 5 iterations of each (rw, jobs) combination: 30s of 4k random I/O
# via libaio with direct I/O, grouped reporting across jobs
for rw in read write ; do
    echo "rw: ${rw}"
    for jobs in 1 8 ; do
        echo "jobs: ${jobs}"
        for it in $(seq 1 5) ; do
            fio --name=rand${rw} --rw=rand${rw} \
                --ioengine=libaio --direct=1 \
                --bs=4k --numjobs=${jobs} --size=32m \
                --runtime=30 --time_based --end_fsync=1 \
                --group_reporting --filename=/dev/disk/by-partlabel/super \
            | grep -E '(iops|sys=|READ:|WRITE:)'
            sleep 5    # settle between runs
        done
    done
done
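
The avg rows above are then just the per-run avg= values reduced over
the 5 iterations; a rough post-processing sketch, assuming each
configuration's output was captured to its own results.log (the file
name and fio's human-readable output format are assumptions):

grep 'iops' results.log \
    | sed -n 's/.*avg=\([0-9.]*\).*/\1/p' \
    | awk '{ sum += $1; n++ } END { if (n) printf "avg iops over %d runs: %.2f\n", n, sum / n }'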

Thanks,
Neil
