Message-ID: <u27jbp3wkgw2cyyans3rmxspqqwufymkztvyfjacrke252nbud@yfutnxhwcspr>
Date: Fri, 1 Aug 2025 18:13:52 +0530
From: Manivannan Sadhasivam <mani@...nel.org>
To: Neil Armstrong <neil.armstrong@...aro.org>
Cc: Ram Kumar Dwivedi <quic_rdwivedi@...cinc.com>, alim.akhtar@...sung.com, 
	avri.altman@....com, bvanassche@....org, robh@...nel.org, krzk+dt@...nel.org, 
	conor+dt@...nel.org, andersson@...nel.org, konradybcio@...nel.org, agross@...nel.org, 
	linux-arm-msm@...r.kernel.org, linux-scsi@...r.kernel.org, devicetree@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH V1 0/3] Enable UFS MCQ support for SM8650 and SM8750

On Thu, Jul 31, 2025 at 10:50:21AM GMT, neil.armstrong@...aro.org wrote:
> Hi,
> 
> On 30/07/2025 10:22, Ram Kumar Dwivedi wrote:
> > This patch series enables Multi-Circular Queue (MCQ) support for the UFS
> > host controller on Qualcomm SM8650 and SM8750 platforms. MCQ is a modern
> > queuing model that improves performance and scalability by allowing
> > multiple hardware queues.
> > 
> > Although MCQ support has been present in the UFS driver for several years,
> > this is the first time it is being enabled via Device Tree for these
> > platforms.
> > 
> > Patch 1 updates the device tree bindings to allow the additional register
> > regions and reg-names required for MCQ operation.
> > 
> > Patches 2 and 3 update the device trees for SM8650 and SM8750 respectively
> > to enable MCQ by adding the necessary register mappings and MSI parent.
> > 
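[Illustrative sketch, not part of the posted series: based on the description
above, the MCQ-enabled UFS node would gain a second register region for the
MCQ registers plus an MSI parent for the per-queue (ESI) interrupts. The node
label, unit address, region sizes, reg-names and the msi-parent specifier
below are placeholders, not the values submitted in these patches.

    ufs_mem_hc: ufs@1d84000 {
        compatible = "qcom,sm8650-ufshc", "qcom,ufshc", "jedec,ufs-2.0";
        /* standard UFSHCI register space, followed by the MCQ register space */
        reg = <0x0 0x01d84000 0x0 0x3000>,
              <0x0 0x01d88000 0x0 0x8000>;
        reg-names = "std", "mcq";
        /* MCQ completion interrupts (ESI) are delivered as MSIs, hence the
         * reference to the GIC ITS; the device ID cell is a placeholder */
        msi-parent = <&gic_its 0x0>;
        /* clocks, interrupts, phys, etc. unchanged */
    };

Patch 1 then relaxes the reg/reg-names constraints in qcom,ufs.yaml so that
the additional MCQ entries are accepted by the binding.]
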
> > Tested on internal hardware for both platforms.
> > 
> > Palash Kambar (1):
> >    arm64: dts: qcom: sm8750: Enable MCQ support for UFS controller
> > 
> > Ram Kumar Dwivedi (2):
> >    dt-bindings: ufs: qcom: Add MCQ support to reg and reg-names
> >    arm64: dts: qcom: sm8650: Enable MCQ support for UFS controller
> > 
> >   .../devicetree/bindings/ufs/qcom,ufs.yaml     | 21 ++++++++++++-------
> >   arch/arm64/boot/dts/qcom/sm8650.dtsi          |  9 +++++++-
> >   arch/arm64/boot/dts/qcom/sm8750.dtsi          | 10 +++++++--
> >   3 files changed, 29 insertions(+), 11 deletions(-)
> > 
> 
> I ran some tests on the SM8650-QRD, and it works, so please add my:
> Tested-by: Neil Armstrong <neil.armstrong@...aro.org> # on SM8650-QRD
> 

Thanks, Neil, for testing it out!

> I ran some fio tests comparing v6.15, v6.16 (with threaded IRQs)
> and next + mcq support; here's the analysis of the results:
> 
> Significant Performance Gains in Write Operations with Multiple Jobs:
> The "mcq" change shows a substantial improvement in both IOPS and bandwidth for write operations with 8 jobs.
> Moderate Improvement in Single Job Operations (Read and Write):
> For single job operations (read and write), the "mcq" change generally leads to positive, albeit less dramatic, improvements in IOPS and bandwidth.
> Slight Decrease in Read Operations with Multiple Jobs:
> Interestingly, for read operations with 8 jobs, there's a slight decrease in both IOPS and bandwidth with the "mcq" kernel.
> 
> The raw results are:
> Board: sm8650-qrd
> 
> read / 1 job
>                v6.15     v6.16  next+mcq
> iops (min)  3,996.00  5,921.60  4,661.20
> iops (max)  4,772.80  6,491.20  5,027.60
> iops (avg)  4,526.25  6,295.31  4,979.81
> cpu % usr       4.62      2.96      5.68
> cpu % sys      21.45     17.88     25.58
> bw (MB/s)      18.54     25.78     20.40
> 

It is interesting to note the % of CPU time spent with MCQ in the 1-job case.
It looks like it is spending more time there. I'm wondering if that is the
ESI (Event Specific Interrupt) limitation/overhead.

- Mani

> read / 8 job
>                 v6.15      v6.16   next+mcq
> iops (min)  51,867.60  51,575.40  56,818.40
> iops (max)  67,513.60  64,456.40  65,379.60
> iops (avg)  64,314.80  62,136.76  63,016.07
> cpu % usr        3.98       3.72       3.85
> cpu % sys       16.70      17.16      14.87
> bw (MB/s)      263.60     254.40     258.20
> 
> write / 1 job
>                v6.15     v6.16  next+mcq
> iops (min)  5,654.80  8,060.00  7,117.20
> iops (max)  6,720.40  8,852.00  7,706.80
> iops (avg)  6,576.91  8,579.81  7,459.97
> cpu % usr       7.48      3.79      6.73
> cpu % sys      41.09     23.27     30.66
> bw (MB/s)      26.96     35.16     30.56
> 
> write / 8 job
>                  v6.15       v6.16    next+mcq
> iops (min)   84,687.80   95,043.40  114,054.00
> iops (max)  107,620.80  113,572.00  164,526.00
> iops (avg)   97,910.86  105,927.38  149,071.43
> cpu % usr         5.43        4.38        2.88
> cpu % sys        21.73       20.29       16.09
> bw (MB/s)       400.80      433.80      610.40
> 
> The test suite is:
> for rw in read write ; do
>     echo "rw: ${rw}"
>     for jobs in 1 8 ; do
>         echo "jobs: ${jobs}"
>         for it in $(seq 1 5) ; do
>             fio --name=rand${rw} --rw=rand${rw} \
>                 --ioengine=libaio --direct=1 \
>                 --bs=4k --numjobs=${jobs} --size=32m \
>                 --runtime=30 --time_based --end_fsync=1 \
>                 --group_reporting --filename=/dev/disk/by-partlabel/super \
>             | grep -E '(iops|sys=|READ:|WRITE:)'
>             sleep 5
>         done
>     done
> done
> 
> Thanks,
> Neil

-- 
மணிவண்ணன் சதாசிவம்
