Message-ID: <u27jbp3wkgw2cyyans3rmxspqqwufymkztvyfjacrke252nbud@yfutnxhwcspr>
Date: Fri, 1 Aug 2025 18:13:52 +0530
From: Manivannan Sadhasivam <mani@...nel.org>
To: Neil Armstrong <neil.armstrong@...aro.org>
Cc: Ram Kumar Dwivedi <quic_rdwivedi@...cinc.com>, alim.akhtar@...sung.com,
avri.altman@....com, bvanassche@....org, robh@...nel.org, krzk+dt@...nel.org,
conor+dt@...nel.org, andersson@...nel.org, konradybcio@...nel.org, agross@...nel.org,
linux-arm-msm@...r.kernel.org, linux-scsi@...r.kernel.org, devicetree@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH V1 0/3] Enable UFS MCQ support for SM8650 and SM8750
On Thu, Jul 31, 2025 at 10:50:21AM GMT, neil.armstrong@...aro.org wrote:
> Hi,
>
> On 30/07/2025 10:22, Ram Kumar Dwivedi wrote:
> > This patch series enables Multi-Circular Queue (MCQ) support for the UFS
> > host controller on Qualcomm SM8650 and SM8750 platforms. MCQ is a modern
> > queuing model that improves performance and scalability by allowing
> > multiple hardware queues.
> >
> > Although MCQ support has been present in the UFS driver for several years,
> > this is the first time it is being enabled via Device Tree for these
> > platforms.
> >
> > Patch 1 updates the device tree bindings to allow the additional register
> > regions and reg-names required for MCQ operation.
> >
> > Patches 2 and 3 update the device trees for SM8650 and SM8750 respectively
> > to enable MCQ by adding the necessary register mappings and MSI parent.
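For illustration, a minimal sketch of what such a DT enablement typically
looks like (the node label follows the usual Qualcomm convention; the
register addresses, sizes, reg-names and ITS device ID below are
placeholder assumptions, not values taken from these patches):

    &ufs_mem_hc {
        /* First region: standard UFSHCI registers (placeholder address) */
        /* Second region: MCQ queue registers (placeholder address) */
        reg = <0x0 0x01d84000 0x0 0x3000>,
              <0x0 0x01d88000 0x0 0x8000>;
        reg-names = "std", "mcq";

        /* Per-queue MSI vectors (ESI) are routed through the GIC ITS;
         * the device ID here is a placeholder.
         */
        msi-parent = <&gic_its 0x20c>;
    };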
> >
> > Tested on internal hardware for both platforms.
> >
> > Palash Kambar (1):
> > arm64: dts: qcom: sm8750: Enable MCQ support for UFS controller
> >
> > Ram Kumar Dwivedi (2):
> > dt-bindings: ufs: qcom: Add MCQ support to reg and reg-names
> > arm64: dts: qcom: sm8650: Enable MCQ support for UFS controller
> >
> > .../devicetree/bindings/ufs/qcom,ufs.yaml | 21 ++++++++++++-------
> > arch/arm64/boot/dts/qcom/sm8650.dtsi | 9 +++++++-
> > arch/arm64/boot/dts/qcom/sm8750.dtsi | 10 +++++++--
> > 3 files changed, 29 insertions(+), 11 deletions(-)
> >
>
> I ran some tests on the SM8650-QRD, and it works, so please add my:
> Tested-by: Neil Armstrong <neil.armstrong@...aro.org> # on SM8650-QRD
>
Thanks, Neil, for testing it out!
> I ran some fio tests comparing v6.15, v6.16 (with threaded IRQs) and
> next + MCQ support; here is an analysis of the results:
>
> Significant Performance Gains in Write Operations with Multiple Jobs:
> The "mcq" change shows a substantial improvement in both IOPS and bandwidth for write operations with 8 jobs.
> Moderate Improvement in Single Job Operations (Read and Write):
> For single job operations (read and write), the "mcq" change generally leads to positive, albeit less dramatic, improvements in IOPS and bandwidth.
> Slight Decrease in Read Operations with Multiple Jobs:
> Interestingly, for read operations with 8 jobs, there's a slight decrease in both IOPS and bandwidth with the "mcq" kernel.
>
> The raw results are:
> Board: sm8650-qrd
>
> read / 1 job
>                v6.15     v6.16  next+mcq
> iops (min)  3,996.00  5,921.60  4,661.20
> iops (max)  4,772.80  6,491.20  5,027.60
> iops (avg)  4,526.25  6,295.31  4,979.81
> cpu % usr       4.62      2.96      5.68
> cpu % sys      21.45     17.88     25.58
> bw (MB/s)      18.54     25.78     20.40
>
It is interesting to note the percentage of CPU time spent with MCQ in the
1-job case. It looks like more time is being spent there; I'm wondering if
it is the ESI (Event Specific Interrupt) limitation/overhead.
- Mani
> read / 8 jobs
>                 v6.15      v6.16   next+mcq
> iops (min)  51,867.60  51,575.40  56,818.40
> iops (max)  67,513.60  64,456.40  65,379.60
> iops (avg)  64,314.80  62,136.76  63,016.07
> cpu % usr        3.98       3.72       3.85
> cpu % sys       16.70      17.16      14.87
> bw (MB/s)      263.60     254.40     258.20
>
> write / 1 job
>                v6.15     v6.16  next+mcq
> iops (min)  5,654.80  8,060.00  7,117.20
> iops (max)  6,720.40  8,852.00  7,706.80
> iops (avg)  6,576.91  8,579.81  7,459.97
> cpu % usr       7.48      3.79      6.73
> cpu % sys      41.09     23.27     30.66
> bw (MB/s)      26.96     35.16     30.56
>
> write / 8 jobs
>                  v6.15       v6.16    next+mcq
> iops (min)   84,687.80   95,043.40  114,054.00
> iops (max)  107,620.80  113,572.00  164,526.00
> iops (avg)   97,910.86  105,927.38  149,071.43
> cpu % usr         5.43        4.38        2.88
> cpu % sys        21.73       20.29       16.09
> bw (MB/s)       400.80      433.80      610.40
>
> The test suite is:
> for rw in read write ; do
>     echo "rw: ${rw}"
>     for jobs in 1 8 ; do
>         echo "jobs: ${jobs}"
>         for it in $(seq 1 5) ; do
>             fio --name=rand${rw} --rw=rand${rw} \
>                 --ioengine=libaio --direct=1 \
>                 --bs=4k --numjobs=${jobs} --size=32m \
>                 --runtime=30 --time_based --end_fsync=1 \
>                 --group_reporting --filename=/dev/disk/by-partlabel/super \
>                 | grep -E '(iops|sys=|READ:|WRITE:)'
>             sleep 5
>         done
>     done
> done
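For reference, a hypothetical way to aggregate the five iterations:
assuming each run's output is redirected to a per-run log (results-1.log
through results-5.log; these file names are not part of the script above),
the iops min/max/avg figures can be averaged with awk:

    # Average fio's "iops : min=..., max=..., avg=..." lines across runs
    grep -h 'iops' results-*.log |
    awk -F'[=,]' '{ min += $2; max += $4; avg += $6; n++ }
        END { if (n) printf "iops min %.2f max %.2f avg %.2f (%d samples)\n",
                            min / n, max / n, avg / n, n }'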
>
> Thanks,
> Neil
--
மணிவண்ணன் சதாசிவம்