Message-ID: <1e06161bf49a3a88c4ea2e7a406815be56114c4f.camel@linaro.org>
Date: Mon, 21 Jul 2025 13:04:53 +0100
From: André Draszik <andre.draszik@...aro.org>
To: Neil Armstrong <neil.armstrong@...aro.org>, Alim Akhtar
<alim.akhtar@...sung.com>, Avri Altman <avri.altman@....com>, Bart Van
Assche <bvanassche@....org>, "James E.J. Bottomley"
<James.Bottomley@...senPartnership.com>, "Martin K. Petersen"
<martin.petersen@...cle.com>
Cc: Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>,
linux-arm-msm@...r.kernel.org, linux-scsi@...r.kernel.org,
linux-kernel@...r.kernel.org, Peter Griffin <peter.griffin@...aro.org>,
Will McVicker <willmcvicker@...gle.com>, kernel-team@...roid.com, Tudor
Ambarus <tudor.ambarus@...aro.org>
Subject: Re: [PATCH RFT v3 3/3] ufs: core: delegate the interrupt service
routine to a threaded irq handler
Hi,
On Mon, 2025-04-07 at 12:17 +0200, Neil Armstrong wrote:
> On systems with a large number of request slots and where MCQ ESI is
> unavailable, the current design of the interrupt handler can delay the
> handling of other subsystems' interrupts, causing display artifacts,
> GPU stalls or system firmware request timeouts.
>
> Since the interrupt routine can take quite some time, it is
> preferable to move it to a threaded handler and let the hard
> interrupt handler only wake up the threaded interrupt routine; the
> interrupt line is kept masked until the processing is finished in
> the thread, thanks to the IRQS_ONESHOT flag.
>
> When MCQ & ESI interrupts are enabled, the I/O completions are now
> handled directly in the "hard" interrupt routine to keep IOPS high,
> since queue handling is done in separate per-queue interrupt routines.

This patch adversely affects UFS performance on Pixel 6. I believe it
has a UFSHCI v3.x controller (so presumably all devices with a < v4
controller are affected) - if my limited understanding is correct,
MCQ & ESI are a feature of v4 controllers only.

On Pixel 6, fio reports the following performance on linux-next with
this patch applied:
read [1] / write [2]:
READ: bw=17.1MiB/s (17.9MB/s), 17.1MiB/s-17.1MiB/s (17.9MB/s-17.9MB/s), io=684MiB (718MB), run=40001-40001msec
WRITE: bw=20.6MiB/s (21.5MB/s), 20.6MiB/s-20.6MiB/s (21.5MB/s-21.5MB/s), io=822MiB (862MB), run=40003-40003msec
With this patch reverted, performance changes back to:
read [1] / write [2]:
READ: bw=19.9MiB/s (20.8MB/s), 19.9MiB/s-19.9MiB/s (20.8MB/s-20.8MB/s), io=795MiB (833MB), run=40001-40001msec
WRITE: bw=28.0MiB/s (29.4MB/s), 28.0MiB/s-28.0MiB/s (29.4MB/s-29.4MB/s), io=1122MiB (1176MB), run=40003-40003msec
all consistently over multiple runs.

That is a ~26% reduction for write and a ~14% reduction for read.
PCBenchmark even reports performance drops of ~41%.

I don't know much about UFS at this stage, but could the code simply
check the controller version and revert to the original behaviour
if it is < v4 (i.e. no MCQ & ESI)? Any thoughts on such a change?
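
Very roughly, I was thinking of something like the below at the IRQ
registration site in ufshcd_init(). This is just an untested sketch to
illustrate the idea - ufshcd_intr_hardirq() / ufshcd_intr_thread() are
placeholders for the split handlers this patch introduces, and
ufshcd_intr_legacy() stands for the original all-in-hardirq handler,
so the names almost certainly don't match the actual code:

	/*
	 * Untested sketch: only split the handler into a hardirq part
	 * plus a thread on UFSHCI v4+ hosts (the ones that can have
	 * MCQ/ESI); anything older keeps the original, purely
	 * hardirq-based behaviour.
	 */
	if (hba->ufs_version >= ufshci_version(4, 0))
		err = devm_request_threaded_irq(dev, irq,
						ufshcd_intr_hardirq,
						ufshcd_intr_thread,
						IRQF_SHARED | IRQF_ONESHOT,
						UFSHCD, hba);
	else
		err = devm_request_irq(dev, irq, ufshcd_intr_legacy,
				       IRQF_SHARED, UFSHCD, hba);

Alternatively, the version check could live inside the hard handler
itself, i.e. on < v4 do all the processing there and return
IRQ_HANDLED without ever waking the thread - whichever fits better
with the MCQ/ESI path this patch adds.
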
[1]: fio --name=randread --rw=randread --ioengine=libaio --direct=1 \
--bs=4k --numjobs=1 --size=1g --ramp_time=10 --runtime=40 --time_based \
--end_fsync=1 --group_reporting --filename=/foo
[2]: fio --name=randwrite --rw=randwrite --ioengine=libaio --direct=1 \
--bs=4k --numjobs=1 --size=1g --ramp_time=10 --runtime=40 --time_based \
--end_fsync=1 --group_reporting --filename=/foo
Cheers,
Andre'