Message-ID: <4935395591bc8baef39d2acc491c6c40889090d9.camel@linaro.org>
Date: Mon, 21 Jul 2025 13:10:43 +0100
From: André Draszik <andre.draszik@...aro.org>
To: Neil Armstrong <neil.armstrong@...aro.org>, Alim Akhtar	
 <alim.akhtar@...sung.com>, Avri Altman <avri.altman@....com>, Bart Van
 Assche	 <bvanassche@....org>, "James E.J. Bottomley"	
 <James.Bottomley@...senPartnership.com>, "Martin K. Petersen"	
 <martin.petersen@...cle.com>
Cc: Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>, 
	linux-arm-msm@...r.kernel.org, linux-scsi@...r.kernel.org, 
	linux-kernel@...r.kernel.org, Peter Griffin <peter.griffin@...aro.org>,
 Will McVicker <willmcvicker@...gle.com>, kernel-team@...roid.com, Tudor
 Ambarus <tudor.ambarus@...aro.org>
Subject: Re: [PATCH RFT v3 3/3] ufs: core: delegate the interrupt service
 routine to a threaded irq handler

On Mon, 2025-07-21 at 13:04 +0100, André Draszik wrote:
> Hi,
> 
> On Mon, 2025-04-07 at 12:17 +0200, Neil Armstrong wrote:
> > On systems with a large number request slots and unavailable MCQ ESI,
> > the current design of the interrupt handler can delay handling of
> > other subsystems interrupts causing display artifacts, GPU stalls
> > or system firmware requests timeouts.
> > 
> > Since the interrupt routine can take quite some time, it's
> > preferable to move it to a threaded handler and leave the
> > hard interrupt handler wake up the threaded interrupt routine,
> > the interrupt line would be masked until the processing is
> > finished in the thread thanks to the IRQS_ONESHOT flag.
> > 
> > When MCQ & ESI interrupts are enabled the I/O completions are now
> > directly handled in the "hard" interrupt routine to keep IOPs high
> > since queues handling is done in separate per-queue interrupt routines.
> 
> This patch adversely affects Pixel 6 UFS performance. It has a
> UFSHCI v3.x controller I believe (and therefore probably all
> devices with < v4) - if my limited understanding is correct,
> MCQ & ESI are a feature of v4 controllers only.
> 
> On Pixel 6, fio reports following performance on linux-next with
> this patch:
> 
> read [1] / write [2]:
>    READ: bw=17.1MiB/s (17.9MB/s), 17.1MiB/s-17.1MiB/s (17.9MB/s-17.9MB/s), io=684MiB (718MB), run=40001-40001msec
>   WRITE: bw=20.6MiB/s (21.5MB/s), 20.6MiB/s-20.6MiB/s (21.5MB/s-21.5MB/s), io=822MiB (862MB), run=40003-40003msec
> 
> With this patch reverted, performance changes back to:
> 
> read [1] / write [2]:
> 
>    READ: bw=19.9MiB/s (20.8MB/s), 19.9MiB/s-19.9MiB/s (20.8MB/s-20.8MB/s), io=795MiB (833MB), run=40001-40001msec
>   WRITE: bw=28.0MiB/s (29.4MB/s), 28.0MiB/s-28.0MiB/s (29.4MB/s-29.4MB/s), io=1122MiB (1176MB), run=40003-40003msec
> 
> all over multiple runs.
> 
> which is a ~26% reduction for write and ~14% reduction for read.
> 
> PCBenchmark even reports performance drops of ~41%.

Additional fio results (numjobs=8 instead of 1):

current linux-next:

fio --name=randread --rw=randread --ioengine=libaio --direct=1 --bs=4k --numjobs=8 --size=1g --runtime=30 --time_based --end_fsync=1 --group_reporting --filename=/foo
   READ: bw=52.1MiB/s (54.6MB/s), 52.1MiB/s-52.1MiB/s (54.6MB/s-54.6MB/s), io=1562MiB (1638MB), run=30001-30001msec
  WRITE: bw=74.7MiB/s (78.3MB/s), 74.7MiB/s-74.7MiB/s (78.3MB/s-78.3MB/s), io=2242MiB (2351MB), run=30004-30004msec


with patch reverted:

fio --name=randread --rw=randread --ioengine=libaio --direct=1 --bs=4k --numjobs=8 --size=1g --runtime=30 --time_based --end_fsync=1 --group_reporting --filename=/foo
   READ: bw=83.5MiB/s (87.6MB/s), 83.5MiB/s-83.5MiB/s (87.6MB/s-87.6MB/s), io=2506MiB (2628MB), run=30001-30001msec
  WRITE: bw=83.3MiB/s (87.4MB/s), 83.3MiB/s-83.3MiB/s (87.4MB/s-87.4MB/s), io=2501MiB (2622MB), run=30003-30003msec



That is an even larger reduction for read, ~37% (83.5MiB/s reverted vs. 52.1MiB/s with the patch).
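
For reference, the percentages can be recomputed from the fio bandwidth figures quoted in this thread. The snippet below is only a sketch of that arithmetic. The helper name reduction_pct and the dictionary layout are made up for illustration, and the numbers are copied by hand from the output above rather than parsed from fio.

```python
# Bandwidth in MiB/s, copied from the fio output in this thread.
# Each entry is (patch reverted, linux-next with the patch).
results = {
    "numjobs=1 read": (19.9, 17.1),
    "numjobs=1 write": (28.0, 20.6),
    "numjobs=8 read": (83.5, 52.1),
    "numjobs=8 write": (83.3, 74.7),
}


def reduction_pct(reverted, patched):
    """Relative bandwidth drop of the patched kernel, in percent."""
    return (reverted - patched) / reverted * 100.0


for name, (reverted, patched) in results.items():
    print(f"{name}: {reduction_pct(reverted, patched):.1f}% lower with the patch")
```

The baseline in each case is the run with the patch reverted, so the result reads as "how much bandwidth is lost by applying the patch".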

A.
