linux-kernel - Re: Regression in performance when using PREEMPT


Message-ID: <20260113084359.esKk1MF2@linutronix.de>
Date: Tue, 13 Jan 2026 09:43:59 +0100
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: LKML <linux-kernel@...r.kernel.org>, linux-rt-devel@...ts.linux.dev,
	liangjlee@...gle.com
Subject: Re: Regression in performance when using PREEMPT_RT

On 2026-01-12 12:33:56 [-0500], Steven Rostedt wrote:
> > CPU resources with any other task on that CPU, which is different from
> > softirq on !RT where it has to wait until other hardirqs complete.
> > The other thing is that if something "else" is busy, say a
> > threaded interrupt, then it will pick up this request (before ksoftirqd
> > had the chance). The result is that the handler now does the I/O at its
> > end instead of ksoftirqd. If the interrupt is important and has a higher
> > priority than the average MAX_RT_PRIO / 2 then this block I/O might
> > disrupt its schedule.
> 
> I'm not sure what the effect of that would be.

The block I/O could get completed by the networking interrupt. That
would be okay from the "correctness" POV, but if your networking has a
higher priority and disk I/O is considered low priority, then it will
probably not be good.

> > I think moving the interrupt to a BIG CPU (as the block queue), if
> > possible, would be the easiest thing to do.
> 
> We have done that. It appears that the hardware simply picks the first CPU
> it can use and *always* uses that. By setting it to a big core, it did get
> some improvement but the problem is still that the softirq only runs on
> that CPU.

That is "normal", unless you have firmware that configures each
interrupt to a CPU and Linux simply uses the default value.
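
For reference, steering an interrupt by hand goes through
/proc/irq/<N>/smp_affinity, which takes a hexadecimal CPU bitmask. Below
is a minimal sketch of building such a mask. The helper name is
hypothetical, and CPUs 6 and 7 being the big cores is only an assumption
for illustration, not something taken from the report. It is limited to
CPU numbers below 32 so that a single hex word is enough.

```python
def cpus_to_affinity_mask(cpus):
    """Return the hex bitmask string for the given CPU numbers.

    Hypothetical helper: the result has the form accepted by
    /proc/irq/<N>/smp_affinity for systems with fewer than 32 CPUs.
    """
    mask = 0
    for cpu in cpus:
        if not 0 <= cpu < 32:
            raise ValueError("this sketch only handles CPUs 0..31")
        mask |= 1 << cpu  # one bit per allowed CPU
    return format(mask, "x")


if __name__ == "__main__":
    # Assumption: CPUs 6 and 7 are the big cores.
    print(cpus_to_affinity_mask([6, 7]))
```

Whether the write to smp_affinity has any effect still depends on the
interrupt controller and the driver; the hardware may keep delivering to
a single CPU of the mask.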

> > If the hardware restricts it, I would suggest having a dedicated
> > SCHED_FIFO thread for its duty would be better than an anonymous
> > catch-all ksoftirqd.
> 
> I think the solution may be to give up on PREEMPT_RT if that's the case,
> unless there's a non PREEMPT_RT reason to make that change.
> 
> To give you an idea of what the issue is here, it is the distribution of the
> block softirq (even when the irq itself is always coming in on a single
> CPU).
> 
>  cat /proc/softirqs | grep -i block
> 
> non-rt:
> 	BLOCK:          6          0          0          0          1          7     162986     164750
> 
> RT unmodified:
> 	BLOCK:     329875          0          0          0          0          0          0          0
> 
> RT without the forced_irqthreads check:
> 	BLOCK:          0          0          0          0         11         15     164116     163619

So the last two or four CPUs are the big ones? It seems that you have at
least two queues. Not sure where they come from.
I would suggest creating threads per run-queue just to keep it within
the context. This should mimic the anonymous softirq. The difference
would be the higher priority and the preference over SCHED_OTHER tasks,
which might improve the performance (this depends on the current
workload, i.e. if your system is idle then there is no fight for CPU
resources).
Or you get one interrupt per queue, if this is merely missing in the
eMMC driver somewhere on the software side. NVMe usually does this.
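
To compare the distributions quoted above, a small sketch that turns the
BLOCK line of /proc/softirqs into per-CPU shares. The function name is
hypothetical; the sample input is the "non-rt" line from the report.

```python
def block_softirq_share(softirqs_text):
    """Return the fraction of BLOCK softirqs handled by each CPU.

    Hypothetical helper: expects text in the /proc/softirqs layout and
    returns one float per CPU column of the BLOCK row.
    """
    for line in softirqs_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "BLOCK:":
            counts = [int(f) for f in fields[1:]]
            total = sum(counts)
            return [c / total if total else 0.0 for c in counts]
    raise ValueError("no BLOCK: line found")


if __name__ == "__main__":
    # The "non-rt" sample from the report above.
    sample = "BLOCK: 6 0 0 0 1 7 162986 164750"
    for cpu, share in enumerate(block_softirq_share(sample)):
        print(f"CPU{cpu}: {share:.4f}")
```

For the "RT unmodified" line everything ends up on CPU0, while in the
other two cases nearly all of it is split between the last two CPUs.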

> -- Steve

Sebastian