Message-ID: <20260112152702.-73BFhnF@linutronix.de>
Date: Mon, 12 Jan 2026 16:27:02 +0100
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: LKML <linux-kernel@...r.kernel.org>, linux-rt-devel@...ts.linux.dev,
liangjlee@...gle.com
Subject: Re: Regression in performance when using PREEMPT_RT
On 2025-12-26 11:02:49 [-0500], Steven Rostedt wrote:
> Hi Sebastian,
>
> We are doing some experiments in running Android Pixel with a PREEMPT_RT
> kernel, we found a few unacceptable performance regressions. One was in the
> block layer. In non-rt, the ufshcd interrupt would trigger the BLOCK
> softirq on another CPU. In RT, it always triggered the softirq on the same
> CPU as the interrupt. As the interrupt line always triggers on CPU0, it
> forces the BLOCK softirq to also always run on CPU 0, which is bad because
> CPU 0 is a little core and the main work should be running on a big core.
>
> In block/blk-mq.c:blk_mq_complete_need_ipi() there's this code:
>
> 	/*
> 	 * With force threaded interrupts enabled, raising softirq from an SMP
> 	 * function call will always result in waking the ksoftirqd thread.
> 	 * This is probably worse than completing the request on a different
> 	 * cache domain.
> 	 */
> 	if (force_irqthreads())
> 		return false;
>
> When I saw "probably worse", I'm thinking this was decided by analysis and
> not by any real numbers. Was it?
>
> When we commented out the above if statement so that it did not return
> false, things sped up to almost non-rt speeds again.
>
> The fio benchmark went from 76MB to 94MB (higher is better). It's still not
> at the level of non-rt, but this was definitely one of the areas that
> caused the regression.
>
> Is that exit out of the function truly needed?

If I remember correctly, it completes in the context of the
threaded handler. It only does the remote-IPI thingy if the queue is
assigned to a different CPU than the CPU where it completes the request.
In your case it seems that either the device has multiple queues
configured and just one interrupt, or the queue is assigned to a
different CPU than the interrupt.
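
To make the decision above concrete, here is a rough userspace model of
it. This is a simplified sketch, not the kernel's code: the struct and
its field names are hypothetical stand-ins for what
blk_mq_complete_need_ipi() reads from the request, the queue flags and
the CPU topology, and the cache/capacity shortcut reflects my
understanding of recent kernels rather than anything stated in this
thread.

```c
#include <stdbool.h>

/*
 * Hypothetical, simplified inputs. The real function derives these from
 * struct request, the queue flags and scheduler topology helpers.
 */
struct completion_ctx {
	int  irq_cpu;           /* CPU on which the completion is running */
	int  submit_cpu;        /* CPU the request's software queue belongs to */
	bool same_comp;         /* queue wants completion near the submitter */
	bool same_force;        /* queue insists on the exact submitting CPU */
	bool share_cache;       /* both CPUs share a cache domain */
	bool equal_capacity;    /* both CPUs have the same capacity */
	bool submit_cpu_online; /* never target an offline CPU */
	bool force_irqthreads;  /* true when interrupts are force-threaded (RT) */
};

/* Returns true if the completion should be bounced to submit_cpu. */
static bool need_remote_completion(const struct completion_ctx *c)
{
	if (!c->same_comp)
		return false;

	/* The early exit discussed in this thread. */
	if (c->force_irqthreads)
		return false;

	/* Already on the right CPU: complete locally. */
	if (c->irq_cpu == c->submit_cpu)
		return false;

	/* Close enough (same cache domain, same capacity): complete locally. */
	if (!c->same_force && c->share_cache && c->equal_capacity)
		return false;

	return c->submit_cpu_online;
}
```

With irq_cpu = 0 (little core) and submit_cpu on a big core, the model
returns true on !RT and false as soon as force_irqthreads is set, which
matches the behaviour described in the report.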

If you have multiple queues but just one interrupt, then the lack of
load distribution is unfortunate. If the queue has been moved to a
BIG CPU, then I suggest moving the IRQ to a BIG CPU, too.
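
As a sketch of what moving the IRQ could look like from userspace: the
affinity of an interrupt can be changed by writing a CPU list to
/proc/irq/<irq>/smp_affinity_list. The helper names below are
hypothetical, and the IRQ number and CPU list in the usage note are
placeholders. The ufshcd IRQ number and the big-core CPU numbers are
platform specific and not given in this thread.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Build "<proc_root>/irq/<irq>/smp_affinity_list" into buf.
 * proc_root is normally "/proc"; it is a parameter only so the helper
 * can be exercised against a scratch directory.
 * Returns 0 on success, -1 if the path does not fit.
 */
static int irq_affinity_path(char *buf, size_t len,
			     const char *proc_root, unsigned int irq)
{
	int n = snprintf(buf, len, "%s/irq/%u/smp_affinity_list",
			 proc_root, irq);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Write cpu_list (for example "6-7") as the new affinity of irq.
 * Returns 0 on success, -1 on any failure. The write is rejected by the
 * kernel if the interrupt cannot be moved, so the result of fclose()
 * is checked too.
 */
static int set_irq_affinity(const char *proc_root, unsigned int irq,
			    const char *cpu_list)
{
	char path[256];
	FILE *f;
	int ok;

	if (irq_affinity_path(path, sizeof(path), proc_root, irq) != 0)
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fputs(cpu_list, f) >= 0;
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

Called as set_irq_affinity("/proc", 42, "6-7") with root privileges,
this would ask the kernel to deliver the (hypothetical) IRQ 42 on CPUs 6
and 7. Whether the hardware allows that is a separate question.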

If you ignore the statement and allow the remote-IPI to kick the softirq,
then the request ends up in ksoftirqd on the remote CPU, probably
accompanied by a warning. There it runs as SCHED_OTHER and competes for
CPU resources with any other task on that CPU, which is different from
softirq on !RT, where it has to wait until other hardirqs complete.

The other thing is that if something "else" is busy, say a
threaded interrupt, then it will pick up this request (before ksoftirqd
had the chance). The result is that the handler now does the I/O at its
end instead of ksoftirqd. If the interrupt is important and has a higher
priority than the average MAX_RT_PRIO / 2, then this block I/O might
disrupt its schedule.

I think moving the interrupt to a BIG CPU (like the block queue), if
possible, would be the easiest thing to do.
If the hardware restricts it, I would suggest that a dedicated
SCHED_FIFO thread for this duty would be better than an anonymous
catch-all ksoftirqd.

> Thanks,
Sebastian