Message-ID: <3f17f9d6-d529-4ddc-97f2-8f5933d49f5e@arm.com>
Date: Fri, 6 Feb 2026 13:43:38 +0000
From: Christian Loehle <christian.loehle@....com>
To: Shubhang Kaushik <shubhang@...amperecomputing.com>,
Vincent Guittot <vincent.guittot@...aro.org>
Cc: linux-kernel@...r.kernel.org, peterz@...radead.org, mingo@...hat.com,
juri.lelli@...hat.com, dietmar.eggemann@....com, kprateek.nayak@....com,
pierre.gondois@....com
Subject: Re: [PATCHv2] sched/fair: Skip SCHED_IDLE rq for SCHED_IDLE task
On 2/5/26 18:52, Shubhang Kaushik wrote:
> On Thu, 5 Feb 2026, Vincent Guittot wrote:
>
>> On Thu, 5 Feb 2026 at 01:00, Shubhang Kaushik
>> <shubhang@...amperecomputing.com> wrote:
>>>
>>> On Tue, 3 Feb 2026, Christian Loehle wrote:
>>>
>>>> CPUs whose rq only have SCHED_IDLE tasks running are considered to be
>>>> equivalent to truly idle CPUs during wakeup path. For fork and exec
>>>> SCHED_IDLE is even preferred.
>>>> This is based on the assumption that the SCHED_IDLE CPU is not in an
>>>> idle state and might be in a higher P-state, allowing the task/wakee
>>>> to run immediately without sharing the rq.
>>>>
>>>> However this assumption doesn't hold if the wakee has SCHED_IDLE policy
>>>> itself, as it will share the rq with existing SCHED_IDLE tasks. In this
>>>> case, we are better off continuing to look for a truly idle CPU.
>>>>
>>>> On an Intel Xeon 2-socket with 64 logical cores in total this yields
>>>> for kernel compilation using SCHED_IDLE:
>>>>
>>>> +---------+----------------------+----------------------+--------+
>>>> | workers | mainline (seconds)   | patch (seconds)      | delta% |
>>>> +=========+======================+======================+========+
>>>> |       1 | 4384.728 ± 21.085    | 3843.250 ± 16.235    | -12.35 |
>>>> |       2 | 2242.513 ±  2.099    | 1971.696 ±  2.842    | -12.08 |
>>>> |       4 | 1199.324 ±  1.823    | 1033.744 ±  1.803    | -13.81 |
>>>> |       8 |  649.083 ±  1.959    |  559.123 ±  4.301    | -13.86 |
>>>> |      16 |  370.425 ±  0.915    |  325.906 ±  4.623    | -12.02 |
>>>> |      32 |  234.651 ±  2.255    |  217.266 ±  0.253    |  -7.41 |
>>>> |      64 |  202.286 ±  1.452    |  197.977 ±  2.275    |  -2.13 |
>>>> |     128 |  217.092 ±  1.687    |  212.164 ±  1.138    |  -2.27 |
>>>> +---------+----------------------+----------------------+--------+
>>>>
>>>> Signed-off-by: Christian Loehle <christian.loehle@....com>
>>>
>>> I’ve been testing this patch on an 80-core Ampere Altra (Neoverse-N1) and
>>> the results look very solid. On these high-core-count ARM systems, we
>>> definitely see the benefit of being pickier about where we place
>>> SCHED_IDLE tasks.
>>>
>>> Treating an occupied SCHED_IDLE rq as idle seems to cause
>>> unnecessary packing that shows up in the tail latency. By spreading these
>>> background tasks to truly idle cores, I'm seeing a nice boost in both
>>> background compilation and AI inference throughput.
>>>
>>> The reduction in sys time confirms that the domain balancing remains
>>> stable despite the refactor to sched_idle_rq(rq) as you and Prateek
>>> mentioned.
>>>
>>> 1. Background Kernel Compilation:
>>>
>>> I ran `time nice -n 19 make -j$nproc` to see how it handles a heavy
>>
>> nice -n 19 uses sched_other with prio 19 and not sched_idle so I'm
>> curious how you can see a difference ?
>> Or something is missing in your test description
>> Or we have a bug somewhere
>>
>
> Okay, I realized I had used nice -n 19 (SCHED_OTHER) for the initial build, which wouldn't have directly triggered the SCHED_IDLE logic. But, I did use chrt for the schbench runs, which is why those p99 wins were so consistent.
>
> I've re-run the kernel build using the correct chrt --idle 0 policy. On Ampere Altra, the throughput is along the same lines as mainline.
>
> Metric  Mainline      Patched       Delta
> Real      9m 20.120s    9m 18.472s  -1.6s
> User    382m 24.966s  380m 41.716s  -1m 43s
> Sys     218m 26.192s  218m 44.908s  +18.7s
>
Thanks for testing, Shubhang, although I find it a bit surprising that your
kernel compilation under SCHED_IDLE doesn't improve.
Are you running with CONFIG_SCHED_CLUSTER=y? I'll try to reproduce.
Anyway, at least you see a schbench improvement, so I'm assuming I can
keep your Tested-by?
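
Since the nice -n 19 vs. chrt --idle 0 mix-up above changed what was
actually being measured, here is a minimal sketch for checking which
policy a benchmark process really runs under. It is only an illustration
in Python, assuming a Linux host; describe() and POLICY_NAMES are
hypothetical names, not part of the patch or of any test in this thread.

```python
import os

# Readable names for the policy constants Python's os module exposes on Linux.
POLICY_NAMES = {
    os.SCHED_OTHER: "SCHED_OTHER",
    os.SCHED_BATCH: "SCHED_BATCH",
    os.SCHED_IDLE: "SCHED_IDLE",
    os.SCHED_FIFO: "SCHED_FIFO",
    os.SCHED_RR: "SCHED_RR",
}


def describe(pid=0):
    """Return (policy name, nice value) for pid; pid 0 means the caller."""
    policy = os.sched_getscheduler(pid)
    # The kernel may OR the reset-on-fork flag into the returned policy.
    policy &= ~os.SCHED_RESET_ON_FORK
    nice = os.getpriority(os.PRIO_PROCESS, pid)
    return POLICY_NAMES.get(policy, f"unknown({policy})"), nice


if __name__ == "__main__":
    print(describe())
```

Saved under some name such as check_policy.py (hypothetical), running it
via chrt --idle 0 should report SCHED_IDLE, whereas running it via
nice -n 19 should still report SCHED_OTHER, only with a raised nice value.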