Message-ID: <becdb227-cd7c-7724-f79e-f5ee70692e42@efficios.com>
Date: Thu, 3 Aug 2023 16:21:37 -0400
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Aaron Lu <aaron.lu@...el.com>
Cc: Shrikanth Hegde <sshegde@...ux.vnet.ibm.com>,
linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
Valentin Schneider <vschneid@...hat.com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Juri Lelli <juri.lelli@...hat.com>,
Swapnil Sapkal <Swapnil.Sapkal@....com>, x86@...nel.org,
Peter Zijlstra <peterz@...radead.org>,
Srikar Dronamraju <srikar@...ux.vnet.ibm.com>
Subject: Re: [RFC PATCH 1/1] sched: Extend cpu idle state for 1ms
On 8/1/23 03:24, Aaron Lu wrote:
> On Wed, Jul 26, 2023 at 02:56:19PM -0400, Mathieu Desnoyers wrote:
>
> ... ...
>
>> The updated patch:
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index a68d1276bab0..1c7d5bd2968b 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -7300,6 +7300,10 @@ int idle_cpu(int cpu)
>>  {
>>  	struct rq *rq = cpu_rq(cpu);
>>  
>> +	if (READ_ONCE(rq->nr_running) <= IDLE_CPU_DELAY_MAX_RUNNING &&
>> +	    sched_clock_cpu(cpu_of(rq)) < READ_ONCE(rq->clock_idle) + IDLE_CPU_DELAY_NS)
>> +		return 1;
>> +
>>  	if (rq->curr != rq->idle)
>>  		return 0;
>>  
>> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
>> index 81ac605b9cd5..57a49a5524f0 100644
>> --- a/kernel/sched/sched.h
>> +++ b/kernel/sched/sched.h
>> @@ -97,6 +97,9 @@
>>  # define SCHED_WARN_ON(x)	({ (void)(x), 0; })
>>  #endif
>>  
>> +#define IDLE_CPU_DELAY_NS	1000000		/* 1ms */
>> +#define IDLE_CPU_DELAY_MAX_RUNNING	4
>> +
>>  struct rq;
>>  struct cpuidle_state;
>>
>
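> If I'm reading the new check right, it treats a CPU as still idle for up
> to 1ms after it last went idle, provided it has at most 4 runnable
> tasks. Restated in isolation (a plain user-space C sketch, not the
> kernel code; sim_rq and now_ns stand in for struct rq and
> sched_clock_cpu()):
> 
> 	#include <stdbool.h>
> 	#include <stdint.h>
> 
> 	#define IDLE_CPU_DELAY_NS          1000000ULL /* 1ms */
> 	#define IDLE_CPU_DELAY_MAX_RUNNING 4
> 
> 	struct sim_rq {
> 		unsigned int nr_running; /* runnable tasks on this CPU */
> 		uint64_t clock_idle;     /* timestamp of last idle entry, in ns */
> 	};
> 
> 	/* Treat a lightly loaded CPU as idle for 1ms after it went idle. */
> 	static bool recently_idle(const struct sim_rq *rq, uint64_t now_ns)
> 	{
> 		return rq->nr_running <= IDLE_CPU_DELAY_MAX_RUNNING &&
> 		       now_ns < rq->clock_idle + IDLE_CPU_DELAY_NS;
> 	}
> 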
> I gave this patch a run on an Intel SPR machine (2 sockets/112 cores/
> 224 CPUs) and also noticed a huge improvement when running hackbench,
> especially for the group=32/fds=20 case:
>
> when group=10/fds=20 (400 tasks):
>                time   wakeups/migrations   tg->load_avg%
> base:           43s   27874246/13953871    25%
> this patch:     32s   33200766/244457       2%
> my patch:       37s   29186608/16307254     2%
>
> when group=20/fds=20 (800 tasks):
>                time   wakeups/migrations   tg->load_avg%
> base:           65s   27108751/16238701    27%
> this patch:     45s   35718552/1691220      3%
> my patch:       48s   37506974/24797284     2%
>
> when group=32/fds=20 (1280 tasks):
>                time   wakeups/migrations   tg->load_avg%
> base:          150s   36902527/16423914    36%
> this patch:     57s   30536830/6035346      6%
> my patch:       73s   45264605/21595791     3%
>
> One thing I noticed is that, after this patch, migrations on the wakeup
> path dropped dramatically (see the wakeups/migrations numbers above,
> which were captured over 5s during the run). I think this makes sense:
> a cpu is now more likely to be considered idle, so a woken task is more
> likely to stay on its prev_cpu. And when migrations are reduced, the
> cost of accessing tg->load_avg is also reduced (tg->load_avg% above is
> the sum of update_cfs_group()% + update_load_avg()% as reported by
> perf). I think this is part of the reason why performance improved on
> this machine.
>
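> As a rough illustration (heavily simplified from the real wakeup path;
> pick_wake_cpu is a made-up name, idle_cpu() is the function the patch
> modifies), this is the shape of the fast path that now fires more often:
> 
> 	/* Sketch only: why a longer-lasting "idle" verdict cuts migrations. */
> 	static int pick_wake_cpu(int prev_cpu, int target_cpu)
> 	{
> 		/*
> 		 * With the 1ms grace period, idle_cpu(prev_cpu) returns 1
> 		 * far more often, so the woken task keeps its cache-warm
> 		 * CPU instead of being migrated elsewhere.
> 		 */
> 		if (idle_cpu(prev_cpu))
> 			return prev_cpu;
> 
> 		return target_cpu; /* fall back to the wider idle search */
> 	}
> 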
> Since I've been working on reducing the cost of accessing tg->load_avg[1],
> I also gave my patch a run. According to the results, even though the
> cost of accessing tg->load_avg is smaller with my patch, Mathieu's patch
> is still faster. It's not clear to me why; maybe it has something to do
> with cache reuse, since my patch doesn't inhibit migration? I suppose
> IPC could reflect this?
I've also noticed a drastic reduction in the number of migrations with
my patch. The behavior of select_task_rq changes significantly, but I
have not figured out why yet.

I tried adding tons of schedstats counters within select_task_rq to
compare the decisions taken by the baseline vs the modified
implementation of idle_cpu. I also tried to count how many times the
target task rq changes (which implies a migration), with a breakdown by
cause (which branch within select_task_rq causes it). I have not found a
clear culprit yet, though (and I am currently on vacation, so I am not
working on this actively).
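
Roughly, the idea was one per-cpu counter per decision point, dumped
through schedstats (a sketch only; the counter names below are invented
for illustration, not the exact instrumentation patch):

	/* Sketch: per-branch counters for select_task_rq() decisions. */
	DEFINE_PER_CPU(u64, stq_stayed_prev);	/* woken task kept prev_cpu */
	DEFINE_PER_CPU(u64, stq_fast_path);	/* moved by the fast path */
	DEFINE_PER_CPU(u64, stq_slow_path);	/* moved by the slow path */

	static inline void count_stq_decision(int prev_cpu, int new_cpu,
					      bool slow)
	{
		if (new_cpu == prev_cpu)
			__this_cpu_inc(stq_stayed_prev);
		else if (slow)
			__this_cpu_inc(stq_slow_path);
		else
			__this_cpu_inc(stq_fast_path);
	}
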
Thanks,
Mathieu
>
> [1]: https://lore.kernel.org/lkml/20230718134120.81199-1-aaron.lu@intel.com/
>
> Thanks,
> Aaron
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com