Message-ID: <ZMH6zVe4ezzyoNxr@chenyu5-mobl2>
Date: Thu, 27 Jul 2023 13:04:13 +0800
From: Chen Yu <yu.c.chen@...el.com>
To: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
CC: Shrikanth Hegde <sshegde@...ux.vnet.ibm.com>,
<linux-kernel@...r.kernel.org>, Ingo Molnar <mingo@...hat.com>,
"Valentin Schneider" <vschneid@...hat.com>,
Steven Rostedt <rostedt@...dmis.org>,
"Ben Segall" <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
"Daniel Bristot de Oliveira" <bristot@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Juri Lelli <juri.lelli@...hat.com>,
"Swapnil Sapkal" <Swapnil.Sapkal@....com>,
Aaron Lu <aaron.lu@...el.com>, <x86@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Srikar Dronamraju <srikar@...ux.vnet.ibm.com>
Subject: Re: [RFC PATCH 1/1] sched: Extend cpu idle state for 1ms
On 2023-07-26 at 10:07:30 -0400, Mathieu Desnoyers wrote:
> On 7/26/23 04:04, Shrikanth Hegde wrote:
> >
> >
> > On 7/26/23 1:00 AM, Mathieu Desnoyers wrote:
> > > Allow select_task_rq to consider a cpu as idle for 1ms after that cpu
> > > has exited the idle loop.
> > >
> > > This speeds up the following hackbench workload on a 192 cores AMD EPYC
> > > 9654 96-Core Processor (over 2 sockets):
> > >
> > > hackbench -g 32 -f 20 --threads --pipe -l 480000 -s 100
> > >
> > > from 49s to 34s. (30% speedup)
> > >
> > > My working hypothesis for why this helps is: queuing more than a single
> > > task on the runqueue of a cpu which just exited idle rather than
> > > spreading work over other idle cpus helps power efficiency on systems
> > > with large number of cores.
> > >
This looks interesting. It does help power efficiency, but how does it
improve throughput? Is it because of better cache locality, since a task is
more likely to be woken up on its previous CPU (which is now more easily
treated as idle), or simply because less time is spent in select_idle_sibling()?
> Good point !
>
> Can you try your benchmark replacing the if () statement above by:
>
> + if (sched_clock() < READ_ONCE(rq->idle_end_time) + IDLE_CPU_DELAY_NS &&
> + READ_ONCE(rq->nr_running) <= 4)
If I understand correctly, this nr_running check is meant to filter out the case
where the system is saturated? If so, maybe
rq->avg_idle >= sysctl_sched_migration_cost
could be checked as well, so that a cpu running 1 long-running task is not
treated as 'idle'?
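
A rough userspace sketch of the combined condition, for illustration only.
struct rq_model, cpu_recently_idle() and migration_cost_ns are hypothetical
names, not kernel code; the 500000 ns value is assumed to match the default
of sysctl_sched_migration_cost, and the 1ms delay is taken from the patch
subject. In the scheduler itself the fields would be read with READ_ONCE()
and the timestamp would come from sched_clock().

```c
#include <stdbool.h>
#include <stdint.h>

/* 1ms window after idle exit, as in the patch subject. */
#define IDLE_CPU_DELAY_NS 1000000ULL

/* Hypothetical stand-in for the few struct rq fields used here. */
struct rq_model {
	uint64_t idle_end_time;   /* ns timestamp when the cpu left the idle loop */
	unsigned int nr_running;  /* tasks currently queued on this cpu */
	uint64_t avg_idle;        /* average idle duration in ns */
};

/* Assumed default of sysctl_sched_migration_cost (0.5ms). */
static const uint64_t migration_cost_ns = 500000ULL;

/* Return true if the cpu may still be treated as idle at time 'now' (ns). */
static bool cpu_recently_idle(const struct rq_model *rq, uint64_t now)
{
	/* Outside the 1ms window after idle exit. */
	if (now >= rq->idle_end_time + IDLE_CPU_DELAY_NS)
		return false;

	/* Too many tasks already queued: likely saturated. */
	if (rq->nr_running > 4)
		return false;

	/* Short average idle time hints at a long-running task on this cpu. */
	return rq->avg_idle >= migration_cost_ns;
}
```

With this model, a cpu that left idle 0.5ms ago with one queued task is
treated as idle only while its avg_idle stays at or above the migration cost.
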
thanks,
Chenyu