Message-ID: <jhj7dsu54qg.mognet@arm.com>
Date: Wed, 16 Sep 2020 12:40:55 +0100
From: Valentin Schneider <valentin.schneider@....com>
To: Mel Gorman <mgorman@...e.de>
Cc: Aubrey Li <aubrey.li@...ux.intel.com>, mingo@...hat.com,
peterz@...radead.org, juri.lelli@...hat.com,
vincent.guittot@...aro.org, dietmar.eggemann@....com,
rostedt@...dmis.org, bsegall@...gle.com,
tim.c.chen@...ux.intel.com, linux-kernel@...r.kernel.org,
Qais Yousef <qais.yousef@....com>,
Jiang Biao <benbjiang@...il.com>
Subject: Re: [RFC PATCH v2] sched/fair: select idle cpu from idle cpumask in sched domain
On 16/09/20 12:00, Mel Gorman wrote:
> On Wed, Sep 16, 2020 at 12:31:03PM +0800, Aubrey Li wrote:
>> Add an idle cpumask to track idle cpus in the sched domain. When a CPU
>> enters idle, its corresponding bit in the idle cpumask will be set,
>> and when the CPU exits idle, its bit will be cleared.
>>
>> When a task wakes up and selects an idle cpu, scanning the idle cpumask
>> has a lower cost than scanning all the cpus in the last level cache
>> domain, especially when the system is heavily loaded.
>>
>> The following benchmarks were run on an x86 4-socket system with
>> 24 cores per socket and 2 hyperthreads per core, 192 CPUs in total:
>>
>
> This still appears to be tied to turning the tick off. An idle CPU
> available for computation does not necessarily have the tick turned off
> if it's only idle for short periods of time. When nohz is disabled, or a
> machine is active enough that CPUs are not disabling the tick,
> select_idle_cpu may fail to select an idle CPU and instead stack tasks
> on the old CPU.
>
Vincent pointed out in v1 that we ratelimit nohz_balance_exit_idle() by
having it happen on the tick, to avoid being hammered by a flurry of idle
enters / exits at sub-tick granularity. I'm afraid flipping bits of this
cpumask on every idle enter / exit might be too brutal.
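To put a rough shape on that: I'd rather see the mask refreshed from the
tick path than from every idle transition. Something like the below is
what I have in mind - purely a sketch, and update_idle_cpumask() /
sds_idle_cpus() are illustrative names, not symbols from the patch or
from mainline:

  /*
   * Sketch: refresh the shared idle mask from the tick rather than from
   * every idle enter / exit, so sub-tick idle transitions don't hammer
   * the shared cacheline. Caller is expected to hold rcu_read_lock().
   */
  static void update_idle_cpumask(int cpu)
  {
          struct sched_domain_shared *sds;

          sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
          if (!sds)
                  return;

          if (idle_cpu(cpu))
                  cpumask_set_cpu(cpu, sds_idle_cpus(sds));
          else
                  cpumask_clear_cpu(cpu, sds_idle_cpus(sds));
  }

That said, this reintroduces the staleness problem Mel describes above
(no tick, no update), so it's a ratelimiting shape, not a fix.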
> The other subtlety is that select_idle_sibling() currently allows a
> SCHED_IDLE cpu to be used as a wakeup target. The CPU is not really
> idle as such; it's simply running a low-priority task that is suitable
> for preemption. I suspect this patch breaks that.
I think you're spot on.
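(To spell it out: IIRC the current wakeup path treats a CPU running only
SCHED_IDLE tasks as a valid target, with checks roughly of the form

          if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
                  return cpu;

so a mask that only tracks truly idle CPUs would lose those targets.)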
An alternative I see here would be to move this into its own
select_idle_foo() function: if that mask is empty, or none of the tagged
CPUs actually pass available_idle_cpu(), we fall through to the usual
idle searches.
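Roughly what I have in mind - again just a sketch, with select_idle_mask()
and sds_idle_cpus() being made-up names:

  /*
   * Sketch: try the tagged CPUs first; -1 means no luck, fall through to
   * the existing select_idle_core()/select_idle_cpu()/select_idle_smt()
   * path. Caller is expected to hold rcu_read_lock().
   */
  static int select_idle_mask(struct task_struct *p, int target)
  {
          struct sched_domain_shared *sds;
          int cpu;

          sds = rcu_dereference(per_cpu(sd_llc_shared, target));
          if (!sds || cpumask_empty(sds_idle_cpus(sds)))
                  return -1;

          for_each_cpu_and(cpu, sds_idle_cpus(sds), p->cpus_ptr) {
                  if (available_idle_cpu(cpu))
                          return cpu;
          }

          return -1;
  }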
That's far from perfect; you could end up waking a truly idle CPU when
preempting a SCHED_IDLE task on a warm and busy CPU would have been the
better pick. I'm not sure a proliferation of cpumasks really is the
answer to that...