[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <90041ab2-a8a1-8572-969c-143a707faed2@oracle.com>
Date: Thu, 25 Aug 2022 02:13:17 -0700
From: Libo Chen <libo.chen@...cle.com>
To: "Gautham R. Shenoy" <gautham.shenoy@....com>
Cc: mingo@...hat.com, peterz@...radead.org, vincent.guittot@...aro.org,
juri.lelli@...hat.com, dietmar.eggemann@....com, mgorman@...e.de,
linux-kernel@...r.kernel.org,
Daniel Jordan <daniel.m.jordan@...cle.com>
Subject: Re: [PATCH 1/1] sched/fair: Fix inaccurate tally of ttwu_move_affine
On 8/25/22 00:30, Gautham R. Shenoy wrote:
> On Wed, Aug 10, 2022 at 03:33:13PM -0700, Libo Chen wrote:
>> There are scenarios where non-affine wakeups are incorrectly counted as
>> affine wakeups by schedstats.
>>
>> When wake_affine_idle() returns prev_cpu which doesn't equal to
>> nr_cpumask_bits, it will slip through the check: target == nr_cpumask_bits
>> in wake_affine() and be counted as if target == this_cpu in schedstats.
>>
>> Replace target == nr_cpumask_bits with target != this_cpu to make sure
>> affine wakeups are accurately tallied.
>>
>> Fixes: 806486c377e33 (sched/fair: Do not migrate if the prev_cpu is idle)
>> Suggested-by: Daniel Jordan <daniel.m.jordan@...cle.com>
>> Signed-off-by: Libo Chen <libo.chen@...cle.com>
>> ---
>> kernel/sched/fair.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index da388657d5ac..b179da4f8105 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -6114,7 +6114,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
>> target = wake_affine_weight(sd, p, this_cpu, prev_cpu, sync);
>>
>> schedstat_inc(p->stats.nr_wakeups_affine_attempts);
>> - if (target == nr_cpumask_bits)
>> + if (target != this_cpu)
>> return prev_cpu;
>
> This seems to be the right thing to do. However,..
>
> if this_cpu and prev_cpu were in the same LLC and we pick prev_cpu,
> technically is it still not an affine wakeup?
>
I think schedstats like ttwu_move_affine/ttwu_wake_remote is defined
within a sched domain, so if the waking CPU and the previous CPU are in
the same MC domain, then picking the previous CPU is a remote wakeup
within that MC. If the two candidate CPUs are from two different NUMA
nodes, then picking the waking CPU is an affine wakeup within that NUMA
domain. Correct me if I am wrong, this definition is consistent across
all levels of sched domains.
But I do understand that when two candidate CPUs are within an LLC,
a) all the fast-path wakeups should be affine wakeups if your
definition of an affine wakeup is a wakeup to the same LLC of the waker.
b) select_idle_sibling() may pick a CPU in that LLC other than the
two candidate CPUs which makes the affine/remote stats here useless even
if we are consistent with ttwu_move_affine/ttwu_wake_remote
definition.
I personally think it's just too much trouble to add additional code in
the kernel to, let's say, treat all wakeups within an LLC as
ttwu_move_affine. It's a lot easier to do that when you process
schedstats data,
whether you want to treat all wakeups in LLC domains as affine wakeups
or ignore ttwu_move_affine/ttwu_wake_remote stats from LLC domains.
>>
>> schedstat_inc(sd->ttwu_move_affine);
>> --
>> 2.31.1
>>
> --
> Thanks and Regards
> gautham.
Powered by blists - more mailing lists