linux-kernel - Re: [PATCH 1/1] sched/fair: Fix inaccurate tally of ttwu_move

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <90041ab2-a8a1-8572-969c-143a707faed2@oracle.com>
Date:   Thu, 25 Aug 2022 02:13:17 -0700
From:   Libo Chen <libo.chen@...cle.com>
To:     "Gautham R. Shenoy" <gautham.shenoy@....com>
Cc:     mingo@...hat.com, peterz@...radead.org, vincent.guittot@...aro.org,
        juri.lelli@...hat.com, dietmar.eggemann@....com, mgorman@...e.de,
        linux-kernel@...r.kernel.org,
        Daniel Jordan <daniel.m.jordan@...cle.com>
Subject: Re: [PATCH 1/1] sched/fair: Fix inaccurate tally of ttwu_move_affine



On 8/25/22 00:30, Gautham R. Shenoy wrote:
> On Wed, Aug 10, 2022 at 03:33:13PM -0700, Libo Chen wrote:
>> There are scenarios where non-affine wakeups are incorrectly counted as
>> affine wakeups by schedstats.
>>
>> When wake_affine_idle() returns prev_cpu which doesn't equal to
>> nr_cpumask_bits, it will slip through the check: target == nr_cpumask_bits
>> in wake_affine() and be counted as if target == this_cpu in schedstats.
>>
>> Replace target == nr_cpumask_bits with target != this_cpu to make sure
>> affine wakeups are accurately tallied.
>>
>> Fixes: 806486c377e33 (sched/fair: Do not migrate if the prev_cpu is idle)
>> Suggested-by: Daniel Jordan <daniel.m.jordan@...cle.com>
>> Signed-off-by: Libo Chen <libo.chen@...cle.com>
>> ---
>>   kernel/sched/fair.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index da388657d5ac..b179da4f8105 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -6114,7 +6114,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
>>   		target = wake_affine_weight(sd, p, this_cpu, prev_cpu, sync);
>>   
>>   	schedstat_inc(p->stats.nr_wakeups_affine_attempts);
>> -	if (target == nr_cpumask_bits)
>> +	if (target != this_cpu)
>>   		return prev_cpu;
>
> This seems to be the right thing to do. However,..
>
> if this_cpu and prev_cpu were in the same LLC and we pick prev_cpu,
> technically is it still not an affine wakeup?
>
I think schedstats like ttwu_move_affine/ttwu_wake_remote is defined 
within a sched domain, so if the waking CPU and the previous CPU are in 
the same MC domain, then picking the previous CPU is a remote wakeup
within that MC. If the two candidate CPUs are from two different NUMA 
nodes, then picking the waking CPU is an affine wakeup within that NUMA 
domain. Correct me if I am wrong, this definition is consistent across
all levels of sched domains.

But I do understand that when two candidate CPUs are within an LLC,
     a) all the fast-path wakeups should be affine wakeups if your 
definition of an affine wakeup is a wakeup to the same LLC of the waker.
     b) select_idle_sibling() may pick a CPU in that LLC other than the 
two candidate CPUs which makes the affine/remote stats here useless even 
if we are consistent with ttwu_move_affine/ttwu_wake_remote
        definition.

I personally think it's just too much trouble to add additional code in 
the kernel to, let's say, treat all wakeups within an LLC as 
ttwu_move_affine. It's a lot easier to do that when you process 
schedstats data,
whether you want to treat all wakeups in LLC domains as affine wakeups 
or ignore ttwu_move_affine/ttwu_wake_remote stats from LLC domains.



>>   
>>   	schedstat_inc(sd->ttwu_move_affine);
>> -- 
>> 2.31.1
>>
> --
> Thanks and Regards
> gautham.