linux-kernel - Re: [RFC PATCH 2/2] sched/fair: skip the cache hot CPU in select_idle

Message-ID: <229069c1-3d61-53bb-fff7-691942c48d21@amd.com>
Date:   Tue, 12 Sep 2023 19:56:37 +0530
From:   K Prateek Nayak <kprateek.nayak@....com>
To:     Chen Yu <yu.c.chen@...el.com>
Cc:     Tim Chen <tim.c.chen@...el.com>, Aaron Lu <aaron.lu@...el.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Valentin Schneider <vschneid@...hat.com>,
        "Gautham R . Shenoy" <gautham.shenoy@....com>,
        linux-kernel@...r.kernel.org,
        Peter Zijlstra <peterz@...radead.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Ingo Molnar <mingo@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Juri Lelli <juri.lelli@...hat.com>
Subject: Re: [RFC PATCH 2/2] sched/fair: skip the cache hot CPU in
 select_idle_cpu()

Hello Chenyu,

On 9/12/2023 6:02 PM, Chen Yu wrote:
> [..snip..]
>
>>> If I understand correctly, WF_SYNC is meant to let the wakee be woken
>>> up on the waker's CPU rather than on the wakee's previous CPU, because
>>> the waker goes to sleep after the wakeup. SIS_CACHE mainly cares about
>>> the wakee's previous CPU. We can only prevent other wakees from
>>> occupying the previous CPU; we do not increase the likelihood that
>>> wake_affine_idle() chooses the previous CPU.
>>
>> Correct me if I'm wrong here,
>>
>> Say a short sleeper is always woken up using the WF_SYNC flag. When the
>> task is dequeued, we mark the previous CPU where it ran as "cache-hot"
>> and keep other wakeups from targeting it until the "cache_hot_timeout"
>> is crossed. Let us assume a perfect world where the task wakes up before
>> the "cache_hot_timeout" expires. Logically this CPU was reserved all
>> this while for the short sleeper, but since the wakeup bears the WF_SYNC
>> flag, the whole reservation is ignored and the waker's LLC is explored.
>>
> 
> Ah, I see your point. Do you mean that, because the waker uses WF_SYNC,
> wake_affine_idle() forces the short-sleeping wakee to be woken up on the
> waker's CPU rather than the wakee's previous CPU, while the wakee's
> previous CPU has been marked as cache hot for nothing?

Precisely :)
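To make the scenario concrete, here is a small user-space model in C. It is a hypothetical sketch and not the SIS_CACHE patch. The `cache_hot_timeout[]` array, the helper names, the `WF_SYNC` value and the 0.5 ms stand-in for the default `sysctl_sched_migration_cost` are all assumptions made for illustration. The sync check also ignores the other conditions `wake_affine_idle()` looks at. `maybe_clear_cache_hot()` models the clearing suggested in the quoted question that follows. In the kernel, that write would land on a remote runqueue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model only: none of these are kernel structures or APIs. */
#define NR_MODEL_CPUS 4
#define WF_SYNC 0x10 /* flag value assumed for this model */

/* Assumed stand-in for the default sysctl_sched_migration_cost, in ns. */
static const uint64_t migration_cost_ns = 500000ULL;

/* Per-CPU "reserved until" timestamp in ns; 0 means not reserved. */
static uint64_t cache_hot_timeout[NR_MODEL_CPUS];

/* On dequeue of a short sleeper: reserve the CPU it ran on. */
static void mark_cache_hot(int cpu, uint64_t now)
{
	assert(cpu >= 0 && cpu < NR_MODEL_CPUS);
	cache_hot_timeout[cpu] = now + migration_cost_ns;
}

static bool cpu_cache_hot(int cpu, uint64_t now)
{
	return now < cache_hot_timeout[cpu];
}

/*
 * Crude stand-in for the wake_affine_idle() sync case: assume the waker
 * is alone on its runqueue, so a sync wakeup pulls the wakee to the
 * waker's CPU regardless of any reservation on prev_cpu.
 */
static int pick_target(int waker_cpu, int prev_cpu, int wake_flags)
{
	return (wake_flags & WF_SYNC) ? waker_cpu : prev_cpu;
}

/* True if prev_cpu is still held for a wakee that went elsewhere. */
static bool reservation_wasted(int target, int prev_cpu, uint64_t now)
{
	return target != prev_cpu && cpu_cache_hot(prev_cpu, now);
}

/* The idea under discussion: drop the reservation if it went unused. */
static void maybe_clear_cache_hot(int target, int prev_cpu)
{
	if (target != prev_cpu)
		cache_hot_timeout[prev_cpu] = 0;
}
```

With a sync wakeup, `pick_target()` returns the waker's CPU while `prev_cpu` stays reserved until the timeout expires. That stranded reservation is what `reservation_wasted()` reports.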

> 
>> Should the timeout be cleared if the wakeup decides not to target the
>> previous CPU? (The default "sysctl_sched_migration_cost" is probably
>> small enough to curb any side effect that could show up here, but
>> if a genuine use case warrants setting "sysctl_sched_migration_cost" to
>> a larger value, the wakeup path might be affected: a lot of idle
>> targets could be overlooked since the CPUs stay marked cache-hot for a
>> longer duration.)
>>
>> Let me know what you think.
>>
> 
> This makes sense. In theory the above logic can be added in
> select_idle_sibling(): if the target CPU is chosen rather than
> the previous CPU, the previous CPU's cache-hot flag should be
> cleared.
> 
> But this might add overhead, because we need to grab the rq
> lock and write to another CPU's rq, which could be costly. It
> seems to be a trade-off of the current implementation.

I agree, it will not be pretty. Maybe another way is to keep a
history of the type of wakeups the task experiences (similar to
wakee_flips, but for sync and non-sync wakeups) and only reserve
the CPU if the task is woken up more often via non-sync wakeups?
Thinking out loud here.

> On the other
> hand, if the user sets sysctl_sched_migration_cost to a very
> large value:
> 1. Without SIS_CACHE, there is no task migration.

But that is in the load balancing path. I think the wakeup path will
still migrate the task. That said, I believe there would be very few
cases where all CPUs are marked cache-hot and SIS_UTIL does not bail
out straight away as a result of high utilization. Probably a rare
scenario.
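A rough model of the two cases discussed here. It assumes a simplified `select_idle_cpu()` that scans at most `nr` CPUs after `target`, with `nr` standing in for the SIS_UTIL scan budget. The scan skips idle CPUs whose hypothetical `cache_hot_timeout` has not expired. When the scan finds nothing, the caller falls back to `target`, which is how the wakee ends up being woken up locally. `struct cpu_state`, `scan_idle_cpu()` and `pick_cpu()` are invented for this sketch. They do not reproduce the kernel's exact scan order or `nr` accounting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_MODEL_CPUS 8

/* Hypothetical per-CPU state for this model only. */
struct cpu_state {
	bool idle;
	uint64_t cache_hot_timeout; /* ns; CPU is reserved while now < this */
};

/*
 * Simplified select_idle_cpu(): visit at most 'nr' CPUs after 'target',
 * skipping busy CPUs and idle CPUs that are still cache-hot.
 * Returns an idle CPU, or -1 if none qualified within the budget.
 */
static int scan_idle_cpu(const struct cpu_state cpus[NR_MODEL_CPUS],
			 int target, int nr, uint64_t now)
{
	assert(target >= 0 && target < NR_MODEL_CPUS);
	for (int i = 1; i < NR_MODEL_CPUS && nr > 0; i++, nr--) {
		int cpu = (target + i) % NR_MODEL_CPUS;

		if (!cpus[cpu].idle)
			continue;
		if (now < cpus[cpu].cache_hot_timeout)
			continue; /* reserved for a short sleeper: skip it */
		return cpu;
	}
	return -1;
}

/* Like select_idle_sibling(): fall back to 'target' if the scan fails. */
static int pick_cpu(const struct cpu_state cpus[NR_MODEL_CPUS], int target,
		    int nr, uint64_t now)
{
	int cpu = scan_idle_cpu(cpus, target, nr, now);

	return cpu >= 0 ? cpu : target;
}
```

In this model, a zero budget and a fully cache-hot set of idle CPUs both end in the same place: `pick_cpu()` returns `target`. The cache-hot skipping only changes the outcome when the budget actually allows a scan.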

> 2. With SIS_CACHE enabled, all idle CPUs are cache hot and will be skipped
>    in select_idle_cpu(), so the wakee will be woken up locally.
> It seems to have the same effect, so I suppose there is not much impact
> on wakeup behavior.
> 
> [..snip..]
> 

--
Thanks and Regards,
Prateek