linux-kernel - Re: [PATCH v4 1/2] sched/fair: Check a task has a fitting cpu when updating misfit


Message-ID: <08334d9d-8a0a-468f-a1db-ce6c19e491f7@arm.com>
Date: Thu, 25 Jan 2024 10:35:25 +0000
From: Dietmar Eggemann <dietmar.eggemann@....com>
To: Qais Yousef <qyousef@...alina.io>
Cc: Ingo Molnar <mingo@...nel.org>, Peter Zijlstra <peterz@...radead.org>,
 Vincent Guittot <vincent.guittot@...aro.org>, linux-kernel@...r.kernel.org,
 Pierre Gondois <Pierre.Gondois@....com>
Subject: Re: [PATCH v4 1/2] sched/fair: Check a task has a fitting cpu when
 updating misfit

On 24/01/2024 22:43, Qais Yousef wrote:
> On 01/23/24 18:07, Dietmar Eggemann wrote:
>> On 22/01/2024 19:02, Qais Yousef wrote:
>>> On 01/22/24 09:59, Dietmar Eggemann wrote:
>>>> On 05/01/2024 23:20, Qais Yousef wrote:
>>>>> From: Qais Yousef <qais.yousef@....com>

[...]

>>>> What happens when we hotplug out all CPUs of one CPU capacity value?
>>>> IMHO, we don't call asym_cpu_capacity_scan() with !new_topology
>>>> (partition_sched_domains_locked()).
>>>
>>> Right. I missed that. We can add another intersection check against
>>> cpu_active_mask.
>>>
>>> I am assuming the skipping was done by design, not a bug that needs fixing?
>>> I see for suspend (cpuhp_tasks_frozen) the domains are rebuilt, but not for
>>> hotplug.
>>
>> IMHO, it's by design. We set up asym_cap_list only when new_topology is
>> set (update_topology_flags_workfn() from init_cpu_capacity_callback() or
>> topology_init_cpu_capacity_cppc()). I.e. when the (max) CPU capacity can
>> change.
>> In all the other !new_topology cases we check `has_asym |= sd->flags &
>> SD_ASYM_CPUCAPACITY` and set sched_asym_cpucapacity accordingly in
>> build_sched_domains(). Before we always reset sched_asym_cpucapacity in
>> detach_destroy_domains().
>> But now we would have to keep asym_cap_list in sync with the active CPUs
>> I guess.
> 
> Okay, so you suggest we need to update the code to keep it in sync. Let's see
> first if Vincent is satisfied with this list traversal or we need to go another
> way :-)

Yes, if preventing the 'increase of balance_interval' cures this issue
as well, then that will definitely be the less invasive fix.

Could you run a 'perf bench sched messaging -g X -l Y' test on your M1
to get some numbers behind this additional list traversal in
pick_next_task_fair()?

> I think it is worth having this asym_capacity list available. It seemed several
> times we needed it and just a little work is required to make it available for
> potential future users. Even if we don't merge immediately.

I agree. It would give us this ordered (by max CPU capacity) list of
CPUs to iterate over.
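
To make the traversal discussed above concrete, here is a rough userspace
sketch, not the kernel code. Everything in it is an assumption made for
illustration: cpu_mask_t, struct cap_entry and task_has_fitting_cpu() are
hypothetical names, the list is assumed to be sorted by descending max CPU
capacity, and "fits" is reduced to a plain capacity >= util comparison
without any margin or uclamp handling. It shows the extra intersection
with the active CPUs that the hotplug case would need.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a cpumask: bit N set means CPU N. */
typedef uint64_t cpu_mask_t;

/* One entry per distinct max CPU capacity value (illustrative only). */
struct cap_entry {
	unsigned long capacity;	/* max capacity shared by these CPUs */
	cpu_mask_t cpus;	/* CPUs that have this capacity */
};

/*
 * Walk a list assumed to be sorted by descending capacity and report
 * whether any CPU that is both active and allowed for the task is big
 * enough. The comparison is a simplification of a real fitness check.
 */
static bool task_has_fitting_cpu(const struct cap_entry *list, size_t n,
				 cpu_mask_t active, cpu_mask_t allowed,
				 unsigned long util)
{
	for (size_t i = 0; i < n; i++) {
		/* Descending order: no later entry can fit either. */
		if (list[i].capacity < util)
			break;

		/* Skip capacity levels whose CPUs are all offline or disallowed. */
		if (list[i].cpus & active & allowed)
			return true;
	}

	return false;
}
```

With three capacity levels (say 1024, 512 and 256), a task with util 600
only has a fitting CPU while at least one CPU of the 1024 level is still
in the active mask. Once all of them are hotplugged out, the function
returns false, which is the case the cpu_active_mask intersection is
meant to catch.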

[...]