Message-ID: <jhjv9hph3h7.mognet@arm.com>
Date:   Tue, 11 Aug 2020 11:38:28 +0100
From:   Valentin Schneider <valentin.schneider@....com>
To:     Qi Zheng <arch0.zheng@...il.com>
Cc:     mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, dietmar.eggemann@....com,
        rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] sched/fair: Remove the duplicate check from group_has_capacity()


On 11/08/20 04:39, Qi Zheng wrote:
> On 2020/8/11 2:33 AM, Valentin Schneider wrote:
>>
>> On 10/08/20 02:00, Qi Zheng wrote:
>>> 1. The group_has_capacity() function is only called in
>>>     group_classify().
>>> 2. The following inequality has already been checked in
>>>     group_is_overloaded(), which is also called in
>>>     group_classify().
>>>
>>>        (sgs->group_capacity * imbalance_pct) <
>>>                          (sgs->group_runnable * 100)
>>>
>>
>> Consider group_is_overloaded() returns false because of the first
>> condition:
>>
>>          if (sgs->sum_nr_running <= sgs->group_weight)
>>                  return false;
>>
>> then group_has_capacity() would be the first place where the group_runnable
>> vs group_capacity comparison would be done.
>>
>> Now in that specific case we'll actually only check it if
>>
>>    sgs->sum_nr_running == sgs->group_weight
>>
>> and the only case where the runnable vs capacity check can fail here is if
>> there's significant capacity pressure going on. TBH this capacity pressure
>> could be happening even when there are fewer tasks than CPUs, so I'm not
>> sure how intentional that corner case is.
>
> Maybe in the == case some CPUs in sg->cpumask are no longer active,
> which causes the significant capacity pressure?
>

That can only happen in the short window after a CPU has been deactivated
but before the sched_domains have been rebuilt, which sounds quite elusive.
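
For context, here is a simplified sketch of the two checks being discussed,
as they appear in kernel/sched/fair.c around v5.8 (paraphrased rather than
verbatim, with comments added for illustration):

  static inline bool
  group_is_overloaded(unsigned int imbalance_pct, struct sg_lb_stats *sgs)
  {
          /* Bail out early: no more tasks than CPUs in the group. */
          if (sgs->sum_nr_running <= sgs->group_weight)
                  return false;

          if ((sgs->group_capacity * 100) <
                          (sgs->group_util * imbalance_pct))
                  return true;

          if ((sgs->group_capacity * imbalance_pct) <
                          (sgs->group_runnable * 100))
                  return true;

          return false;
  }

  static inline bool
  group_has_capacity(unsigned int imbalance_pct, struct sg_lb_stats *sgs)
  {
          /* Strictly fewer tasks than CPUs: trivially has capacity. */
          if (sgs->sum_nr_running < sgs->group_weight)
                  return true;

          /*
           * sum_nr_running >= group_weight from here on. When
           * group_is_overloaded() bailed out on its first test
           * (i.e. sum_nr_running == group_weight), this is the first
           * place the runnable vs capacity comparison actually runs.
           */
          if ((sgs->group_capacity * imbalance_pct) <
                          (sgs->group_runnable * 100))
                  return false;

          if ((sgs->group_capacity * 100) >
                          (sgs->group_util * imbalance_pct))
                  return true;

          return false;
  }

group_classify() calls group_is_overloaded() before it ever reaches
group_has_capacity(), so the runnable vs capacity comparison in
group_has_capacity() is only a fresh (non-duplicate) check when
sum_nr_running == sgs->group_weight, which is the corner case discussed
above.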
