linux-kernel - Re: [RFC PATCH] sched: fix the nonsense shares when load of cfs

Message-ID: <xm267dzx47k9.fsf@bsegall-linux.svl.corp.google.com>
Date:   Fri, 06 Mar 2020 11:17:10 -0800
From:   bsegall@...gle.com
To:     王贇 <yun.wang@...ux.alibaba.com>
Cc:     bsegall@...gle.com, Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Mel Gorman <mgorman@...e.de>,
        "open list\:SCHEDULER" <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH] sched: fix the nonsense shares when load of cfs_rq is too small

王贇 <yun.wang@...ux.alibaba.com> writes:

> On 2020/3/5 2:47 AM, bsegall@...gle.com wrote:
> [snip]
>>> Argh, because A->cfs_rq.load.weight is B->se.load.weight which is
>>> B->shares/nr_cpus.
>>>
>>>> Meanwhile the se of D on the root cfs_rq is far bigger than 2, so
>>>> it wins the battle.
>>>>
>>>> This patch adds a check for zero load and sets it to MIN_SHARES
>>>> to fix the nonsense shares; with it applied, group C wins as
>>>> expected.
>>>>
>>>> Signed-off-by: Michael Wang <yun.wang@...ux.alibaba.com>
>>>> ---
>>>>  kernel/sched/fair.c | 2 ++
>>>>  1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>> index 84594f8aeaf8..53d705f75fa4 100644
>>>> --- a/kernel/sched/fair.c
>>>> +++ b/kernel/sched/fair.c
>>>> @@ -3182,6 +3182,8 @@ static long calc_group_shares(struct cfs_rq *cfs_rq)
>>>>  	tg_shares = READ_ONCE(tg->shares);
>>>>
>>>>  	load = max(scale_load_down(cfs_rq->load.weight), cfs_rq->avg.load_avg);
>>>> +	if (!load && cfs_rq->load.weight)
>>>> +		load = MIN_SHARES;
>>>>
>>>>  	tg_weight = atomic_long_read(&tg->load_avg);
>>>
>>> Yeah, I suppose that'll do. Hurmph, wants a comment though.
>>>
>>> But that has me looking at other users of scale_load_down(), and doesn't
>>> at least update_tg_cfs_load() suffer the same problem?
>> 
>> I think instead we should probably scale_load_down(tg_shares) and
>> scale_load(load_avg). tg_shares is always a scaled integer, so just
>> moving the source of the scaling in the multiply should do the job.
>> 
>> ie
>> 
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index fcc968669aea..6d7a9d72d742 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -3179,9 +3179,9 @@ static long calc_group_shares(struct cfs_rq *cfs_rq)
>>         long tg_weight, tg_shares, load, shares;
>>         struct task_group *tg = cfs_rq->tg;
>>  
>> -       tg_shares = READ_ONCE(tg->shares);
>> +       tg_shares = scale_load_down(READ_ONCE(tg->shares));
>>  
>> -       load = max(scale_load_down(cfs_rq->load.weight), cfs_rq->avg.load_avg);
>> +       load = max(cfs_rq->load.weight, scale_load(cfs_rq->avg.load_avg));
>>  
>>         tg_weight = atomic_long_read(&tg->load_avg);
>
> I get the point, but IMHO fixing scale_load_down() sounds better, since
> it would cover all the similar cases. Let's first try that way and see
> if it works :-)

Yeah, that might not be a bad idea as well; it's just that doing this
fix would keep you from losing all your precision (and I'd have to think
about whether that would result in fairness issues, like all the group
ses getting the full tg shares, or something like that).
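
To make the precision point concrete, here is a minimal user-space sketch
of the three ways of computing `load` discussed in this thread. It is not
kernel code: the shift of 10 and MIN_SHARES of 2 are assumptions modelled
on a 64-bit build, the function names load_current(), load_rfc(),
load_scaled_up() and group_shares() are hypothetical, and group_shares()
only approximates the rest of calc_group_shares().

```c
/* Assumed constants, modelled on a 64-bit kernel build; check your tree. */
#define FIXEDPOINT_SHIFT 10
#define MIN_SHARES 2L

long scale_load(long w)      { return w << FIXEDPOINT_SHIFT; }
long scale_load_down(long w) { return w >> FIXEDPOINT_SHIFT; }

long max_l(long a, long b) { return a > b ? a : b; }

long clamp_l(long v, long lo, long hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

/* Existing code path: a weight below 1 << FIXEDPOINT_SHIFT truncates to 0. */
long load_current(long weight, long load_avg)
{
	return max_l(scale_load_down(weight), load_avg);
}

/* RFC patch: a non-zero weight that truncated to 0 is raised to MIN_SHARES. */
long load_rfc(long weight, long load_avg)
{
	long load = load_current(weight, load_avg);

	if (!load && weight)
		load = MIN_SHARES;
	return load;
}

/* Suggested alternative: scale load_avg up instead of scaling weight down. */
long load_scaled_up(long weight, long load_avg)
{
	return max_l(weight, scale_load(load_avg));
}

/*
 * Hypothetical model of the remainder of calc_group_shares() for the
 * existing code, where tg_shares is scaled and load is not.
 */
long group_shares(long tg_shares, long load, long tg_load_avg, long contrib)
{
	long tg_weight = tg_load_avg - contrib + load;
	long shares = tg_shares * load;

	if (tg_weight)
		shares /= tg_weight;
	return clamp_l(shares, MIN_SHARES, tg_shares);
}
```

With an illustrative weight of 512 and load_avg of 0, load_current() gives
0, load_rfc() gives 2 and load_scaled_up() keeps 512. Feeding the first two
into group_shares() with tg_shares of 1048576 and an otherwise idle group
gives MIN_SHARES for the existing code and the full tg_shares for the RFC
floor, which is the kind of outcome the fairness question above is about.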