linux-kernel - Re: [PATCH] sched/fair: fix case with reduced capacity CPU

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <063c4695-a9d4-f77a-55a7-7c554b765c7e@arm.com>
Date:   Fri, 8 Jul 2022 12:10:40 +0200
From:   Dietmar Eggemann <dietmar.eggemann@....com>
To:     Vincent Guittot <vincent.guittot@...aro.org>
Cc:     mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
        rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
        bristot@...hat.com, vschneid@...hat.com,
        linux-kernel@...r.kernel.org, david.chen@...anix.com,
        zhangqiao22@...wei.com
Subject: Re: [PATCH] sched/fair: fix case with reduced capacity CPU

On 08/07/2022 09:17, Vincent Guittot wrote:
> On Thu, 7 Jul 2022 at 18:43, Dietmar Eggemann <dietmar.eggemann@....com> wrote:
>>
>> On 02/07/2022 06:52, Vincent Guittot wrote:

[...]

>>> The rework of the load balance has filterd the case when the CPU is

s/filterd/filtered

>>> classified to be fully busy but its capacity is reduced.
>>>
>>> Check if CPU's capacity is reduced while gathering load balance statistics
>>> and classify it group_misfit_task instead of group_fully_busy so we can

enum group_type {

   ...
   /*
    * SD_ASYM_CPUCAPACITY only: One task doesn't fit with CPU's capacity
    * and must be migrated to a more powerful CPU.
    */
   group_misfit_task
   ...

This `SD_ASYM_CPUCAPACITY only:` should be removed now.

[...]

>>> @@ -8798,6 +8798,19 @@ sched_asym(struct lb_env *env, struct sd_lb_stats *sds,  struct sg_lb_stats *sgs
>>>       return sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu);
>>>  }
>>>
>>> +static inline bool
>>> +sched_reduced_capacity(struct rq *rq, struct sched_domain *sd)

minor: Why not `static inline int check_reduced_capacity()` ? All
similar functions like check_cpu_capacity(), check_cpu_capacity() follow
this approach.

[...]

>>> @@ -8851,11 +8865,17 @@ static inline void update_sg_lb_stats(struct lb_env *env,
>>>               if (local_group)
>>>                       continue;
>>>
>>> -             /* Check for a misfit task on the cpu */
>>> -             if (env->sd->flags & SD_ASYM_CPUCAPACITY &&
>>> -                 sgs->group_misfit_task_load < rq->misfit_task_load) {
>>> -                     sgs->group_misfit_task_load = rq->misfit_task_load;
>>> -                     *sg_status |= SG_OVERLOAD;
>>> +             if (env->sd->flags & SD_ASYM_CPUCAPACITY) {
>>> +                     /* Check for a misfit task on the cpu */
>>> +                     if (sgs->group_misfit_task_load < rq->misfit_task_load) {
>>> +                             sgs->group_misfit_task_load = rq->misfit_task_load;
>>> +                             *sg_status |= SG_OVERLOAD;
>>> +                     }
>>> +             } else if ((env->idle != CPU_NOT_IDLE) &&
>>> +                        sched_reduced_capacity(rq, env->sd) &&
>>> +                        (sgs->group_misfit_task_load < load)) {
>>> +                     /* Check for a task running on a CPU with reduced capacity */
>>> +                     sgs->group_misfit_task_load = load;
>>>               }

Minor:

This now has if(A)
              if(B)
	     else if(C && B')

little bit harder to read.

[...]

>> I'm wondering why you've chosen that hybrid approach `group_misfit_task
>> -> migrate_load` and not `group_misfit_task -> migrate_misfit`.
> 
> because, it means enabling the tracking of misfit task on rq at each
> task enqueue/dequeue/tick ...  Then mistfit for heterogeneous platform
> checks max_cpu_capacity what we don't care and will trigger unwanted
> misfit migration for smp

Agreed, rq->misfit_task_load can't be used here.

>> It looks like this `rq->cfs.h_nr_running = 1` case almost (since we
>> check `busiest->nr_running > 1`) always ends up in the load_balance()
>> `if (!ld_moved)` condition and need_active_balance() can return 1 in
>> case `if ((env->idle != CPU_NOT_IDLE) && ...` condition. This leads to
>> active load_balance and this
>>
>> IMHO, the same you can achieve when you would stay with
>> `group_misfit_task -> migrate_misfit`.
>>
>> I think cpu_load(rq) can be used instead of `rq->misfit_task_load` in
>> the migrate_misfit case of find_busiest_queue() too.
> 
> I don't think because you can have a higher cpu_load() but not being misfit

You're right, I forgot about this. Essentially we would need extra state
(e.g. in lb_env) to save which CPU in the busiest group has the misfit.