lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKfTPtCEM4A+-6FWEb1ncMSObgqBbcs7EWc5Xwk+KpHJBRRw9A@mail.gmail.com>
Date: Tue, 24 Sep 2024 15:00:52 +0200
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Pierre Gondois <pierre.gondois@....com>
Cc: mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com, 
	dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com, 
	mgorman@...e.de, vschneid@...hat.com, lukasz.luba@....com, 
	rafael.j.wysocki@...el.com, linux-kernel@...r.kernel.org, qyousef@...alina.io, 
	hongyan.xia2@....com
Subject: Re: [RFC PATCH 5/5] sched/fair: Add push task callback for EAS

On Fri, 13 Sept 2024 at 18:08, Pierre Gondois <pierre.gondois@....com> wrote:
>
> Hello Vincent,
>
> On 8/30/24 15:03, Vincent Guittot wrote:
> > EAS is based on wakeup events to efficiently place tasks on the system, but
> > there are cases where a task will not have wakeup events anymore or at a
> > far too low pace. For such situation, we can take advantage of the task
> > being put back in the enqueued list to check if it should be migrated on
> > another CPU. When the task is the only one running on the CPU, the tick
> > will check it the task is stuck on this CPU and should migrate on another
> > one.
> >
> > Wake up events remain the main way to migrate tasks but we now detect
> > situation where a task is stuck on a CPU by checking that its utilization
> > is larger than the max available compute capacity (max cpu capacity or
> > uclamp max setting)
> >
> > Signed-off-by: Vincent Guittot <vincent.guittot@...aro.org>
> > ---
> >   kernel/sched/fair.c  | 211 +++++++++++++++++++++++++++++++++++++++++++
> >   kernel/sched/sched.h |   2 +
> >   2 files changed, 213 insertions(+)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index e46af2416159..41fb18ac118b 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
>
> [...]
>
> > +
> > +static inline void check_misfit_cpu(struct task_struct *p, struct rq *rq)
> > +{
> > +     int new_cpu, cpu = cpu_of(rq);
> > +
> > +     if (!sched_energy_enabled())
> > +             return;
> > +
> > +     if (WARN_ON(!p))
> > +             return;
> > +
> > +     if (WARN_ON(p != rq->curr))
> > +             return;
> > +
> > +     if (is_migration_disabled(p))
> > +             return;
> > +
> > +     if ((rq->nr_running > 1) || (p->nr_cpus_allowed == 1))
>
> If the goal is to detect tasks that should be migrated to bigger CPUs,

We don't want the tick to try to do active migration for the running
task if the task can't run on another CPU than this one. This is not
related to the migration to bigger CPUs that is done by
update_misfit_status().

> couldn't the check be changed from:
> -  (p->nr_cpus_allowed == 1)
> to
> - (p->max_allowed_capacity == arch_scale_cpu_capacity(cpu))
> to avoid the case where a task is bound to the little cluster for instance ?

I was about to say yes, but the condition
(arch_scale_cpu_capacity(cpu) == p->max_allowed_capacity) is too large
and prevents migrating to a smaller cpu which is one case that we want
to handle here.  That being said I have an internal patch that
includes the check done by update_misfit_status() for push callback
mechanism but I didn't add it in this version to not add to much
change in the same serie.

>
> Similar question for update_misfit_status(), doesn't:
> - (arch_scale_cpu_capacity(cpu) == p->max_allowed_capacity)
> include this case:
> - (p->nr_cpus_allowed == 1)

For update_misfit_status, you are right

>
>
> > +             return;
> > +
> > +     if (!task_misfit_cpu(p, cpu))
> > +             return;
>
> task_misfit_cpu() intends to check whether the task will have an opportunity
> to run feec() though wakeups/push-pull.
>
> Shouldn't we check whether the task fits the CPU with the 20% margin
> with task_fits_cpu() aswell ? This would allow to migrate the task
> faster than the load_balancer.

As mentioned above I have a patch that I didn't send that adds the
update_misfit_status() condition in the push call. I agree that should
speedup some migrations of misfit task to bigger cpu without waking up
an idle CPU to do the load balance and pull the task on a potentially
3rd CPU

>
>
> > +
> > +     new_cpu = find_energy_efficient_cpu(p, cpu);
> > +
> > +     if (new_cpu == cpu)
> > +             return;
> > +
> > +     /*
> > +      * ->active_balance synchronizes accesses to
> > +      * ->active_balance_work.  Once set, it's cleared
> > +      * only after active load balance is finished.
> > +      */
> > +     if (!rq->active_balance) {
> > +             rq->active_balance = 1;
> > +             rq->push_cpu = new_cpu;
> > +     } else
> > +             return;
> > +
> > +     raw_spin_rq_unlock(rq);
> > +     stop_one_cpu_nowait(cpu,
> > +             active_load_balance_cpu_stop, rq,
> > +             &rq->active_balance_work);
> > +     raw_spin_rq_lock(rq);
> > +}
> > +
>
> Regards,
> Pierre

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ