Message-ID: <CAKfTPtD5+Nf+xkzJswNBgdpF4MQndBp3LfVC0M0BLc7LZXB6QQ@mail.gmail.com>
Date: Tue, 24 Sep 2024 14:37:17 +0200
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Pierre Gondois <pierre.gondois@....com>
Cc: mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com, 
	dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com, 
	mgorman@...e.de, vschneid@...hat.com, lukasz.luba@....com, 
	rafael.j.wysocki@...el.com, linux-kernel@...r.kernel.org, qyousef@...alina.io, 
	hongyan.xia2@....com
Subject: Re: [RFC PATCH 5/5] sched/fair: Add push task callback for EAS

On Fri, 13 Sept 2024 at 11:10, Pierre Gondois <pierre.gondois@....com> wrote:
>
> Hello Vincent,
>
> On 9/12/24 14:30, Vincent Guittot wrote:
> > Hello Pierre,
> >
> > On Wed, 11 Sept 2024 at 16:03, Pierre Gondois <pierre.gondois@....com> wrote:
> >>
> >> Hello Vincent,
> >>
> >> On 8/30/24 15:03, Vincent Guittot wrote:
> >>> EAS is based on wakeup events to efficiently place tasks on the system, but
> >>> there are cases where a task no longer gets wakeup events, or gets them at
> >>> far too low a pace. For such situations, we can take advantage of the task
> >>> being put back in the enqueued list to check whether it should be migrated
> >>> to another CPU. When the task is the only one running on the CPU, the tick
> >>> will check whether the task is stuck on this CPU and should migrate to
> >>> another one.
> >>>
> >>> Wakeup events remain the main way to migrate tasks, but we now detect
> >>> situations where a task is stuck on a CPU by checking that its utilization
> >>> is larger than the max available compute capacity (the max CPU capacity or
> >>> the uclamp max setting).
> >>>
> >>> Signed-off-by: Vincent Guittot <vincent.guittot@...aro.org>
> >>> ---
> >>>    kernel/sched/fair.c  | 211 +++++++++++++++++++++++++++++++++++++++++++
> >>>    kernel/sched/sched.h |   2 +
> >>>    2 files changed, 213 insertions(+)
> >>>
> >>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >>> index e46af2416159..41fb18ac118b 100644
> >>> --- a/kernel/sched/fair.c
> >>> +++ b/kernel/sched/fair.c
> >>
> >> [snip]
> >>
> >>> +
> >>> +static inline void check_misfit_cpu(struct task_struct *p, struct rq *rq)
> >>> +{
> >>> +     int new_cpu, cpu = cpu_of(rq);
> >>> +
> >>> +     if (!sched_energy_enabled())
> >>> +             return;
> >>> +
> >>> +     if (WARN_ON(!p))
> >>> +             return;
> >>> +
> >>> +     if (WARN_ON(p != rq->curr))
> >>> +             return;
> >>> +
> >>> +     if (is_migration_disabled(p))
> >>> +             return;
> >>> +
> >>> +     if ((rq->nr_running > 1) || (p->nr_cpus_allowed == 1))
> >>> +             return;
> >>
> >> I tried the code on a Pixel6 with the following setup:
> >> - without the above (rq->nr_running > 1) condition
> >> - without the push task mechanism
> >> i.e. tasks without regular wakeups only have the opportunity to
> >> run feec() via the sched_tick. It seemed sufficient to avoid
> >> the problems you mentioned:
> >> - having unbalanced UCLAMP_MAX tasks in a pd, e.g. 1 UCLAMP_MAX task
> >>     per little CPU, except one little CPU with N UCLAMP_MAX tasks
> >> - downgrading UCLAMP_MAX tasks that could run on smaller CPUs
> >>     but have no wakeups and thus don't run feec()
> >
> > The main problem with your test is that you always call feec() for the
> > running task, so you always have to wake up the migration thread to
> > migrate the currently running thread, which is quite inefficient. The
> > push mechanism only takes a task which is not the currently running one,
> > so we don't need to wake up the migration thread, which is simpler and
> > more efficient. We check only one task at a time and will not loop over
> > an unbounded number of tasks after a task switch or a tick.
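
To make the difference concrete, here is the shape of the two paths,
condensed from the hunks quoted in this thread (a rough sketch, not the
exact patch code):

	/*
	 * Tick path: the misfit task is rq->curr, so it cannot be moved
	 * from its own context; it has to be handed to the stopper thread.
	 */
	check_misfit_cpu(p, rq)
	    new_cpu = find_energy_efficient_cpu(p, cpu);
	    rq->active_balance = 1;
	    rq->push_cpu = new_cpu;
	    stop_one_cpu_nowait(cpu, active_load_balance_cpu_stop, ...);
	    /* wakes up migration/N, i.e. one extra thread wakeup */

	/*
	 * Push path: the task is enqueued but not running, so it can be
	 * moved directly under the two rq locks, no stopper wakeup.
	 */
	push_fair_task(rq)
	    p = pick_next_pushable_fair_task(rq);  /* O(1) plist pop */
	    new_cpu = find_energy_efficient_cpu(p, rq->cpu);
	    double_lock_balance(rq, cpu_rq(new_cpu));
	    /* migrate p to new_cpu, resched it, unlock */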
> >
> >>
> >> Thus I was wondering if it would not be better to integrate EAS
> >> into the load balancer instead (not my idea, but I don't remember
> >> who suggested that).
> >
> > My 1st thought was also to use load balancing to pull tasks which were
> > stuck on the wrong CPU (as mentioned in [1]), but this solution is not
> > scalable: we don't want to test all runnable tasks on a CPU, and it's
> > not really easy to know which CPU and which tasks should be checked.
> >
> > [1] https://youtu.be/PHEBAyxeM_M?si=ZApIOw3BS4SOLPwp
> >
> >> Or, alternatively, whether just running feec() through the sched_tick
> >> path would be sufficient (i.e. this patch minus the push mechanism).
> >
> > As mentioned above, the push mechanism is more efficient than active migration.
> >
> >
> >>
> >>> +
> >>> +     if (!task_misfit_cpu(p, cpu))
> >>> +             return;
> >>> +
> >>> +     new_cpu = find_energy_efficient_cpu(p, cpu);
> >>> +
> >>> +     if (new_cpu == cpu)
> >>> +             return;
> >>> +
> >>> +     /*
> >>> +      * ->active_balance synchronizes accesses to
> >>> +      * ->active_balance_work.  Once set, it's cleared
> >>> +      * only after active load balance is finished.
> >>> +      */
> >>> +     if (!rq->active_balance) {
> >>> +             rq->active_balance = 1;
> >>> +             rq->push_cpu = new_cpu;
> >>> +     } else
> >>> +             return;
> >>> +
> >>> +     raw_spin_rq_unlock(rq);
> >>> +     stop_one_cpu_nowait(cpu,
> >>> +             active_load_balance_cpu_stop, rq,
> >>> +             &rq->active_balance_work);
> >>> +     raw_spin_rq_lock(rq);
> >>
> >> I didn't hit any error, but isn't this susceptible to the issue fixed by the following?
> >>     commit f0498d2a54e7 ("sched: Fix stop_one_cpu_nowait() vs hotplug")
> >>
> >
> > I will recheck, but being called from the tick, for the local CPU, and
> > with the running thread not being the cpu_stopper_thread, should protect
> > us from the case described in this commit.
> >
> >>
> >>> +}
> >>> +
> >>> +static inline int has_pushable_tasks(struct rq *rq)
> >>> +{
> >>> +     return !plist_head_empty(&rq->cfs.pushable_tasks);
> >>> +}
> >>> +
> >>> +static struct task_struct *pick_next_pushable_fair_task(struct rq *rq)
> >>> +{
> >>> +     struct task_struct *p;
> >>> +
> >>> +     if (!has_pushable_tasks(rq))
> >>> +             return NULL;
> >>> +
> >>> +     p = plist_first_entry(&rq->cfs.pushable_tasks,
> >>> +                           struct task_struct, pushable_tasks);
> >>> +
> >>> +     WARN_ON_ONCE(rq->cpu != task_cpu(p));
> >>> +     WARN_ON_ONCE(task_current(rq, p));
> >>> +     WARN_ON_ONCE(p->nr_cpus_allowed <= 1);
> >>> +
> >>> +     WARN_ON_ONCE(!task_on_rq_queued(p));
> >>> +
> >>> +     /*
> >>> +      * Remove the task from the pushable list as we try only once after
> >>> +      * the task has been put back in the enqueued list.
> >>> +      */
> >>> +     plist_del(&p->pushable_tasks, &rq->cfs.pushable_tasks);
> >>> +
> >>> +     return p;
> >>> +}
> >>> +
> >>> +/*
> >>> + * See if the non-running fair tasks on this rq
> >>> + * can be sent to some other CPU that fits better with
> >>> + * their profile.
> >>> + */
> >>> +static int push_fair_task(struct rq *rq)
> >>> +{
> >>> +     struct task_struct *next_task;
> >>> +     struct rq *new_rq;
> >>> +     int prev_cpu, new_cpu;
> >>> +     int ret = 0;
> >>> +
> >>> +     next_task = pick_next_pushable_fair_task(rq);
> >>> +     if (!next_task)
> >>> +             return 0;
> >>> +
> >>> +     if (is_migration_disabled(next_task))
> >>> +             return 0;
> >>> +
> >>> +     if (WARN_ON(next_task == rq->curr))
> >>> +             return 0;
> >>> +
> >>> +     /* We might release rq lock */
> >>> +     get_task_struct(next_task);
> >>> +
> >>> +     prev_cpu = rq->cpu;
> >>> +
> >>> +     new_cpu = find_energy_efficient_cpu(next_task, prev_cpu);
> >>> +
> >>> +     if (new_cpu == prev_cpu)
> >>> +             goto out;
> >>> +
> >>> +     new_rq = cpu_rq(new_cpu);
> >>> +
> >>> +     if (double_lock_balance(rq, new_rq)) {
> >>
> >>
> >> I think it might be necessary to check the following:
> >>     if (task_cpu(next_task) != rq->cpu) {
> >>       double_unlock_balance(rq, new_rq);
> >
> > Yes good point
> >
> >>       goto out;
> >>     }
> >>
> >> Indeed I've been hitting the following warnings:
> >> - uclamp_rq_dec_id():SCHED_WARN_ON(!bucket->tasks)
> >> - set_task_cpu()::WARN_ON_ONCE(state == TASK_RUNNING &&
> >>                       p->sched_class == &fair_sched_class &&
> >>                       (p->on_rq && !task_on_rq_migrating(p)))
> >> - update_entity_lag()::SCHED_WARN_ON(!se->on_rq)
> >>
> >> and it seemed to be caused by the task not being on the initial rq anymore.
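
For reference, folding your suggested re-check into the locking sequence
would look something like the below. This is only a sketch: the part after
double_lock_balance() is not quoted above, and is assumed here to follow
the usual push_rt_task() pattern.

	if (double_lock_balance(rq, new_rq)) {
		/*
		 * The rq lock may have been dropped while taking both
		 * locks: the task may have been migrated or dequeued in
		 * the meantime, so re-check before touching it.
		 */
		if (task_cpu(next_task) != rq->cpu) {
			double_unlock_balance(rq, new_rq);
			goto out;
		}

		/* assumed tail, following push_rt_task(): */
		deactivate_task(rq, next_task, 0);
		set_task_cpu(next_task, new_cpu);
		activate_task(new_rq, next_task, 0);
		ret = 1;
		resched_curr(new_rq);
		double_unlock_balance(rq, new_rq);
	}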
> >
> > Do you have a particular use case to trigger this? I haven't faced
> > this in the various stress tests that I did.
>
> It was triggered by this workload, but it was on a Pixel6 device,
> so there might be more background activity:
> - 8 tasks with: [UCLAMP_MIN:0, UCLAMP_MAX:1, duty_cycle:100%, period:16ms]
> - 4 tasks each bound to a little CPU with: [UCLAMP_MIN:0, UCLAMP_MAX:1024, duty_cycle:4%, period:4ms]

Thanks, I'm going to have a closer look.
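
In case it helps reproduce, something like the following approximates one
of the 100% duty-cycle UCLAMP_MAX:1 tasks above. This is a hypothetical
sketch using the sched_setattr() syscall (glibc has no wrapper for it),
not Pierre's actual test code:

	#define _GNU_SOURCE
	#include <stdint.h>
	#include <string.h>
	#include <unistd.h>
	#include <sys/syscall.h>

	/* uapi struct sched_attr, including the uclamp fields */
	struct sched_attr {
		uint32_t size;
		uint32_t sched_policy;
		uint64_t sched_flags;
		int32_t  sched_nice;
		uint32_t sched_priority;
		uint64_t sched_runtime;
		uint64_t sched_deadline;
		uint64_t sched_period;
		uint32_t sched_util_min;
		uint32_t sched_util_max;
	};

	#define SCHED_FLAG_UTIL_CLAMP_MIN	0x20
	#define SCHED_FLAG_UTIL_CLAMP_MAX	0x40

	int main(void)
	{
		struct sched_attr attr;

		memset(&attr, 0, sizeof(attr));
		attr.size = sizeof(attr);
		attr.sched_policy = 0;	/* SCHED_NORMAL */
		attr.sched_flags = SCHED_FLAG_UTIL_CLAMP_MIN |
				   SCHED_FLAG_UTIL_CLAMP_MAX;
		attr.sched_util_min = 0;
		attr.sched_util_max = 1;	/* UCLAMP_MAX:1 */

		if (syscall(SYS_sched_setattr, 0, &attr, 0))
			return 1;

		for (;;)	/* 100% duty cycle, no wakeups */
			;
	}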

>
> Regards,
> Pierre
