Message-ID: <5ccc3a77-64de-48fd-b118-f08aef9e1d2c@arm.com>
Date: Fri, 13 Sep 2024 11:09:52 +0200
From: Pierre Gondois <pierre.gondois@....com>
To: Vincent Guittot <vincent.guittot@...aro.org>
Cc: mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, vschneid@...hat.com, lukasz.luba@....com,
rafael.j.wysocki@...el.com, linux-kernel@...r.kernel.org,
qyousef@...alina.io, hongyan.xia2@....com
Subject: Re: [RFC PATCH 5/5] sched/fair: Add push task callback for EAS
Hello Vincent,
On 9/12/24 14:30, Vincent Guittot wrote:
> Hello Pierre,
>
> On Wed, 11 Sept 2024 at 16:03, Pierre Gondois <pierre.gondois@....com> wrote:
>>
>> Hello Vincent,
>>
>> On 8/30/24 15:03, Vincent Guittot wrote:
>>> EAS is based on wakeup events to efficiently place tasks on the system, but
>>> there are cases where a task will not have wakeup events anymore, or only
>>> at far too low a pace. For such situations, we can take advantage of the
>>> task being put back in the enqueued list to check whether it should be
>>> migrated to another CPU. When the task is the only one running on the CPU,
>>> the tick will check whether the task is stuck on this CPU and should
>>> migrate to another one.
>>>
>>> Wake-up events remain the main way to migrate tasks, but we now detect
>>> situations where a task is stuck on a CPU by checking that its utilization
>>> is larger than the max available compute capacity (max CPU capacity or
>>> uclamp max setting).
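
For my own understanding, the misfit condition above boils down to
comparing the task's utilization with the best compute capacity it can
actually get on its CPU. A minimal sketch, with names of my own and not
necessarily the patch's actual task_misfit_cpu():

	static inline bool task_stuck_on_cpu(struct task_struct *p, int cpu)
	{
		/* Best capacity the task can get on this CPU: the CPU
		 * capacity, further limited by the task's effective
		 * uclamp max. */
		unsigned long max_capa = min(arch_scale_cpu_capacity(cpu),
					     uclamp_eff_value(p, UCLAMP_MAX));

		return task_util_est(p) > max_capa;
	}
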
>>>
>>> Signed-off-by: Vincent Guittot <vincent.guittot@...aro.org>
>>> ---
>>> kernel/sched/fair.c | 211 +++++++++++++++++++++++++++++++++++++++++++
>>> kernel/sched/sched.h | 2 +
>>> 2 files changed, 213 insertions(+)
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index e46af2416159..41fb18ac118b 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>
>> [snip]
>>
>>> +
>>> +static inline void check_misfit_cpu(struct task_struct *p, struct rq *rq)
>>> +{
>>> + int new_cpu, cpu = cpu_of(rq);
>>> +
>>> + if (!sched_energy_enabled())
>>> + return;
>>> +
>>> + if (WARN_ON(!p))
>>> + return;
>>> +
>>> + if (WARN_ON(p != rq->curr))
>>> + return;
>>> +
>>> + if (is_migration_disabled(p))
>>> + return;
>>> +
>>> + if ((rq->nr_running > 1) || (p->nr_cpus_allowed == 1))
>>> + return;
>>
>> I tried the code on a Pixel6 with the following setup:
>> - without the above (rq->nr_running > 1) condition
>> - without the push task mechanism
>> i.e. tasks without regular wakeups only have the opportunity to
>> run feec() via the sched_tick. It seemed sufficient to avoid
>> the problems you mentioned:
>> - having unbalanced UCLAMP_MAX tasks in a pd, e.g. 1 UCLAMP_MAX task
>> per little CPU, except one little CPU with N UCLAMP_MAX tasks
>> - downgrading UCLAMP_MAX tasks that could run on smaller CPUs
>> but have no wakeups and thus don't run feec()
>
> The main problem with your test is that you always call feec() for the
> running task, so you always have to wake up the migration thread to
> migrate the currently running task, which is quite inefficient. The
> push mechanism only takes a task which is not the currently running
> one, so we don't need to wake up the migration thread, which is
> simpler and more efficient. We check only one task at a time and will
> not loop over an unbounded number of tasks after a task switch or a
> tick.
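
Indeed, if I read the patch correctly, a non-running task can be moved
directly while holding both rq locks, along the lines of the RT push
path, without involving the stopper; roughly:

	deactivate_task(rq, next_task, 0);
	set_task_cpu(next_task, new_cpu);
	activate_task(new_rq, next_task, 0);
	resched_curr(new_rq);

whereas migrating the current task always requires waking the per-CPU
stopper to run active_load_balance_cpu_stop().
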
>
>>
>> Thus I was wondering whether it would not be better to integrate
>> EAS into the load balancer instead (not my idea, but I don't
>> remember who suggested it).
>
> My first thought was also to use load balance to pull tasks which were
> stuck on the wrong CPU (as mentioned in [1]), but this solution is not
> scalable, as we don't want to test all runnable tasks on a CPU and it's
> not really easy to know which CPU and which tasks should be checked.
>
> [1] https://youtu.be/PHEBAyxeM_M?si=ZApIOw3BS4SOLPwp
>
>> Or, alternatively, whether just running feec() through the sched_tick
>> path would not be sufficient (i.e. this patch minus the push mechanism).
>
> As mentioned above, the push mechanism is more efficient than active migration.
>
>
>>
>>> +
>>> + if (!task_misfit_cpu(p, cpu))
>>> + return;
>>> +
>>> + new_cpu = find_energy_efficient_cpu(p, cpu);
>>> +
>>> + if (new_cpu == cpu)
>>> + return;
>>> +
>>> + /*
>>> + * ->active_balance synchronizes accesses to
>>> + * ->active_balance_work. Once set, it's cleared
>>> + * only after active load balance is finished.
>>> + */
>>> + if (!rq->active_balance) {
>>> + rq->active_balance = 1;
>>> + rq->push_cpu = new_cpu;
>>> + } else
>>> + return;
>>> +
>>> + raw_spin_rq_unlock(rq);
>>> + stop_one_cpu_nowait(cpu,
>>> + active_load_balance_cpu_stop, rq,
>>> + &rq->active_balance_work);
>>> + raw_spin_rq_lock(rq);
>>
>> I didn't hit any error, but isn't this susceptible to the issue fixed by
>> commit f0498d2a54e7 ("sched: Fix stop_one_cpu_nowait() vs hotplug")?
>>
>
> I will recheck, but being called from the tick, for the local CPU, and
> with the running thread not being the cpu_stopper_thread, should
> protect us from the case described in that commit.
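
Ok. In case the same guard turns out to be needed here: my understanding
is that f0498d2a54e7 keeps preemption disabled across the unlock, so the
caller cannot be preempted before the stopper work is queued. Applied to
this hunk, it would look roughly like:

	preempt_disable();
	raw_spin_rq_unlock(rq);
	stop_one_cpu_nowait(cpu,
			    active_load_balance_cpu_stop, rq,
			    &rq->active_balance_work);
	preempt_enable();
	raw_spin_rq_lock(rq);
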
>
>>
>>> +}
>>> +
>>> +static inline int has_pushable_tasks(struct rq *rq)
>>> +{
>>> + return !plist_head_empty(&rq->cfs.pushable_tasks);
>>> +}
>>> +
>>> +static struct task_struct *pick_next_pushable_fair_task(struct rq *rq)
>>> +{
>>> + struct task_struct *p;
>>> +
>>> + if (!has_pushable_tasks(rq))
>>> + return NULL;
>>> +
>>> + p = plist_first_entry(&rq->cfs.pushable_tasks,
>>> + struct task_struct, pushable_tasks);
>>> +
>>> + WARN_ON_ONCE(rq->cpu != task_cpu(p));
>>> + WARN_ON_ONCE(task_current(rq, p));
>>> + WARN_ON_ONCE(p->nr_cpus_allowed <= 1);
>>> +
>>> + WARN_ON_ONCE(!task_on_rq_queued(p));
>>> +
>>> + /*
>>> + * Remove task from the pushable list as we try only once after
>>> + * task has been put back in enqueued list.
>>> + */
>>> + plist_del(&p->pushable_tasks, &rq->cfs.pushable_tasks);
>>> +
>>> + return p;
>>> +}
>>> +
>>> +/*
>>> + * See if the non running fair tasks on this rq
>>> + * can be sent to some other CPU that fits better with
>>> + * their profile.
>>> + */
>>> +static int push_fair_task(struct rq *rq)
>>> +{
>>> + struct task_struct *next_task;
>>> + struct rq *new_rq;
>>> + int prev_cpu, new_cpu;
>>> + int ret = 0;
>>> +
>>> + next_task = pick_next_pushable_fair_task(rq);
>>> + if (!next_task)
>>> + return 0;
>>> +
>>> + if (is_migration_disabled(next_task))
>>> + return 0;
>>> +
>>> + if (WARN_ON(next_task == rq->curr))
>>> + return 0;
>>> +
>>> + /* We might release rq lock */
>>> + get_task_struct(next_task);
>>> +
>>> + prev_cpu = rq->cpu;
>>> +
>>> + new_cpu = find_energy_efficient_cpu(next_task, prev_cpu);
>>> +
>>> + if (new_cpu == prev_cpu)
>>> + goto out;
>>> +
>>> + new_rq = cpu_rq(new_cpu);
>>> +
>>> + if (double_lock_balance(rq, new_rq)) {
>>
>>
>> I think it might be necessary to check the following:
>> if (task_cpu(next_task) != rq->cpu) {
>> double_unlock_balance(rq, new_rq);
>
> Yes good point
>
>> goto out;
>> }
>>
>> Indeed I've been hitting the following warnings:
>> - uclamp_rq_dec_id():SCHED_WARN_ON(!bucket->tasks)
>> - set_task_cpu()::WARN_ON_ONCE(state == TASK_RUNNING &&
>> p->sched_class == &fair_sched_class &&
>> (p->on_rq && !task_on_rq_migrating(p)))
>> - update_entity_lag()::SCHED_WARN_ON(!se->on_rq)
>>
>> and it seemed to be caused by the task not being on the initial rq anymore.
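
To make the suggestion above concrete, the re-validation after
double_lock_balance() would be something along these lines (the rq lock
can be dropped there, so the task may have been migrated or dequeued in
the meantime):

	if (double_lock_balance(rq, new_rq)) {
		if (task_cpu(next_task) != rq->cpu ||
		    !task_on_rq_queued(next_task)) {
			double_unlock_balance(rq, new_rq);
			goto out;
		}

		/* Safe to proceed with the migration. */
	}
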
>
> Do you have a particular use case to trigger this? I haven't faced
> this in the various stress tests that I did.
It was triggered by this workload, but it was on a Pixel6 device,
so there might have been more background activity:
- 8 tasks with: [UCLAMP_MIN:0, UCLAMP_MAX:1, duty_cycle:100%, period:16ms]
- 4 tasks each bound to a little CPU with: [UCLAMP_MIN:0, UCLAMP_MAX:1024, duty_cycle:4%, period:4ms]
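
In case it helps to reproduce: the clamps can be set on each task with
sched_setattr() (no glibc wrapper, hence the raw syscall); something
along these lines:

	#include <linux/sched.h>	/* SCHED_FLAG_UTIL_CLAMP_* */
	#include <linux/sched/types.h>	/* struct sched_attr */
	#include <stdio.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	/* Clamp the calling task's utilization to [umin, umax],
	 * e.g. [0, 1] for the first group of tasks above. */
	static int set_uclamp(unsigned int umin, unsigned int umax)
	{
		struct sched_attr attr = {
			.size		= sizeof(attr),
			.sched_policy	= SCHED_NORMAL,
			.sched_flags	= SCHED_FLAG_UTIL_CLAMP_MIN |
					  SCHED_FLAG_UTIL_CLAMP_MAX,
			.sched_util_min	= umin,
			.sched_util_max	= umax,
		};

		if (syscall(SYS_sched_setattr, 0, &attr, 0)) {
			perror("sched_setattr");
			return -1;
		}
		return 0;
	}
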
Regards,
Pierre