Message-ID: <a903d0dc-1d88-4ae7-ac81-3eed0445654d@linux.alibaba.com>
Date: Thu, 14 Nov 2024 14:36:54 +0800
From: Tianchen Ding <dtcccc@...ux.alibaba.com>
To: 解 咏梅 <xieym_ict@...mail.com>
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2] sched/eevdf: Force propagating min_slice of cfs_rq
when a task changing slice
On 2024/11/14 14:06, 解 咏梅 wrote:
> Let's analyze it case by case :P
>
> say cgroup A has 3 tasks: task A, task B, task C
>
> 1) assign task A's slice to 0.1 ms; task B and task C both have the default slice (0.75 ms)
>
> 2) task A is picked by __schedule() as the next task. Because task A is still on the rq,
> the cfs_rq hierarchy doesn't have to change any cfs_rq's min_slice; it has already been reported up to the root cgroup
>
> 3) task A is preempted by another task but is still runnable; it will be requeued on cgroup A's cfs_rq. Similar to case 2
>
> 4) task A leaves the CPU since it's blocked; task A's se will be retained in cgroup A's cfs_rq until it reaches the 0-lag state.
> 4.1 Before 0-lag, I guess it's similar to case 2:
> the logic is based on the cfs_rq's avg_vruntime, and task A is not supposed to be picked as the next task before it reaches the 0-lag state.
> If my understanding is wrong, please correct me. Thanks.
> 4.2 After it reaches the 0-lag state, if it's picked by pick_task_fair, it will ultimately be removed from cgroup A's cfs_rq:
> pick_next_entity() -> dequeue_entities(DEQUEUE_SLEEP | DEQUEUE_DELAYED) -> __dequeue_entity(task A)
> So cgroup A's cfs_rq min_slice will be recalculated, and the cfs_rq hierarchy will update its own min_slice bottom-up.
> 4.3 After it reaches the 0-lag state, it may instead be woken up. Because the current
> __schedule() splits the block/sleep path from the migration path, only the migration
> path calls deactivate. So p->on_rq is still 1, and ttwu_runnable() will simply call
> requeue_delayed_entity() for it. Similar to case 2
>
> I think only case 1 has such a problem.
>
> Regards,
> Yongmei.
>
I think you misunderstood the case. We're not talking about the DELAY_DEQUEUE
feature; we're simply talking about enqueue (waking up) and dequeue (sleeping).
For convenience, let's turn DELAY_DEQUEUE off.
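(On a kernel with the sched debugfs interface, I believe that can be done at
runtime with "echo NO_DELAY_DEQUEUE > /sys/kernel/debug/sched/features";
double-check the knob name on your tree.)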
Consider the following cgroup hierarchy on one cpu:
                root_cgroup
                     |
        -------------------------
        |                       |
  cgroup_A(curr)        other_cgroups...
        |
  ---------------
  |             |
any_se(curr)   cgroup_B(runnable)
                      |
               ---------------
               |             |
         task_A(sleep)   task_B(runnable)
Assume task_A has a smaller slice (0.1ms) and all other tasks have the default
slice (0.75ms).
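As an aside, for anyone who wants to reproduce the setup: I believe a per-task
slice like this can be requested from user space on EEVDF kernels, where
sched_attr::sched_runtime of a SCHED_OTHER task is taken as its slice hint.
A minimal, untested sketch; the kernel clamps the value, and the struct layout
follows the sched_setattr(2) man page:

#define _GNU_SOURCE
#include <sched.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* glibc has no sched_setattr() wrapper, so define the attr struct locally. */
struct sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;   /* for SCHED_OTHER under EEVDF: slice, in ns */
	uint64_t sched_deadline;
	uint64_t sched_period;
};

int main(void)
{
	struct sched_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.sched_policy = SCHED_OTHER;
	attr.sched_runtime = 100 * 1000;  /* request a 0.1ms slice, in ns */

	/* pid 0 == calling thread, flags == 0 */
	if (syscall(SYS_sched_setattr, 0, &attr, 0))
		perror("sched_setattr");
	return 0;
}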
Because task_A is sleeping, it is not actually on the tree.
Now task_A is woken up. It is enqueued to cgroup_B, so the slice of cgroup_B
is updated to 0.1ms. This is OK.
However, since cgroup_B is already on_rq, it cannot be "enqueued" again to
cgroup_A; the code falls through to the bottom half (the second
for_each_sched_entity loop in enqueue_task_fair).
So the slice of cgroup_A is not updated: it is still 0.75ms.
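To make the walk concrete, here is a toy user-space model of that enqueue
path (my own sketch, not the kernel code -- it only mimics the "stop at the
first on_rq ancestor" rule of the first for_each_sched_entity loop):

#include <stdbool.h>
#include <stdio.h>

/* Toy stand-ins: each node is a sched entity; "parent" names the group
 * whose runqueue it is enqueued on. */
struct node {
	const char *name;
	struct node *parent;
	bool on_rq;
	double slice_ms;     /* this entity's own slice request */
	double min_slice_ms; /* min_slice of this group's runqueue */
};

/* Walk upwards enqueueing, but stop at the first ancestor that is already
 * on_rq -- min_slice above that point is never recomputed. */
static void toy_enqueue(struct node *se)
{
	double slice = se->slice_ms;

	for (; se->parent; se = se->parent) {
		if (se->on_rq)
			break;
		se->on_rq = true;
		if (slice < se->parent->min_slice_ms)
			se->parent->min_slice_ms = slice;
		slice = se->parent->min_slice_ms;
	}
}

int main(void)
{
	struct node root = { "root_cgroup", NULL,  true,  0.75, 0.75 };
	struct node ga   = { "cgroup_A",    &root, true,  0.75, 0.75 };
	struct node gb   = { "cgroup_B",    &ga,   true,  0.75, 0.75 };
	struct node ta   = { "task_A",      &gb,   false, 0.10, 0.10 };

	toy_enqueue(&ta); /* task_A wakes up */

	printf("cgroup_B min_slice: %.2f ms\n", gb.min_slice_ms);
	printf("cgroup_A min_slice: %.2f ms\n", ga.min_slice_ms);
	return 0;
}

It prints 0.10ms for cgroup_B but still 0.75ms for cgroup_A, which is exactly
the stale min_slice described above.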
Thanks.