lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 13 Jan 2021 09:30:14 +0100
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     chin <ultrachin@....com>
Cc:     linux-kernel <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        heddchen@...cent.com,
        xiaoggchen(陈小光) <xiaoggchen@...cent.com>
Subject: Re: [PATCH] sched: pull tasks when CPU is about to run SCHED_IDLE tasks

On Wed, 13 Jan 2021 at 04:14, chin <ultrachin@....com> wrote:
>
>
>
>
> At 2021-01-12 16:18:51, "Vincent Guittot" <vincent.guittot@...aro.org> wrote:
> >On Tue, 12 Jan 2021 at 07:59, chin <ultrachin@....com> wrote:
> >>
> >>
> >>
> >>
> >> At 2021-01-11 19:04:19, "Vincent Guittot" <vincent.guittot@...aro.org> wrote:
> >> >On Mon, 11 Jan 2021 at 09:27, chin <ultrachin@....com> wrote:
> >> >>
> >> >>
> >> >> At 2020-12-23 19:30:26, "Vincent Guittot" <vincent.guittot@...aro.org> wrote:
> >> >> >On Wed, 23 Dec 2020 at 09:32, <ultrachin@....com> wrote:
> >> >> >>
> >> >> >> From: Chen Xiaoguang <xiaoggchen@...cent.com>
> >> >> >>
> >> >> >> Before a CPU switches from running SCHED_NORMAL task to
> >> >> >> SCHED_IDLE task, trying to pull SCHED_NORMAL tasks from other
> >> >> >
> >> >> >Could you explain more in detail why you only care about this use case
> >> >>
> >> >> >in particular and not the general case?
> >> >>
> >> >>
> >> >> We want to run online tasks using SCHED_NORMAL policy and offline tasks
> >> >> using SCHED_IDLE policy. The online tasks and the offline tasks run in
> >> >> the same computer in order to use the computer efficiently.
> >> >> The online tasks are in sleep in most times but should responce soon once
> >> >> wake up. The offline tasks are in low priority and will run only when no online
> >> >> tasks.
> >> >>
> >> >> The online tasks are more important than the offline tasks and are latency
> >> >> sensitive we should make sure the online tasks preempt the offline tasks
> >> >> as soon as possilbe while there are online tasks waiting to run.
> >> >> So in our situation we hope the SCHED_NORMAL to run if has any.
> >> >>
> >> >> Let's assume we have 2 CPUs,
> >> >> In CPU1 we got 2 SCHED_NORMAL tasks.
> >> >> in CPU2 we got 1 SCHED_NORMAL task and 2 SCHED_IDLE tasks.
> >> >>
> >> >>              CPU1                      CPU2
> >> >>         curr       rq1            curr          rq2
> >> >>       +------+ | +------+       +------+ | +----+ +----+
> >> >> t0    |NORMAL| | |NORMAL|       |NORMAL| | |IDLE| |IDLE|
> >> >>       +------+ | +------+       +------+ | +----+ +----+
> >> >>
> >> >>                                  NORMAL exits or blocked
> >> >>       +------+ | +------+                | +----+ +----+
> >> >> t1    |NORMAL| | |NORMAL|                | |IDLE| |IDLE|
> >> >>       +------+ | +------+                | +----+ +----+
> >> >>
> >> >>                                  pick_next_task_fair
> >> >>       +------+ | +------+         +----+ | +----+
> >> >> t2    |NORMAL| | |NORMAL|         |IDLE| | |IDLE|
> >> >>       +------+ | +------+         +----+ | +----+
> >> >>
> >> >>                                  SCHED_IDLE running
> >> >> t3    +------+ | +------+        +----+  | +----+
> >> >>       |NORMAL| | |NORMAL|        |IDLE|  | |IDLE|
> >> >>       +------+ | +------+        +----+  | +----+
> >> >>
> >> >>                                  run_rebalance_domains
> >> >>       +------+ |                +------+ | +----+ +----+
> >> >> t4    |NORMAL| |                |NORMAL| | |IDLE| |IDLE|
> >> >>       +------+ |                +------+ | +----+ +----+
> >> >>
> >> >> As we can see
> >> >> t1: NORMAL task in CPU2 exits or blocked
> >> >> t2: CPU2 pick_next_task_fair would pick a SCHED_IDLE to run while
> >> >> another SCHED_NORMAL in rq1 is waiting.
> >> >> t3: SCHED_IDLE run in CPU2 while a SCHED_NORMAL wait in CPU1.
> >> >> t4: after a short time, periodic load_balance triggerd and pull
> >> >> SCHED_NORMAL in rq1 to rq2, and SCHED_NORMAL likely preempts SCHED_IDLE.
> >> >>
> >> >> In this scenario, SCHED_IDLE is running while SCHED_NORMAL is waiting to run.
> >> >> The latency of this SCHED_NORMAL will be high which is not acceptble.
> >> >>
> >> >> Do a load_balance before running the SCHED_IDLE may fix this problem.
> >> >>
> >> >> This patch works as below:
> >> >>
> >> >>              CPU1                      CPU2
> >> >>         curr       rq1            curr          rq2
> >> >>       +------+ | +------+       +------+ | +----+ +----+
> >> >> t0    |NORMAL| | |NORMAL|       |NORMAL| | |IDLE| |IDLE|
> >> >>       +------+ | +------+       +------+ | +----+ +----+
> >> >>
> >> >>                                  NORMAL exits or blocked
> >> >>       +------+ | +------+                | +----+ +----+
> >> >> t1    |NORMAL| | |NORMAL|                | |IDLE| |IDLE|
> >> >>       +------+ | +------+                | +----+ +----+
> >> >>
> >> >> t2                            pick_next_task_fair (all se are SCHED_IDLE)
> >> >>
> >> >>                                  newidle_balance
> >> >>       +------+ |                 +------+ | +----+ +----+
> >> >> t3    |NORMAL| |                 |NORMAL| | |IDLE| |IDLE|
> >> >>       +------+ |                 +------+ | +----+ +----+
> >> >>
> >> >>
> >> >> t1: NORMAL task in CPU2 exits or blocked
> >> >> t2: pick_next_task_fair check all se in rbtree are SCHED_IDLE and calls
> >> >> newidle_balance who tries to pull a SCHED_NORMAL(if has).
> >> >> t3: pick_next_task_fair would pick a SCHED_NORMAL to run instead of
> >> >> SCHED_IDLE(likely).
> >> >>
> >> >> >
> >> >> >> CPU by doing load_balance first.
> >> >> >>
> >> >> >> Signed-off-by: Chen Xiaoguang <xiaoggchen@...cent.com>
> >> >> >> Signed-off-by: Chen He <heddchen@...cent.com>
> >> >> >> ---
> >> >> >>  kernel/sched/fair.c | 5 +++++
> >> >> >>  1 file changed, 5 insertions(+)
> >> >> >>
> >> >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >> >> >> index ae7ceba..0a26132 100644
> >> >> >> --- a/kernel/sched/fair.c
> >> >> >> +++ b/kernel/sched/fair.c
> >> >> >> @@ -7004,6 +7004,11 @@ struct task_struct *
> >> >> >>         struct task_struct *p;
> >> >> >>         int new_tasks;
> >> >> >>
> >> >> >> +       if (prev &&
> >> >> >> +           fair_policy(prev->policy) &&
> >> >> >
> >> >> >Why do you need a prev and fair task  ? You seem to target the special
> >> >> >case of pick_next_task  but in this case why not only testing rf!=null
> >> >> > to make sure to not return immediately after jumping to the idle
> >> >>
> >> >> >label?
> >> >> We just want to do load_balance only when CPU switches from SCHED_NORMAL
> >> >> to SCHED_IDLE.
> >> >> If not check prev, when the running tasks are all SCHED_IDLE, we would
> >> >> do newidle_balance everytime in pick_next_task_fair, it makes no sense
> >> >> and kind of wasting.
> >> >
> >> >I agree that calling newidle_balance every time pick_next_task_fair is
> >> >called when there are only sched_idle tasks is useless.
> >> >But you also have to take into account cases where there was another
> >> >class of task running on the cpu like RT one. In your example above,
> >> >if you replace the normal task on CPU2 by a RT task, you still want to
> >>
> >> >pick the normal task on CPU1 once RT task goes to sleep.
> >> Sure, this case should be taken into account,  we should also try to
> >> pick normal task in this case.
> >>
> >> >
> >> >Another point that you will have to consider the impact on
> >> >rq->idle_stamp because newidle_balance is assumed to be called before
> >>
> >> >going idle which is not the case anymore with your use case
> >> Yes. rq->idle_stamp should not be changed in this case.
> >>
> >>
> >>
> >> Actually we want to pull a SCHED_NORMAL task (if possible) to run when a cpu is
> >> about to run SCHED_IDLE task. But currently newidle_balance is not
> >> designed for SCHED_IDLE  so SCHED_IDLE can also be pulled which
> >> is useless in our situation.
> >
> >newidle_balance will pull a sched_idle task only if there is an
> >imbalance which is the right thing to do IMO to ensure fairness
> >between sched_idle tasks.  Being a sched_idle task doesn't mean that
> >we should break the fairness
> >
> >>
> >> So we plan to add a new function sched_idle_balance which only try to
> >> pull SCHED_NORMAL tasks from the busiest cpu. And we will call
> >> sched_idle_balance when the previous task is normal or RT and
> >> hoping we can pull a SCHED_NORMAL task to run.
> >>
> >> Do you think it is ok to add a new sched_idle_balance?
> >
> >I don't see any reason why the scheduler should not pull a sched_idle
> >task if there is an imbalance. That will happen anyway during the next
>
> >periodic load balance
> OK. We should not pull the SCHED_IDLE tasks only in load_balance.
>
>
> Do you think it make sense to do an extra load_balance when cpu is
> about to run SCHED_IDLE task (switched from normal/RT)?

I'm not sure to get your point here.
Do you mean if a sched_idle task is picked to become the running task
whereas there are runnable normal tasks ? This can happen if normal
tasks are long running tasks. We should not in this case. The only
case is when the running task, which is not a sched_idle task but a
normal/rt/deadline one, goes to sleep and there are only sched_idle
tasks enqueued. In this case and only in this case, we should trigger
a load_balance to get a chance to pull a waiting normal task from
another CPU.

This means checking this state in pick_next_task_fair() and in balance_fair()

> By doing this SCHED_NORMAL tasks waiting on other cpus would get
> a chance to be pulled to this cpu and run, it is helpful to reduce the latency
> of SCHED_NORMAL tasks.
>
>
> >>>
> >> >
> >> >>
> >> >> >
> >> >>
> >> >> >Also why not doing that for default case too ? i.e. balance_fair() ?
> >> >> You are right, if you think this scenario makes sense, we will send a
> >> >> refined patch soon :-)
> >> >>
> >> >> >
> >> >> >> +           sched_idle_cpu(rq->cpu))
> >> >> >> +               goto idle;
> >> >> >> +
> >> >> >>  again:
> >> >> >>         if (!sched_fair_runnable(rq))
> >> >> >>                 goto idle;
> >> >> >> --
> >> >> >> 1.8.3.1
> >> >> >>
> >> >> >>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ