linux-kernel - Re: [PATCH] sched/fair: reduce preemption with IDLE tasks runable(Internet mail)

Message-ID: <5f870781-1648-b4ac-6026-557dfc347109@arm.com>
Date:   Mon, 10 Aug 2020 15:24:24 +0200
From:   Dietmar Eggemann <dietmar.eggemann@....com>
To:     benbjiang(蒋彪) <benbjiang@...cent.com>
Cc:     Jiang Biao <benbjiang@...il.com>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "peterz@...radead.org" <peterz@...radead.org>,
        "juri.lelli@...hat.com" <juri.lelli@...hat.com>,
        "vincent.guittot@...aro.org" <vincent.guittot@...aro.org>,
        "rostedt@...dmis.org" <rostedt@...dmis.org>,
        "bsegall@...gle.com" <bsegall@...gle.com>,
        "mgorman@...e.de" <mgorman@...e.de>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] sched/fair: reduce preemption with IDLE tasks
 runable(Internet mail)

On 06/08/2020 17:52, benbjiang(蒋彪) wrote:
> Hi, 
> 
>> On Aug 6, 2020, at 9:29 PM, Dietmar Eggemann <dietmar.eggemann@....com> wrote:
>>
>> On 03/08/2020 13:26, benbjiang(蒋彪) wrote:
>>>
>>>
>>>> On Aug 3, 2020, at 4:16 PM, Dietmar Eggemann <dietmar.eggemann@....com> wrote:
>>>>
>>>> On 01/08/2020 04:32, Jiang Biao wrote:
>>>>> From: Jiang Biao <benbjiang@...cent.com>

[...]

>> How would you deal with se's representing taskgroups which contain
>> SCHED_IDLE and SCHED_NORMAL tasks or other taskgroups doing that?
> I’m not sure I get the point. :) How about the following patch,
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 04fa8dbcfa4d..8715f03ed6d7 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2994,6 +2994,9 @@ account_entity_enqueue(struct cfs_rq *cfs_rq, struct sched_entity *se)
>                 list_add(&se->group_node, &rq->cfs_tasks);
>         }
>  #endif
> +       if (task_has_idle_policy(task_of(se)))
> +               cfs_rq->idle_nr_running++;
> +
>         cfs_rq->nr_running++;
>  }
> 
> @@ -3007,6 +3010,9 @@ account_entity_dequeue(struct cfs_rq *cfs_rq, struct sched_entity *se)
>                 list_del_init(&se->group_node);
>         }
>  #endif
> +       if (task_has_idle_policy(task_of(se)))
> +               cfs_rq->idle_nr_running--;
> +
>         cfs_rq->nr_running--;
>  }
> 
> @@ -4527,7 +4533,7 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
>                 return;
>  #endif
> 
> -       if (cfs_rq->nr_running > 1)
> +       if (cfs_rq->nr_running > cfs_rq->idle_nr_running + 1 &&
> +           cfs_rq->h_nr_running - cfs_rq->idle_h_nr_running > cfs_rq->idle_nr_running + 1)
>                 check_preempt_tick(cfs_rq, curr);
>  }
> 
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 877fb08eb1b0..401090393e09 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -500,6 +500,7 @@ struct cfs_bandwidth { };
>  struct cfs_rq {
>         struct load_weight      load;
>         unsigned int            nr_running;
> +       unsigned int            idle_nr_running;
>         unsigned int            h_nr_running;      /* SCHED_{NORMAL,BATCH,IDLE} */
>         unsigned int            idle_h_nr_running; /* SCHED_IDLE */

         /
       / |  \
      A  n0 i0
     / \
    n1 i1

I don't think this will work. E.g. the patch would also prevent tick
preemption between 'A' and 'n0' on '/', since the new condition
evaluates to false there:

(3 > 1 + 1) && (4 - 2 > 1 + 1)

You also have to make sure that a SCHED_IDLE task can tick preempt
another SCHED_IDLE task.
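
For illustration only, the proposed entity_tick() condition can be
evaluated in user space for the root cfs_rq of the hierarchy above. This
is a minimal sketch, not kernel code: struct cfs_rq_counts,
tick_check_allowed() and root_example() are hypothetical names, and the
counter values are the ones implied by the diagram, assuming
idle_nr_running counts only i0 at the root level.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the counters in struct cfs_rq. */
struct cfs_rq_counts {
	unsigned int nr_running;        /* entities queued on this cfs_rq */
	unsigned int idle_nr_running;   /* SCHED_IDLE tasks among them (added by the patch) */
	unsigned int h_nr_running;      /* tasks in the whole hierarchy below */
	unsigned int idle_h_nr_running; /* SCHED_IDLE tasks among those */
};

/* The condition the patch places in front of check_preempt_tick(). */
static bool tick_check_allowed(const struct cfs_rq_counts *c)
{
	return c->nr_running > c->idle_nr_running + 1 &&
	       c->h_nr_running - c->idle_h_nr_running > c->idle_nr_running + 1;
}

/* Root cfs_rq '/': entities A, n0, i0; tasks n0, i0, n1, i1 below it. */
static bool root_example(void)
{
	struct cfs_rq_counts root = {
		.nr_running = 3,
		.idle_nr_running = 1,
		.h_nr_running = 4,
		.idle_h_nr_running = 2,
	};

	return tick_check_allowed(&root);
}

/* Usage: the tick check is skipped on '/', although 'A' and 'n0' compete there. */
static void check_root_example(void)
{
	assert(!root_example());
}
```

The second half of the condition compares a hierarchical task count
against a per-level count, which is why it fails here even though two
non-idle entities ('A' and 'n0') are queued on '/'.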

>>> I’m not sure if it’s ok to do that, because the IDLE class does not seem to be
>>> pure enough to tolerate starving.
>>
>> Not sure I understand but idle_sched_class is not the same as SCHED_IDLE
>> (policy)?
> The case is that we need low-priority tasks (called offline tasks) to utilize the
> spare CPU left by CFS SCHED_NORMAL tasks (called online tasks) without
> interfering with the online tasks.
> Offline tasks only run when there are no runnable online tasks, and offline tasks
> never preempt online tasks.
> The SCHED_IDLE policy does not seem to qualify for that requirement,
> because it has a weight (3). Even though that is small, it can still preempt online
> tasks for the sake of fairness. In that way, offline tasks with SCHED_IDLE policy
> could interfere with the online tasks.

Because of this very small weight (weight=3), compared to a SCHED_NORMAL
nice 0 task (weight=1024), a SCHED_IDLE task is penalized by a huge
se->vruntime value (it grows 1024/3, i.e. about 341, times faster than for
a SCHED_NORMAL nice 0 task). This should make sure it doesn't tick preempt
a SCHED_NORMAL nice 0 task.

It's different when the SCHED_NORMAL task has nice 19 (weight=15) but
that's part of the CFS design.
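
The ratios can be checked with a simplified model of how CFS scales
runtime into vruntime: delta_exec * 1024 / weight. The sketch below is a
user-space approximation only (as far as I know the kernel uses
precomputed inverse weights and fixed-point shifts rather than a plain
division); vruntime_delta() and the macro names are hypothetical, and
the weights are the ones quoted above.

```c
#include <assert.h>
#include <stdint.h>

#define NICE_0_WEIGHT     1024ULL /* SCHED_NORMAL, nice 0 */
#define NICE_19_WEIGHT      15ULL /* SCHED_NORMAL, nice 19 */
#define SCHED_IDLE_WEIGHT    3ULL /* SCHED_IDLE */

/* Simplified model: vruntime advance for delta_exec_ns of real runtime. */
static uint64_t vruntime_delta(uint64_t delta_exec_ns, uint64_t weight)
{
	return delta_exec_ns * NICE_0_WEIGHT / weight;
}

/* Usage: vruntime charged for 1 ms of runtime at each weight. */
static void check_ratios(void)
{
	const uint64_t one_ms = 1000000ULL;

	assert(vruntime_delta(one_ms, NICE_0_WEIGHT) == one_ms);
	/* SCHED_IDLE is charged roughly 341x as much as nice 0 ... */
	assert(vruntime_delta(one_ms, SCHED_IDLE_WEIGHT) / one_ms == 341);
	/* ... but only 5x as much as nice 19 (15 / 3). */
	assert(vruntime_delta(one_ms, SCHED_IDLE_WEIGHT) /
	       vruntime_delta(one_ms, NICE_19_WEIGHT) == 5);
}
```

So against a nice 19 task the SCHED_IDLE penalty shrinks from a factor
of about 341 to a factor of 5, which is why tick preemption becomes more
likely in that case.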

> On the other hand, idle_sched_class does not seem to qualify either. It’s too
> simple and is currently only used for the per-CPU idle task.

Yeah, leave this for the rq->idle task (swapper/X).

>>> We need an absolutely low priority class that could tolerate starving, which
>>> could be used to co-locate offline tasks. But the IDLE class does not seem to be
>>> *low* enough considering the fairness of CFS, since the IDLE class still has a
>>> weight.

Understood. But this (tick) preemption should happen extremely rarely,
especially if you have SCHED_NORMAL nice 0 tasks, right?