linux-kernel - Re: [PATCH] sched/fair: reduce preemption with IDLE tasks runable(Internet mail)

Message-ID: <CCA1D942-3669-4216-92BD-768967B1ECE5@tencent.com>
Date:   Tue, 11 Aug 2020 00:41:40 +0000
From:   benbjiang(蒋彪) <benbjiang@...cent.com>
To:     Dietmar Eggemann <dietmar.eggemann@....com>
CC:     Jiang Biao <benbjiang@...il.com>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "peterz@...radead.org" <peterz@...radead.org>,
        "juri.lelli@...hat.com" <juri.lelli@...hat.com>,
        "vincent.guittot@...aro.org" <vincent.guittot@...aro.org>,
        "rostedt@...dmis.org" <rostedt@...dmis.org>,
        "bsegall@...gle.com" <bsegall@...gle.com>,
        "mgorman@...e.de" <mgorman@...e.de>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] sched/fair: reduce preemption with IDLE tasks
 runable(Internet mail)

Hi,

> On Aug 10, 2020, at 9:24 PM, Dietmar Eggemann <dietmar.eggemann@....com> wrote:
> 
> On 06/08/2020 17:52, benbjiang(蒋彪) wrote:
>> Hi, 
>> 
>>> On Aug 6, 2020, at 9:29 PM, Dietmar Eggemann <dietmar.eggemann@....com> wrote:
>>> 
>>> On 03/08/2020 13:26, benbjiang(蒋彪) wrote:
>>>> 
>>>> 
>>>>> On Aug 3, 2020, at 4:16 PM, Dietmar Eggemann <dietmar.eggemann@....com> wrote:
>>>>> 
>>>>> On 01/08/2020 04:32, Jiang Biao wrote:
>>>>>> From: Jiang Biao <benbjiang@...cent.com>
> 
> [...]
> 
>>> How would you deal with se's representing taskgroups which contain
>>> SCHED_IDLE and SCHED_NORMAL tasks or other taskgroups doing that?
>> I’m not sure I get the point. :) How about the following patch,
>> 
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 04fa8dbcfa4d..8715f03ed6d7 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -2994,6 +2994,9 @@ account_entity_enqueue(struct cfs_rq *cfs_rq, struct sched_entity *se)
>>                list_add(&se->group_node, &rq->cfs_tasks);
>>        }
>> #endif
>> +       if (task_has_idle_policy(task_of(se)))
>> +               cfs_rq->idle_nr_running++;
>> +
>>        cfs_rq->nr_running++;
>> }
>> 
>> @@ -3007,6 +3010,9 @@ account_entity_dequeue(struct cfs_rq *cfs_rq, struct sched_entity *se)
>>                list_del_init(&se->group_node);
>>        }
>> #endif
>> +       if (task_has_idle_policy(task_of(se)))
>> +               cfs_rq->idle_nr_running--;
>> +
>>        cfs_rq->nr_running--;
>> }
>> 
>> @@ -4527,7 +4533,7 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
>>                return;
>> #endif
>> 
>> -       if (cfs_rq->nr_running > 1)
>> +       if (cfs_rq->nr_running > cfs_rq->idle_nr_running + 1 &&
>> +           cfs_rq->h_nr_running - cfs_rq->idle_h_nr_running > cfs_rq->idle_nr_running + 1)
>>                check_preempt_tick(cfs_rq, curr);
>> }
>> 
>> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
>> index 877fb08eb1b0..401090393e09 100644
>> --- a/kernel/sched/sched.h
>> +++ b/kernel/sched/sched.h
>> @@ -500,6 +500,7 @@ struct cfs_bandwidth { };
>> struct cfs_rq {
>>        struct load_weight      load;
>>        unsigned int            nr_running;
>> +       unsigned int            idle_nr_running;
>>        unsigned int            h_nr_running;      /* SCHED_{NORMAL,BATCH,IDLE} */
>>        unsigned int            idle_h_nr_running; /* SCHED_IDLE */
> 
>         /
>       / |  \
>      A  n0 i0
>     / \
>    n1 i1
> 
> I don't think this will work. E.g. the patch would prevent tick
> preemption between 'A' and 'n0' on '/' as well
> 
> (3 > 1 + 1) && (4 - 2 > 1 + 1)
> 
> You also have to make sure that a SCHED_IDLE task can tick preempt
> another SCHED_IDLE task.

That’s right. :)
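
For reference, the arithmetic in the counterexample above can be replayed in user space. This is only a sketch: `struct cfs_rq_counts`, `patched_tick_check()`, `original_tick_check()` and `root_example` are hypothetical names made up for illustration, not kernel code. The counter values are read off the quoted tree for the root cfs_rq '/': three entities (A, n0, i0), one of them a SCHED_IDLE task, and four tasks in the hierarchy, two of them SCHED_IDLE.

```c
#include <stdbool.h>

/* Hypothetical user-space stand-in for the four cfs_rq counters in the patch. */
struct cfs_rq_counts {
	unsigned int nr_running;        /* entities queued on this cfs_rq */
	unsigned int idle_nr_running;   /* of those, SCHED_IDLE tasks */
	unsigned int h_nr_running;      /* tasks in the whole hierarchy below */
	unsigned int idle_h_nr_running; /* of those, SCHED_IDLE tasks */
};

/* Condition from the proposed entity_tick() hunk. */
static bool patched_tick_check(const struct cfs_rq_counts *c)
{
	return c->nr_running > c->idle_nr_running + 1 &&
	       c->h_nr_running - c->idle_h_nr_running > c->idle_nr_running + 1;
}

/* Condition before the patch. */
static bool original_tick_check(const struct cfs_rq_counts *c)
{
	return c->nr_running > 1;
}

/* Root cfs_rq '/' of the example: entities A, n0, i0; tasks n1, i1, n0, i0. */
static const struct cfs_rq_counts root_example = {
	.nr_running        = 3,
	.idle_nr_running   = 1,
	.h_nr_running      = 4,
	.idle_h_nr_running = 2,
};
```

With these values the first half of the patched condition holds (3 > 2) but the second does not (2 > 2 is false), so `patched_tick_check(&root_example)` is false and check_preempt_tick() would be skipped on '/', while `original_tick_check(&root_example)` is true.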

> 
>>>> I’m not sure that is OK, because the IDLE class does not seem to be pure
>>>> enough to tolerate starvation.
>>> 
>>> Not sure I understand but idle_sched_class is not the same as SCHED_IDLE
>>> (policy)?
>> Our use case is that low-priority tasks (called offline tasks) should use the
>> spare CPU left by CFS SCHED_NORMAL tasks (called online tasks) without
>> interfering with the online tasks.
>> Offline tasks run only when there are no runnable online tasks, and offline
>> tasks never preempt online tasks.
>> The SCHED_IDLE policy does not seem to meet that requirement: it has a
>> weight (3), and although that weight is small, a SCHED_IDLE task can still
>> preempt online tasks because of CFS fairness. Offline tasks under the
>> SCHED_IDLE policy could therefore interfere with the online tasks.
> 
> Because of this very small weight (weight=3), compared to a SCHED_NORMAL
> nice 0 task (weight=1024), a SCHED_IDLE task is penalized by a huge
> se->vruntime value (1024/3 higher than for a SCHED_NORMAL nice 0 task).
> This should make sure it doesn't tick preempt a SCHED_NORMAL nice 0 task.
Could you please explain how the large vruntime penalty (1024/3) ensures that
a SCHED_IDLE task does not tick preempt a SCHED_NORMAL nice 0 task?

Thanks a lot.

Regards,
Jiang

> 
> It's different when the SCHED_NORMAL task has nice 19 (weight=15) but
> that's part of the CFS design.
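
The weight ratios mentioned above can be made concrete with a small sketch. It assumes the simplified CFS rule that vruntime advances by delta_exec * 1024 / weight; the kernel itself uses fixed-point inverse weights, so this is an approximation for illustration only. `vruntime_delta()` and the macro names are hypothetical, while the weights 1024, 15 and 3 are the ones quoted in this thread.

```c
#include <stdint.h>

#define NICE_0_WEIGHT  1024u /* SCHED_NORMAL, nice 0 */
#define NICE_19_WEIGHT 15u   /* SCHED_NORMAL, nice 19 */
#define IDLE_WEIGHT    3u    /* SCHED_IDLE */

/*
 * Simplified model: how far vruntime advances (in ns) when an entity
 * with the given weight runs for delta_exec_ns of real time.
 */
static uint64_t vruntime_delta(uint64_t delta_exec_ns, unsigned int weight)
{
	return delta_exec_ns * NICE_0_WEIGHT / weight;
}
```

Under this model, 3 ms of real runtime advances vruntime by 3 ms for a nice 0 task, by 204.8 ms for a nice 19 task and by 1024 ms for a SCHED_IDLE task. The SCHED_IDLE task therefore falls behind a nice 0 task about 341 times faster, but only 5 times faster than a nice 19 task.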
> 
>> On the other hand, idle_sched_class does not seem suitable either. It is too
>> simple and is currently used only for the per-CPU idle task.
> 
> Yeah, leave this for the rq->idle task (swapper/X).
Got it.

> 
>>>> We need a strictly lowest-priority class that can tolerate starvation and
>>>> can be used to co-locate offline tasks. The IDLE class does not seem to be
>>>> *low* enough given the fairness of CFS, since it still has a weight.
> 
> Understood. But this (tick) preemption should happen extremely rarely,
> especially if you have SCHED_NORMAL nice 0 tasks, right?