linux-kernel - Re: [PATCH] sched/fair: reduce preemption with IDLE tasks runable(Internet mail)

Message-ID: <D80F1584-F569-4CEE-8DCC-7841CF7E159F@tencent.com>
Date:   Thu, 20 Aug 2020 11:27:59 +0000
From:   benbjiang(蒋彪) <benbjiang@...cent.com>
To:     Vincent Guittot <vincent.guittot@...aro.org>
CC:     Dietmar Eggemann <dietmar.eggemann@....com>,
        Jiang Biao <benbjiang@...il.com>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "peterz@...radead.org" <peterz@...radead.org>,
        "juri.lelli@...hat.com" <juri.lelli@...hat.com>,
        "rostedt@...dmis.org" <rostedt@...dmis.org>,
        "bsegall@...gle.com" <bsegall@...gle.com>,
        "mgorman@...e.de" <mgorman@...e.de>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] sched/fair: reduce preemption with IDLE tasks
 runable(Internet mail)



> On Aug 20, 2020, at 3:35 PM, Vincent Guittot <vincent.guittot@...aro.org> wrote:
> 
> On Thu, 20 Aug 2020 at 02:13, benbjiang(蒋彪) <benbjiang@...cent.com> wrote:
>> 
>> 
>> 
>>> On Aug 19, 2020, at 10:55 PM, Vincent Guittot <vincent.guittot@...aro.org> wrote:
>>> 
>>> On Wed, 19 Aug 2020 at 16:27, benbjiang(蒋彪) <benbjiang@...cent.com> wrote:
>>>> 
>>>> 
>>>> 
>>>>> On Aug 19, 2020, at 7:55 PM, Dietmar Eggemann <dietmar.eggemann@....com> wrote:
>>>>> 
>>>>> On 19/08/2020 13:05, Vincent Guittot wrote:
>>>>>> On Wed, 19 Aug 2020 at 12:46, Dietmar Eggemann <dietmar.eggemann@....com> wrote:
>>>>>>> 
>>>>>>> On 17/08/2020 14:05, benbjiang(蒋彪) wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Aug 17, 2020, at 4:57 PM, Dietmar Eggemann <dietmar.eggemann@....com> wrote:
>>>>>>>>> 
>>>>>>>>> On 14/08/2020 01:55, benbjiang(蒋彪) wrote:
>>>>>>>>>> Hi,
>>>>>>>>>> 
>>>>>>>>>>> On Aug 13, 2020, at 2:39 AM, Dietmar Eggemann <dietmar.eggemann@....com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> On 12/08/2020 05:19, benbjiang(蒋彪) wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Aug 11, 2020, at 11:54 PM, Dietmar Eggemann <dietmar.eggemann@....com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 11/08/2020 02:41, benbjiang(蒋彪) wrote:
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Aug 10, 2020, at 9:24 PM, Dietmar Eggemann <dietmar.eggemann@....com> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On 06/08/2020 17:52, benbjiang(蒋彪) wrote:
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Aug 6, 2020, at 9:29 PM, Dietmar Eggemann <dietmar.eggemann@....com> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On 03/08/2020 13:26, benbjiang(蒋彪) wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On Aug 3, 2020, at 4:16 PM, Dietmar Eggemann <dietmar.eggemann@....com> wrote:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On 01/08/2020 04:32, Jiang Biao wrote:
>>>>>>>>>>>>>>>>>>>> From: Jiang Biao <benbjiang@...cent.com>
>>>>>>> 
>>>>>>> [...]
>>>>>>> 
>>>>>>>>> Are you sure about this?
>>>>>>>> Yes. :)
>>>>>>>>> 
>>>>>>>>> The math is telling me for the:
>>>>>>>>> 
>>>>>>>>> idle task:      (3 / (1024 + 1024 + 3))^(-1) * 4ms = 2735ms
>>>>>>>>> 
>>>>>>>>> normal task: (1024 / (1024 + 1024 + 3))^(-1) * 4ms =    8ms
>>>>>>>>> 
>>>>>>>>> (4ms - 250 Hz)
>>>>>>>> My tick is 1ms - 1000HZ, which seems reasonable for 600ms? :)
>>>>>>> 
>>>>>>> OK, I see.
>>>>>>> 
>>>>>>> But here the different sched slices (check_preempt_tick()->
>>>>>>> sched_slice()) between normal tasks and the idle task play a role too.
>>>>>>> 
>>>>>>> Normal tasks get ~3ms whereas the idle task gets <0.01ms.
>>>>>> 
>>>>>> In fact that depends on the number of CPUs on the system:
>>>>>> sysctl_sched_latency = 6ms * (1 + ilog(ncpus)). On an 8-core system,
>>>>>> a normal task will run for around 12ms in one shot and the idle task
>>>>>> still runs for one tick period.
>>>>> 
>>>>> True. This is on a single CPU.
>>>> Agree. :)
>>>> 
>>>>> 
>>>>>> Also, you can further increase the period between 2 runs of the idle
>>>>>> task by using cgroups and the minimum shares value: 2
>>>>> 
>>>>> Ah yes, maybe this is what Jiang wants to do then? If his runtime does
>>>>> not have other requirements preventing this.
>>>> That could work for increasing the period between 2 runs. But I guess it
>>>> could not reduce the length of each run of the idle task, which means a
>>>> normal task could still see a 1-tick scheduling latency because of the idle task.
>>> 
>>> Yes. An idle task will preempt an always-running task for 1 tick
>>> every 680ms. But you should also keep in mind that a waking normal
>>> task will preempt the idle task immediately, which means that the idle
>>> task will not add scheduling latency to a normal task but will "steal"
>>> at most 0.14% of the normal task's throughput (1/680).
>> That’s true. But in the VM case, when VMs are busy (MWAIT passthrough
>> or running CPU-hungry work), the 1-tick scheduling latency could be
>> detected by cyclictest running in the VM.
>> 
>> OTOH, we compensate vruntime in place_entity() to boost waking tasks
>> without distinguishing SCHED_IDLE tasks. Do you think it’s necessary to
>> do that for them? Something like:
>> 
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -4115,7 +4115,7 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
>>                vruntime += sched_vslice(cfs_rq, se);
>> 
>>        /* sleeps up to a single latency don't count. */
>> -       if (!initial) {
>> +       if (!initial && likely(!task_has_idle_policy(task_of(se)))) {
>>                unsigned long thresh = sysctl_sched_latency;
> 
> Yeah, this is a good improvement.
Thanks, I’ll send a patch for that. :)

> Does it solve your problem ?
> 
Not exactly. :)  I wonder if we can make SCHED_IDLE purer (more harmless)?
Or introduce a switch (or flag) to control it, and make it available for cases like ours.

Thanks a lot.
Regards,
Jiang
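
For readers following the place_entity() hunk quoted in this message, here is a
minimal Python sketch of the placement logic it touches. It is a toy model and
not kernel code: the function name place_vruntime, the skip_idle_credit switch,
the 6ms latency constant and the halving of the threshold (modelled on the
GENTLE_FAIR_SLEEPERS feature) are assumptions made for illustration only.

```python
SCHED_LATENCY_NS = 6_000_000  # assumed sysctl_sched_latency (single-CPU default)


def place_vruntime(min_vruntime, se_vruntime, initial, idle_policy,
                   skip_idle_credit, vslice=0, gentle_sleepers=True):
    """Toy model of where a task's vruntime lands when it is (re)placed.

    skip_idle_credit=True models the proposed change: SCHED_IDLE tasks
    no longer receive the wakeup credit. All names here are hypothetical.
    """
    vruntime = min_vruntime

    # A newly forked task is debited one virtual slice.
    if initial:
        vruntime += vslice

    # A waking task is credited up to one latency period, unless the
    # proposed check excludes it because it has the idle policy.
    if not initial and not (skip_idle_credit and idle_policy):
        thresh = SCHED_LATENCY_NS
        if gentle_sleepers:
            thresh //= 2  # only half of the latency is credited
        vruntime -= thresh

    # Never let a task gain time by being placed backwards.
    return max(se_vruntime, vruntime)


if __name__ == "__main__":
    now = 100_000_000
    print("idle task, current behaviour: ",
          place_vruntime(now, 0, False, True, skip_idle_credit=False))
    print("idle task, with proposed check:",
          place_vruntime(now, 0, False, True, skip_idle_credit=True))
```

In this model the only effect of the extra check is that a waking SCHED_IDLE
task is placed at min_vruntime instead of slightly before it, so it loses its
head start over tasks that are already runnable.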

>> 
>>> 
>>>> OTOH, cgroups (shares) could introduce extra complexity. :)
>>>> 
>>>> I wonder if there’s any possibility of making the priority of SCHED_IDLE
>>>> tasks strictly lower than SCHED_NORMAL (OTHER), which means no weights/shares
>>>> for them, so that they run only when no other task is runnable.
>>>> I guess there may be a priority inversion issue if we do that. But maybe we
>>> 
>>> Exactly, that's why we must ensure a minimum running time for sched_idle task
>> 
>> Still, in the VM case, different VMs are well isolated from each other, so
>> priority inversion could be very rare. We’re trying to make offline tasks
>> absolutely harmless to online tasks. :)
>> 
>> Thanks a lot for your time.
>> Regards,
>> Jiang
>> 
>>> 
>>>> could avoid it by load-balancing more aggressively, or it (priority inversion)
>>>> could be ignored in some special cases.
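
The figures quoted in this thread (2735ms, 8ms, ~3ms, <0.01ms, ~12ms, ~680ms)
can be reproduced with a few lines of arithmetic. The sketch below is only an
illustration under stated assumptions: two nice-0 tasks of weight 1024 and one
SCHED_IDLE task of weight 3 share a CPU, "ilog" is read as a base-2 integer
logarithm, and the helper names are hypothetical rather than kernel functions.

```python
NORMAL_WEIGHT = 1024  # weight of a nice-0 task, as used in the thread
IDLE_WEIGHT = 3       # weight of a SCHED_IDLE task, as used in the thread


def run_period_ms(weight, weights, tick_ms):
    """(weight / total)^-1 * tick: the formula quoted in the thread."""
    return sum(weights) / weight * tick_ms


def sched_latency_ms(ncpus):
    """6ms * (1 + ilog(ncpus)), reading ilog as floor(log2) (assumption)."""
    return 6 * (1 + (ncpus.bit_length() - 1))


def slice_ms(weight, weights, latency_ms):
    """Share of one latency period proportional to the task's weight."""
    return latency_ms * weight / sum(weights)


if __name__ == "__main__":
    weights = [NORMAL_WEIGHT, NORMAL_WEIGHT, IDLE_WEIGHT]

    for tick in (4, 1):  # 250 Hz and 1000 Hz
        print(f"tick={tick}ms: idle task period "
              f"{run_period_ms(IDLE_WEIGHT, weights, tick):.0f}ms, "
              f"normal task period "
              f"{run_period_ms(NORMAL_WEIGHT, weights, tick):.0f}ms")

    for ncpus in (1, 8):
        latency = sched_latency_ms(ncpus)
        print(f"ncpus={ncpus}: latency {latency}ms, normal slice "
              f"{slice_ms(NORMAL_WEIGHT, weights, latency):.2f}ms, "
              f"idle slice {slice_ms(IDLE_WEIGHT, weights, latency):.4f}ms")
```

With a 1ms tick the idle task's period comes out a little above 680ms, which is
where the "1 tick every 680ms" and the 1/680 throughput estimate come from.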