linux-kernel - Re: CFS flat runqueue proposal fixes/update

Message-ID: <0e7a9174-6ed9-752d-dacb-4dce182852cf@arm.com>
Date:   Thu, 20 Aug 2020 16:56:17 +0200
From:   Dietmar Eggemann <dietmar.eggemann@....com>
To:     Rik van Riel <riel@...riel.com>,
        Peter Zijlstra <peterz@...radead.org>
Cc:     Paul Turner <pjt@...gle.com>,
        "vincent.guittot" <vincent.guittot@...aro.org>, kernel-team@...com,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "dietmar.eggeman" <dietmar.eggeman@....com>
Subject: Re: CFS flat runqueue proposal fixes/update

Hi Rik,

On 31/07/2020 09:42, Rik van Riel wrote:

[...]

> Let's revisit the hierarchy from above, and assign priorities
> to the cgroups, with the fixed-point one being 1000. Let's
> say cgroups A, A1, and B have priority 1000, while cgroup
> A2 has priority 1.
> 
>         /\
>        /  \
>       A    B
>      / \    \ 
>     A1 A2   t3
>    /     \
>   t1     t2
> 
> One consequence of this is that when t1, t2, and t3 each
> get a time slice, the vruntime of tasks t1 and t3 advances
> at roughly the same speed as the clock time, while the
> vruntime of task t2 advances 1000x faster.
> 
> This is fine if all three tasks continue to be runnable,
> since t1, t2 and t3 each get their fair share of CPU time.
> 
> However, if t1 goes to sleep, t2 is the only thing running
> inside cgroup A, which has the same priority as cgroup B,
> and tasks t2 and t3 should be getting the same amount of
> CPU time.
> 
> They eventually will, but not before task t3 has used up
> enough CPU time to catch up with the enormous vruntime
> advance that t2 just suffered.
> 
> That needs to be fixed, to get near-immediate convergence,
> and not convergence after some unknown (potentially long)
> period of time.
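
To make the rate difference concrete, here is a minimal C sketch. It
assumes vruntime advances as delta_exec * 1000 / priority on the
1000-based fixed-point scale from the example above (the kernel's own
calc_delta_fair() scales by NICE_0_LOAD / se->load.weight instead).
PRIO_ONE, vruntime_delta() and catch_up_ns() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed-point 1.0 priority, matching the example's scale (assumption;
 * the kernel uses NICE_0_LOAD for a nice-0 entity instead). */
#define PRIO_ONE 1000ULL

/* Hypothetical: convert wall-clock runtime (ns) into vruntime. The
 * lower the priority, the faster vruntime advances. */
static uint64_t vruntime_delta(uint64_t delta_exec_ns, uint64_t prio)
{
	assert(prio > 0);
	return delta_exec_ns * PRIO_ONE / prio;
}

/* Hypothetical: CPU time (ns) an entity of priority 'prio' must run for
 * its vruntime to advance by 'lead_ns', e.g. to catch up with another. */
static uint64_t catch_up_ns(uint64_t lead_ns, uint64_t prio)
{
	assert(prio > 0);
	return lead_ns * prio / PRIO_ONE;
}
```

Under these assumptions, 1 ms of runtime for t2 (priority 1) advances
its vruntime by 1 s, so t3 (priority 1000) would need about 1 s of CPU
time before their vruntimes line up again.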

I'm trying to understand this issue in detail ...

Since t1 and t2 are the only tasks in A1 and A2 respectively, this
task group level shouldn't matter for tick preemption after t1 went to
sleep, should it?

check_preempt_tick() is only invoked from entity_tick() when
'cfs_rq->nr_running > 1'.
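
The level argument can be modeled with a toy hierarchy walk. This is a
hypothetical C sketch (struct toy_cfs_rq, struct toy_se and
levels_checked() are not kernel names) that mimics how task_tick_fair()
calls entity_tick() for each sched_entity from the running task up to
the root, and counts the levels at which the nr_running > 1 test would
let check_preempt_tick() run.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified runqueue: one per task group level. */
struct toy_cfs_rq {
	unsigned int nr_running;   /* entities queued on this runqueue */
};

/* Hypothetical, simplified sched_entity (task or group). */
struct toy_se {
	struct toy_se *parent;     /* group entity one level up, NULL at root */
	struct toy_cfs_rq *cfs_rq; /* runqueue this entity is queued on */
};

/*
 * Walk from the running task up to the root, like the
 * for_each_sched_entity() loop in task_tick_fair(), and count the
 * levels where entity_tick() would call check_preempt_tick(),
 * i.e. where the entity's runqueue has more than one entity queued.
 */
static unsigned int levels_checked(const struct toy_se *se)
{
	unsigned int n = 0;

	for (; se; se = se->parent)
		if (se->cfs_rq->nr_running > 1)
			n++;
	return n;
}
```

With t1 asleep, A's runqueue holds only A2 and A2's holds only t2, so
only the root level (A vs. B) is checked; while t1 is still runnable,
A's level (A1 vs. A2) is checked as well.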

IMHO, tick preemption is handled between A and B, and since they have
the same cpu.weight (cpu.shares), t2 and t3 get the same time slice
after t1 went to sleep.
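
The slice A and B each get at the root level can be approximated as a
weight-proportional share of the scheduling period, similar to what
sched_slice() computes for a single level. This minimal sketch assumes
a 6 ms period (the unscaled sysctl_sched_latency default; real kernels
scale it with the CPU count by default) and that both group entities
end up with the default weight of 1024 on this CPU. TOY_PERIOD_NS and
toy_slice_ns() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed scheduling period: 6 ms, the unscaled latency default. */
#define TOY_PERIOD_NS 6000000ULL

/*
 * Hypothetical single-level version of sched_slice(): an entity's
 * share of the period is proportional to its weight on that runqueue.
 */
static uint64_t toy_slice_ns(uint64_t period_ns, uint64_t weight,
			     uint64_t total_weight)
{
	assert(total_weight > 0 && weight <= total_weight);
	return period_ns * weight / total_weight;
}
```

With equal weights, A and B each get 3 ms per period. A2's priority
does not enter this computation, since it only matters one level down,
inside A.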

I think that tick preemption here happens in the 'if (delta_exec >
ideal_runtime)' condition of check_preempt_tick(), with delta_exec =
curr->sum_exec_runtime - curr->prev_sum_exec_runtime.
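
That first test in check_preempt_tick() boils down to comparing the
runtime since the entity was last picked with its ideal slice. A
hypothetical sketch of just that comparison (struct toy_entity and
slice_exhausted() are illustrative names carrying only the two
sched_entity fields involved):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of struct sched_entity. */
struct toy_entity {
	uint64_t sum_exec_runtime;      /* total runtime so far (ns) */
	uint64_t prev_sum_exec_runtime; /* sum_exec_runtime when last picked */
};

/* True once curr has run longer than ideal_runtime_ns since it was
 * last picked, i.e. the tick should request a reschedule. */
static bool slice_exhausted(const struct toy_entity *curr,
			    uint64_t ideal_runtime_ns)
{
	assert(curr->sum_exec_runtime >= curr->prev_sum_exec_runtime);

	uint64_t delta_exec = curr->sum_exec_runtime -
			      curr->prev_sum_exec_runtime;

	return delta_exec > ideal_runtime_ns;
}
```

The real check_preempt_tick() also has a second path: once
sysctl_sched_min_granularity has passed, it preempts when curr's
vruntime is ahead of the leftmost entity's by more than ideal_runtime.
This sketch leaves that path out.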

Did I miss anything?

[...]