lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 26 Nov 2020 09:23:52 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     Balbir Singh <bsingharora@...il.com>
Cc:     Vineeth Pillai <viremana@...ux.microsoft.com>,
        "Joel Fernandes (Google)" <joel@...lfernandes.org>,
        Nishanth Aravamudan <naravamudan@...italocean.com>,
        Julien Desfossez <jdesfossez@...italocean.com>,
        Tim Chen <tim.c.chen@...ux.intel.com>,
        Aaron Lu <aaron.lwe@...il.com>,
        Aubrey Li <aubrey.intel@...il.com>, tglx@...utronix.de,
        linux-kernel@...r.kernel.org, mingo@...nel.org,
        torvalds@...ux-foundation.org, fweisbec@...il.com,
        keescook@...omium.org, kerrnel@...gle.com,
        Phil Auld <pauld@...hat.com>,
        Valentin Schneider <valentin.schneider@....com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>,
        Paolo Bonzini <pbonzini@...hat.com>, vineeth@...byteword.org,
        Chen Yu <yu.c.chen@...el.com>,
        Christian Brauner <christian.brauner@...ntu.com>,
        Agata Gruza <agata.gruza@...el.com>,
        Antonio Gomez Iglesias <antonio.gomez.iglesias@...el.com>,
        graf@...zon.com, konrad.wilk@...cle.com, dfaggioli@...e.com,
        pjt@...gle.com, rostedt@...dmis.org, derkling@...gle.com,
        benbjiang@...cent.com,
        Alexandre Chartre <alexandre.chartre@...cle.com>,
        James.Bottomley@...senpartnership.com, OWeisse@...ch.edu,
        Dhaval Giani <dhaval.giani@...cle.com>,
        Junaid Shahid <junaids@...gle.com>, jsbarnes@...gle.com,
        chris.hyser@...cle.com, Ben Segall <bsegall@...gle.com>,
        Josh Don <joshdon@...gle.com>, Hao Luo <haoluo@...gle.com>,
        Tom Lendacky <thomas.lendacky@....com>,
        Aubrey Li <aubrey.li@...ux.intel.com>,
        "Paul E. McKenney" <paulmck@...nel.org>,
        Tim Chen <tim.c.chen@...el.com>
Subject: Re: [PATCH -tip 09/32] sched/fair: Snapshot the min_vruntime of CPUs
 on force idle

On Thu, Nov 26, 2020 at 10:17:15AM +1100, Balbir Singh wrote:
> On Tue, Nov 24, 2020 at 10:09:55AM +0100, Peter Zijlstra wrote:

> > The basic observation the current approach relies on is that al that
> > faffery basically boils down to the fact that vruntime only means
> > something when there is contention. And that only the progression is
> > important not the actual value. That is, this is all fundamentally a
> > differential equation and our integration constant is meaningless (also
> > embodied in (7)).
> >
> 
> I'll reread (6) and (7), I am trying to understand forced idle and
> contention together, from what I understand of the patches, there is 

When we force-idle there is contention by definition; there's a task
that wanted to run, but couldn't.

> 1. two tasks that are core scheduled, in that case vruntime works as
> expected on each CPU, but we need to compare their combined vrtuntime
> against other tasks on each CPU respectively for them to be
> selected/chosen?

We need to compare across CPUs when the cookies don't match. This is
required to avoid starving one or the other.

> 2. When one of the tasks selected is a part of the core scheduling group
> and the other CPU does not select a core scheduled tasks, we need to ask
> ourselves if that CPU should force idle and that's where this logic
> comes into play?

When one CPU selects a cookie task, and the other CPU cannot find a
matching task, it must go idle (as idle matches everyone). This is the
basic core-scheduling constraint.

So suppose you have two tasks, A and B, both with a cookie, but not
matching.

Normal scheduling would run A and B concurrent on the two siblings. Core
scheduling obviously cannot do this. When we pick A, the other CPU is
not allowed to run B and thus will have to be forced idle and
vice-versa.

The next problem is avoiding starvation. Assuming equal weight between
the tasks, we'd want to end up running A and B in alternating cycles.

This means having to compare runtimes between A and B, but when they're
on different runqueues the actual vruntime values can be wildly
divergent and cannot be reasily compared (the integration constant is
meaningless but really annoying ;-).

We also cannot use min_vruntime (which is the same as the task vruntime
when there is only a single task), because then you cannot observe
progress. The difference between min_vruntime and the task runtime is
always 0, so you can't tell who just ran and who got starved.

This is where our snapshots come in play, we snapshot vruntime after
task selection (before running), such that at the next pick we can tell
who made progress and who got starved.

By marking the vruntime of both runqueues at the same point in time we
basically normalize away that integration constant. You effectively
reset the vruntime to 0 (through (7), but without iterating all the
tasks and adjusting it).

Does that make sense?

Once you get this, read that second email linked.

Powered by blists - more mailing lists