linux-kernel - Re: [RFC PATCH v3 00/16] Core scheduling v3

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CANaguZCkKQTmgye+9nQhzQqYBrsnCmcjA46TPmLwN60vvMQ_7w@mail.gmail.com>
Date:   Fri, 11 Oct 2019 07:32:48 -0400
From:   Vineeth Remanan Pillai <vpillai@...italocean.com>
To:     Aaron Lu <aaron.lu@...ux.alibaba.com>
Cc:     Tim Chen <tim.c.chen@...ux.intel.com>,
        Julien Desfossez <jdesfossez@...italocean.com>,
        Dario Faggioli <dfaggioli@...e.com>,
        "Li, Aubrey" <aubrey.li@...ux.intel.com>,
        Aubrey Li <aubrey.intel@...il.com>,
        Nishanth Aravamudan <naravamudan@...italocean.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Paul Turner <pjt@...gle.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
        Frédéric Weisbecker <fweisbec@...il.com>,
        Kees Cook <keescook@...omium.org>,
        Greg Kerr <kerrnel@...gle.com>, Phil Auld <pauld@...hat.com>,
        Valentin Schneider <valentin.schneider@....com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>,
        Paolo Bonzini <pbonzini@...hat.com>
Subject: Re: [RFC PATCH v3 00/16] Core scheduling v3

> > The reason we need to do this is because, new tasks that gets created will
> > have a vruntime based on the new min_vruntime and old tasks will have it
> > based on the old min_vruntime
>
> I think this is expected behaviour.
>
I don't think this is the expected behavior. If we hadn't changed the root
cfs->min_vruntime for the core rq, then it would have been the expected
behaviour. But now, we are updating the core rq's root cfs, min_vruntime
without changing the the vruntime down to the tree. To explain, consider
this example based on your patch. Let cpu 1 and 2 be siblings. And let rq(cpu1)
be the core rq. Let rq1->cfs->min_vruntime=1000 and rq2->cfs->min_vruntime=2000.
So in update_core_cfs_rq_min_vruntime, you update rq1->cfs->min_vruntime
to 2000 because that is the max. So new tasks enqueued on rq1 starts with
vruntime of 2000 while the tasks in that runqueue are still based on the old
min_vruntime(1000). So the new tasks gets enqueued some where to the right
of the tree and has to wait until already existing tasks catch up the
vruntime to
2000. This is what I meant by starvation. This happens always when we update
the core rq's cfs->min_vruntime. Hope this clarifies.

> > and it can cause starvation based on how
> > you set the min_vruntime.
>
> Care to elaborate the starvation problem?

Explained above.

> Again, what's the point of normalizing sched entities' vruntime in
> sub-cfs_rqs? Their vruntime comparisons only happen inside their own
> cfs_rq, we don't do cross CPU vruntime comparison for them.

As I mentioned above, this is to avoid the starvation case. Even though we are
not doing cross cfs_rq comparison, the whole tree's vruntime is based on the
root cfs->min_vruntime and we will have an imbalance if we change the root
cfs->min_vruntime without updating down the tree.

Thanks,
Vineeth