Message-ID: <20200414135624.GU20730@hirez.programming.kicks-ass.net>
Date: Tue, 14 Apr 2020 15:56:24 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: vpillai <vpillai@...italocean.com>
Cc: Nishanth Aravamudan <naravamudan@...italocean.com>,
Julien Desfossez <jdesfossez@...italocean.com>,
Tim Chen <tim.c.chen@...ux.intel.com>, mingo@...nel.org,
tglx@...utronix.de, pjt@...gle.com, torvalds@...ux-foundation.org,
Aaron Lu <aaron.lu@...ux.alibaba.com>,
linux-kernel@...r.kernel.org, fweisbec@...il.com,
keescook@...omium.org, kerrnel@...gle.com,
Phil Auld <pauld@...hat.com>, Aaron Lu <aaron.lwe@...il.com>,
Aubrey Li <aubrey.intel@...il.com>, aubrey.li@...ux.intel.com,
Valentin Schneider <valentin.schneider@....com>,
Mel Gorman <mgorman@...hsingularity.net>,
Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>,
Paolo Bonzini <pbonzini@...hat.com>,
Joel Fernandes <joelaf@...gle.com>, joel@...lfernandes.org,
Aaron Lu <ziqian.lzq@...fin.com>
Subject: Re: [RFC PATCH 09/13] sched/fair: core wide vruntime comparison
On Wed, Mar 04, 2020 at 04:59:59PM +0000, vpillai wrote:
> From: Aaron Lu <aaron.lu@...ux.alibaba.com>
>
> This patch provides a vruntime based way to compare two cfs tasks'
> priority, be it on the same cpu or different threads of the same core.
>
> When the two tasks are on the same CPU, we just need to find a common
> cfs_rq both sched_entities are on and then do the comparison.
>
> When the two tasks are on different threads of the same core, the root
> level sched_entities to which the two tasks belong will be used to do
> the comparison.
>
> An ugly illustration for the cross CPU case:
>
>      cpu0            cpu1
>     /  |  \         /  |  \
>   se1 se2 se3     se4 se5 se6
>       /  \                /  \
>    se21  se22          se61  se62
>
> Assume CPU0 and CPU1 are smt siblings and task A's se is se21 while
> task B's se is se61. To compare priority of task A and B, we compare
> priority of se2 and se6. Whichever has the smaller vruntime wins.
>
> To make this work, the root level se should have a common cfs_rq min
> vruntime, which I call the core cfs_rq min vruntime.
>
> When we adjust the min_vruntime of rq->core, we need to propagate
> that down the tree so as to not cause starvation of existing tasks
> based on previous vruntime.
You forgot the time complexity analysis.
> +static void coresched_adjust_vruntime(struct cfs_rq *cfs_rq, u64 delta)
> +{
> +	struct sched_entity *se, *next;
> +
> +	if (!cfs_rq)
> +		return;
> +
> +	cfs_rq->min_vruntime -= delta;
> +	rbtree_postorder_for_each_entry_safe(se, next,
> +			&cfs_rq->tasks_timeline.rb_root, run_node) {
Which per this ^
> +		if (se->vruntime > delta)
> +			se->vruntime -= delta;
> +		if (se->my_q)
> +			coresched_adjust_vruntime(se->my_q, delta);
> +	}
> +}
> @@ -511,6 +607,7 @@ static void update_min_vruntime(struct cfs_rq *cfs_rq)
>
>  	/* ensure we never gain time by being placed backwards. */
>  	cfs_rq->min_vruntime = max_vruntime(cfs_rq_min_vruntime(cfs_rq), vruntime);
> +	update_core_cfs_rq_min_vruntime(cfs_rq);
>  #ifndef CONFIG_64BIT
>  	smp_wmb();
>  	cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime;
as called from here, is exceedingly important.
Worse, I don't think our post-order iteration is even O(n).
All of this is exceedingly yuck.
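
To make the cost concern concrete, here is a toy user-space model of the
recursive walk quoted above. It is a sketch, not kernel code: the names
toy_se, toy_cfs_rq, toy_adjust and toy_demo are made up for illustration,
and a plain array stands in for the rbtree, so it says nothing about what
the post-order iterator itself costs. It only counts how many entities a
single adjustment touches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_cfs_rq;

/* Hypothetical stand-in for struct sched_entity. */
struct toy_se {
	uint64_t vruntime;
	struct toy_cfs_rq *my_q;	/* non-NULL for a group entity */
};

/* Hypothetical stand-in for struct cfs_rq. */
struct toy_cfs_rq {
	uint64_t min_vruntime;
	struct toy_se **entities;	/* array instead of tasks_timeline */
	size_t nr;
};

/* Same shape as the quoted function; returns the number of entities visited. */
static size_t toy_adjust(struct toy_cfs_rq *q, uint64_t delta)
{
	size_t visited = 0;
	size_t i;

	if (!q)
		return 0;

	q->min_vruntime -= delta;
	for (i = 0; i < q->nr; i++) {
		struct toy_se *se = q->entities[i];

		visited++;
		if (se->vruntime > delta)
			se->vruntime -= delta;
		/* Recurse into the group's own queue, as the patch does. */
		visited += toy_adjust(se->my_q, delta);
	}
	return visited;
}

/* cpu1 side of the illustration: se4, se5, se6 -> { se61, se62 }. */
size_t toy_demo(void)
{
	struct toy_se se61 = { 1100, NULL }, se62 = { 1200, NULL };
	struct toy_se *grp[] = { &se61, &se62 };
	struct toy_cfs_rq q6 = { 1000, grp, 2 };
	struct toy_se se4 = { 1050, NULL }, se5 = { 1070, NULL };
	struct toy_se se6 = { 1090, &q6 };
	struct toy_se *top[] = { &se4, &se5, &se6 };
	struct toy_cfs_rq root = { 1000, top, 3 };
	size_t visited = toy_adjust(&root, 100);

	/* The delta reached the nested queue as well as the root. */
	assert(root.min_vruntime == 900 && q6.min_vruntime == 900);
	return visited;
}
```

For the five entities on the cpu1 side of the illustration, toy_demo()
returns 5: one adjustment visits every queued entity in the hierarchy,
and that is before adding whatever the real iterator costs per step.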