linux-kernel - Re: [RFC PATCH 09/13] sched/fair: core wide vruntime comparison

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20200414135624.GU20730@hirez.programming.kicks-ass.net>
Date:   Tue, 14 Apr 2020 15:56:24 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     vpillai <vpillai@...italocean.com>
Cc:     Nishanth Aravamudan <naravamudan@...italocean.com>,
        Julien Desfossez <jdesfossez@...italocean.com>,
        Tim Chen <tim.c.chen@...ux.intel.com>, mingo@...nel.org,
        tglx@...utronix.de, pjt@...gle.com, torvalds@...ux-foundation.org,
        Aaron Lu <aaron.lu@...ux.alibaba.com>,
        linux-kernel@...r.kernel.org, fweisbec@...il.com,
        keescook@...omium.org, kerrnel@...gle.com,
        Phil Auld <pauld@...hat.com>, Aaron Lu <aaron.lwe@...il.com>,
        Aubrey Li <aubrey.intel@...il.com>, aubrey.li@...ux.intel.com,
        Valentin Schneider <valentin.schneider@....com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Joel Fernandes <joelaf@...gle.com>, joel@...lfernandes.org,
        Aaron Lu <ziqian.lzq@...fin.com>
Subject: Re: [RFC PATCH 09/13] sched/fair: core wide vruntime comparison

On Wed, Mar 04, 2020 at 04:59:59PM +0000, vpillai wrote:
> From: Aaron Lu <aaron.lu@...ux.alibaba.com>
> 
> This patch provides a vruntime based way to compare two cfs task's
> priority, be it on the same cpu or different threads of the same core.
> 
> When the two tasks are on the same CPU, we just need to find a common
> cfs_rq both sched_entities are on and then do the comparison.
> 
> When the two tasks are on differen threads of the same core, the root
> level sched_entities to which the two tasks belong will be used to do
> the comparison.
> 
> An ugly illustration for the cross CPU case:
> 
>    cpu0         cpu1
>  /   |  \     /   |  \
> se1 se2 se3  se4 se5 se6
>     /  \            /   \
>   se21 se22       se61  se62
> 
> Assume CPU0 and CPU1 are smt siblings and task A's se is se21 while
> task B's se is se61. To compare priority of task A and B, we compare
> priority of se2 and se6. Whose vruntime is smaller, who wins.
> 
> To make this work, the root level se should have a common cfs_rq min
> vuntime, which I call it the core cfs_rq min vruntime.
> 
> When we adjust the min_vruntime of rq->core, we need to propgate
> that down the tree so as to not cause starvation of existing tasks
> based on previous vruntime.

You forgot the time complexity analysis.


> +static void coresched_adjust_vruntime(struct cfs_rq *cfs_rq, u64 delta)
> +{
> +	struct sched_entity *se, *next;
> +
> +	if (!cfs_rq)
> +		return;
> +
> +	cfs_rq->min_vruntime -= delta;
> +	rbtree_postorder_for_each_entry_safe(se, next,
> +			&cfs_rq->tasks_timeline.rb_root, run_node) {

Which per this ^

> +		if (se->vruntime > delta)
> +			se->vruntime -= delta;
> +		if (se->my_q)
> +			coresched_adjust_vruntime(se->my_q, delta);
> +	}
> +}

> @@ -511,6 +607,7 @@ static void update_min_vruntime(struct cfs_rq *cfs_rq)
>  
>  	/* ensure we never gain time by being placed backwards. */
>  	cfs_rq->min_vruntime = max_vruntime(cfs_rq_min_vruntime(cfs_rq), vruntime);
> +	update_core_cfs_rq_min_vruntime(cfs_rq);
>  #ifndef CONFIG_64BIT
>  	smp_wmb();
>  	cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime;

as called from here, is exceedingly important.

Worse, I don't think our post-order iteration is even O(n).


All of this is exceedingly yuck.