lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Y1j170UQc5DJPYgR@hirez.programming.kicks-ass.net>
Date:   Wed, 26 Oct 2022 10:55:11 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Ricardo Neri <ricardo.neri-calderon@...ux.intel.com>
Cc:     Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Ricardo Neri <ricardo.neri@...el.com>,
        "Ravi V. Shankar" <ravi.v.shankar@...el.com>,
        Ben Segall <bsegall@...gle.com>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Len Brown <len.brown@...el.com>, Mel Gorman <mgorman@...e.de>,
        "Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
        Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Tim Chen <tim.c.chen@...ux.intel.com>,
        Valentin Schneider <vschneid@...hat.com>, x86@...nel.org,
        linux-kernel@...r.kernel.org, "Tim C . Chen" <tim.c.chen@...el.com>
Subject: Re: [RFC PATCH 08/23] sched/fair: Compute task-class performance
 scores for load balancing

On Tue, Oct 25, 2022 at 08:57:24PM -0700, Ricardo Neri wrote:

> Do you want me to add your Signed-off-by and Co-developed-by tags?

Nah; who cares ;-)


> > @@ -8749,32 +8747,18 @@ static void compute_ilb_sg_task_class_sc
> >  	if (!busy_cpus)
> >  		return;
> >  
> > -	score_on_dst_cpu = arch_get_task_class_score(class_sgs->p_min_score->class,
> > -						     dst_cpu);
> > +	score_on_dst_cpu = arch_get_task_class_score(sgcs->min_class, dst_cpu);
> >  
> > -	/*
> > -	 * The simpest case. The single busy CPU in the current group will
> > -	 * become idle after pulling its current task. The destination CPU is
> > -	 * idle.
> > -	 */
> > -	if (busy_cpus == 1) {
> > -		sgs->task_class_score_before = class_sgs->sum_score;
> > -		sgs->task_class_score_after = score_on_dst_cpu;
> > -		return;
> > -	}
> > +	before = sgcs->sum_score
> > +	after  = before - sgcs->min_score + score_on_dst_cpu;
> 
> This works when the sched group being evaluated has only one busy CPU
> because it will become idle if the destination CPU (which was idle) pulls
> the current task.
> 
> >  
> > -	/*
> > -	 * Now compute the group score with and without the task with the
> > -	 * lowest score. We assume that the tasks that remain in the group share
> > -	 * the CPU resources equally.
> > -	 */
> > -	group_score = class_sgs->sum_score / busy_cpus;
> > -
> > -	group_score_without =  (class_sgs->sum_score - class_sgs->min_score) /
> > -			       (busy_cpus - 1);
> > +	if (busy_cpus > 1) {
> > +		before /= busy_cpus;
> > +		after  /= busy_cpus;
> 
> 
> However, I don't think this works when the sched group has more than one
> busy CPU. 'before' and 'after' reflect the total throughput score of both
> the sched group *and* the destination CPU.
> 
> One of the CPUs in the sched group will become idle after the balance.
> 
> Also, at this point we have already added score_on_dst_cpu. We are incorrectly
> scaling it by the number of busy CPUs in the sched group.
> 
> We instead must scale 'after' by busy_cpus - 1 and then add score_on_dst_cpu.

So none of that makes sense.

'x/n + y' != '(x+y)/(n+1)'

IOW:

> > +	}
> >  
> > -	sgs->task_class_score_after = group_score_without + score_on_dst_cpu;
> > -	sgs->task_class_score_before = group_score;
> > +	sgs->task_class_score_before = before;
> > +	sgs->task_class_score_after  = after;

your task_class_score_after is a sum value for 2 cpus worth, not a value
for a single cpu, while your task_class_score_before is a single cpu
average. You can't compare these numbers and have a sensible outcome.

If you have a number of values: x_1....x_n, their average is
Sum(x_1...x_n) / n, which is a single value again.

If you want to update one of the x's, say x_i->x'_i, then the average
changes like:

	Sum(x_1...x_n-x_i+x'_i) / n

If you want to remove one of the x's, then you get:

	Sum(x_1...x_n-x_i) / (n-1) ; 0<i<n

if you want to add an x:

	Sum(x_1...x_n+i_i) / (n+1) ; i>n

Nowhere would you ever get:

	Sum(x_1...x_n) / n + x_i

That's just straight up nonsense.

So I might buy an argument for:

	if (busy_cpus > 1) {
		before /= (busy_cpus-1);
		after  /= (busy_cpus+1);
	}

Or something along those lines (where you remove an entry from before
and add it to after), but not this.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ