Message-ID: <ZaPC7o44lEswxOXp@vingu-book>
Date: Sun, 14 Jan 2024 12:18:06 +0100
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Wyes Karny <wkarny@...il.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
	Qais Yousef <qyousef@...alina.io>, Ingo Molnar <mingo@...nel.org>,
	linux-kernel@...r.kernel.org, Peter Zijlstra <peterz@...radead.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Juri Lelli <juri.lelli@...hat.com>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
	Daniel Bristot de Oliveira <bristot@...hat.com>,
	Valentin Schneider <vschneid@...hat.com>
Subject: Re: [GIT PULL] Scheduler changes for v6.8

Hi Wyes,

On Sunday 14 Jan 2024 at 14:42:40 (+0530), Wyes Karny wrote:
> On Wed, Jan 10, 2024 at 02:57:14PM -0800, Linus Torvalds wrote:
> > On Wed, 10 Jan 2024 at 14:41, Linus Torvalds
> > <torvalds@...ux-foundation.org> wrote:
> > >
> > > It's one of these two:
> > >
> > >   f12560779f9d sched/cpufreq: Rework iowait boost
> > >   9c0b4bb7f630 sched/cpufreq: Rework schedutil governor performance estimation
> > >
> > > one more boot to go, then I'll try to revert whichever causes my
> > > machine to perform horribly much worse.
> > 
> > I guess it should come as no surprise that the result is
> > 
> >    9c0b4bb7f6303c9c4e2e34984c46f5a86478f84d is the first bad commit
> > 
> > but to revert cleanly I will have to revert all of
> > 
> >       b3edde44e5d4 ("cpufreq/schedutil: Use a fixed reference frequency")
> >       f12560779f9d ("sched/cpufreq: Rework iowait boost")
> >       9c0b4bb7f630 ("sched/cpufreq: Rework schedutil governor
> > performance estimation")
> > 
> > This is on a 32-core (64-thread) AMD Ryzen Threadripper 3970X, fwiw.
> > 
> > I'll keep that revert in my private test-tree for now (so that I have
> > a working machine again), but I'll move it to my main branch soon
> > unless somebody has a quick fix for this problem.
> 
> Hi Linus,
> 
> I'm able to reproduce this issue with my AMD Ryzen 5600G system.  But
> only if I disable CPPC in BIOS and boot with acpi-cpufreq + schedutil.
> (I believe CPPC is also disabled in your case, as the log "_CPC object is
> not present" appeared.) With CPPC enabled in BIOS, the issue is not seen
> on my system.  On AMD, acpi-cpufreq also uses the _CPC object to determine
> the boost ratio. When CPPC is disabled in BIOS, something goes wrong and
> max capacity becomes zero.
> 
> Hi Vincent, Qais,
> 
> I have collected some data with bpftracing:

Thanks for your test results.

> 
> sudo bpftrace -e 'kretprobe:effective_cpu_util /cpu == 1/ { @eff_util = lhist(retval, 0, 1200, 50);} kprobe:get_next_freq /cpu == 1/ { @sugov_eff_util = lhist(arg1, 0, 1200, 50); @sugov_max_cap = lhist(arg2, 0, 1000, 2);} kretprobe:get_next_freq /cpu == 1/ { @sugov_freq = lhist(retval, 1000000, 5000000, 100000);}'
> 
> with running: taskset -c 1 make
> 
> issue case:
> 
> Attaching 3 probes...
> @eff_util:
> [0, 50)             1263 |@                                                   |
> [50, 100)            517 |                                                    |
> [100, 150)           233 |                                                    |
> [150, 200)           297 |                                                    |
> [200, 250)           162 |                                                    |
> [250, 300)            98 |                                                    |
> [300, 350)            75 |                                                    |
> [350, 400)           205 |                                                    |
> [400, 450)           210 |                                                    |
> [450, 500)            16 |                                                    |
> [500, 550)          1532 |@                                                   |
> [550, 600)          1026 |                                                    |
> [600, 650)           761 |                                                    |
> [650, 700)           876 |                                                    |
> [700, 750)          1085 |                                                    |
> [750, 800)           891 |                                                    |
> [800, 850)           816 |                                                    |
> [850, 900)           983 |                                                    |
> [900, 950)           661 |                                                    |
> [950, 1000)          759 |                                                    |
> [1000, 1050)       57433 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> 

OK, so the output of effective_cpu_util() seems correct, or at least it
reaches the max utilization value. For that to be correct,
arch_scale_cpu_capacity(cpu) must be non-zero, because of:

unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
				 unsigned long *min,
				 unsigned long *max)
{
	unsigned long util, irq, scale;
	struct rq *rq = cpu_rq(cpu);

	scale = arch_scale_cpu_capacity(cpu);

	/*
	 * Early check to see if IRQ/steal time saturates the CPU, can be
	 * because of inaccuracies in how we track these -- see
	 * update_irq_load_avg().
	 */
	irq = cpu_util_irq(rq);
	if (unlikely(irq >= scale)) {
		if (min)
			*min = scale;
		if (max)
			*max = scale;
		return scale;
	}
..
}

If arch_scale_cpu_capacity(cpu) returned 0, then effective_cpu_util() should
return 0 too.
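
To make that argument concrete, here is a minimal user-space sketch of the
early check only. It is not the kernel code: model_effective_cpu_util() is a
hypothetical name, scale and irq are passed in as plain parameters, and
everything after the early return is replaced by a simple clamp, which is an
assumption made for illustration.

```c
#define SCHED_CAPACITY_SCALE 1024UL

/*
 * Hypothetical user-space model of the early IRQ-saturation check quoted
 * above. Only the "irq >= scale" branch mirrors the kernel snippet; the
 * clamp at the end is a simplified stand-in for the elided remainder.
 */
static unsigned long model_effective_cpu_util(unsigned long scale,
					      unsigned long irq,
					      unsigned long util)
{
	/* With scale == 0, "irq >= scale" holds for every unsigned irq. */
	if (irq >= scale)
		return scale;

	/* Assumption: the rest of the function never exceeds scale. */
	return util < scale ? util : scale;
}
```

With scale == 0 the model always returns 0, whatever util is, so a histogram
peaking in [1000, 1050) would not be expected from a zero capacity.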

Now see below

> @sugov_eff_util:
> [0, 50)             1074 |                                                    |
> [50, 100)            571 |                                                    |
> [100, 150)           259 |                                                    |
> [150, 200)           169 |                                                    |
> [200, 250)           237 |                                                    |
> [250, 300)           156 |                                                    |
> [300, 350)            91 |                                                    |
> [350, 400)            46 |                                                    |
> [400, 450)            52 |                                                    |
> [450, 500)           195 |                                                    |
> [500, 550)           175 |                                                    |
> [550, 600)            46 |                                                    |
> [600, 650)           493 |                                                    |
> [650, 700)          1424 |@                                                   |
> [700, 750)           646 |                                                    |
> [750, 800)           628 |                                                    |
> [800, 850)           612 |                                                    |
> [850, 900)           840 |                                                    |
> [900, 950)           893 |                                                    |
> [950, 1000)          640 |                                                    |
> [1000, 1050)       60679 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> 
> @sugov_freq:
> [1400000, 1500000)   69911 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> 
> @sugov_max_cap:
> [0, 2)             69926 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

In get_next_freq(struct sugov_policy *sg_policy, unsigned long util, unsigned long max)

max is 0 and we come from this path:

static void sugov_update_single_freq(struct update_util_data *hook, u64 time,
				     unsigned int flags)
{

..
	max_cap = arch_scale_cpu_capacity(sg_cpu->cpu);

	if (!sugov_update_single_common(sg_cpu, time, max_cap, flags))
		return;

	next_f = get_next_freq(sg_policy, sg_cpu->util, max_cap);
..

so here arch_scale_cpu_capacity(sg_cpu->cpu) returns 0 ...

AFAICT, the AMD platform uses the default:
static __always_inline
unsigned long arch_scale_cpu_capacity(int cpu)
{
	return SCHED_CAPACITY_SCALE;
}

I'm missing something here.

> 
> 
> good case:
> 
> Attaching 3 probes...
> @eff_util:
> [0, 50)              246 |@                                                   |
> [50, 100)            150 |@                                                   |
> [100, 150)           191 |@                                                   |
> [150, 200)           239 |@                                                   |
> [200, 250)           117 |                                                    |
> [250, 300)          2101 |@@@@@@@@@@@@@@@                                     |
> [300, 350)          2284 |@@@@@@@@@@@@@@@@                                    |
> [350, 400)           713 |@@@@@                                               |
> [400, 450)           151 |@                                                   |
> [450, 500)           154 |@                                                   |
> [500, 550)          1121 |@@@@@@@@                                            |
> [550, 600)          1901 |@@@@@@@@@@@@@                                       |
> [600, 650)          1208 |@@@@@@@@                                            |
> [650, 700)           606 |@@@@                                                |
> [700, 750)           557 |@@@                                                 |
> [750, 800)           872 |@@@@@@                                              |
> [800, 850)          1092 |@@@@@@@                                             |
> [850, 900)          1416 |@@@@@@@@@@                                          |
> [900, 950)          1107 |@@@@@@@                                             |
> [950, 1000)         1051 |@@@@@@@                                             |
> [1000, 1050)        7260 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> 
> @sugov_eff_util:
> [0, 50)              241 |                                                    |
> [50, 100)            149 |                                                    |
> [100, 150)            72 |                                                    |
> [150, 200)            95 |                                                    |
> [200, 250)            43 |                                                    |
> [250, 300)            49 |                                                    |
> [300, 350)            19 |                                                    |
> [350, 400)            56 |                                                    |
> [400, 450)            22 |                                                    |
> [450, 500)            29 |                                                    |
> [500, 550)          1840 |@@@@@@                                              |
> [550, 600)          1476 |@@@@@                                               |
> [600, 650)          1027 |@@@                                                 |
> [650, 700)           473 |@                                                   |
> [700, 750)           366 |@                                                   |
> [750, 800)           627 |@@                                                  |
> [800, 850)           930 |@@@                                                 |
> [850, 900)          1285 |@@@@                                                |
> [900, 950)           971 |@@@                                                 |
> [950, 1000)          946 |@@@                                                 |
> [1000, 1050)       13839 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> 
> @sugov_freq:
> [1400000, 1500000)     648 |@                                                   |
> [1500000, 1600000)       0 |                                                    |
> [1600000, 1700000)       0 |                                                    |
> [1700000, 1800000)      25 |                                                    |
> [1800000, 1900000)       0 |                                                    |
> [1900000, 2000000)       0 |                                                    |
> [2000000, 2100000)       0 |                                                    |
> [2100000, 2200000)       0 |                                                    |
> [2200000, 2300000)       0 |                                                    |
> [2300000, 2400000)       0 |                                                    |
> [2400000, 2500000)       0 |                                                    |
> [2500000, 2600000)       0 |                                                    |
> [2600000, 2700000)       0 |                                                    |
> [2700000, 2800000)       0 |                                                    |
> [2800000, 2900000)       0 |                                                    |
> [2900000, 3000000)       0 |                                                    |
> [3000000, 3100000)       0 |                                                    |
> [3100000, 3125K)       0 |                                                    |
> [3125K, 3300000)       0 |                                                    |
> [3300000, 3400000)       0 |                                                    |
> [3400000, 3500000)       0 |                                                    |
> [3500000, 3600000)       0 |                                                    |
> [3600000, 3700000)       0 |                                                    |
> [3700000, 3800000)       0 |                                                    |
> [3800000, 3900000)       0 |                                                    |
> [3900000, 4000000)   23879 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> 
> @sugov_max_cap:
> [0, 2)             24555 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> 
> In both cases max_cap is zero, but the selected freq is incorrect in the bad case.

Also we have in get_next_freq():
	freq = map_util_freq(util, freq, max);
	       --> util * freq /max

If max were 0, shouldn't we have hit an error here?
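
As a rough user-space illustration of why a zero max looks implausible: the
mapping is util * freq / max, and an integer division by zero is undefined
behaviour in C (on x86 it typically traps). The helper below is only a
sketch; checked_map_util_freq() is a hypothetical name, and the explicit
guard and the bool return are assumptions that are not in the kernel code.

```c
#include <stdbool.h>

/*
 * Hypothetical guarded variant of the util * freq / max mapping.
 * Returns false instead of dividing when max is 0; the kernel helper
 * quoted above has no such guard.
 */
static bool checked_map_util_freq(unsigned long util, unsigned long freq,
				  unsigned long max, unsigned long *out)
{
	if (max == 0)
		return false;	/* a real division here would be undefined */

	/* Widen first so util * freq cannot overflow a 32-bit long. */
	*out = (unsigned long)((unsigned long long)util * freq / max);
	return true;
}
```

For example, util = 512, freq = 4000000 and max = 1024 map to 2000000, while
max = 0 is rejected rather than silently producing a frequency.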

There is something strange here that I don't understand.

Could you trace the value of sg_cpu->util on return from
sugov_get_util()?

Thanks for your help
Vincent

> 
> Thanks,
> Wyes
> 
