lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 11 Dec 2013 12:28:36 +0100 (CET)
From:	Thomas Gleixner <tglx@...utronix.de>
To:	Mike Galbraith <bitbucket@...ine.de>
cc:	Len Brown <lenb@...nel.org>, Peter Zijlstra <peterz@...radead.org>,
	Linux PM list <linux-pm@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Jeremy Eder <jeder@...hat.com>, x86@...nel.org,
	Borislav Petkov <bp@...en8.de>
Subject: Re: 50 Watt idle power regression bisected to Linux-3.10

On Wed, 11 Dec 2013, Mike Galbraith wrote:
> Alakazam..
> Yup, magical gremlin repellent works on 8 socket DL980 too.

Now here is a less magical version of the gremlin repellent.

And just for the amusement value: The erratum for the series 7400
says:

AAI65. MONITOR/MWAIT May Have Excessive False Wakeups

Problem:	Normally, if MWAIT is used to enter a C-state that is
		C1 or higher, a store to the address range armed by
		the MONITOR instruction will cause the processor to
		exit MWAIT. Due to this erratum, false wakeups may
		occur when the monitored address range was recently
		written prior to executing the MONITOR instruction.

Implication:	Due to this erratum, performance and power savings may
		be impacted due to excessive false wakeups.

Workaround: 	Execute a CLFLUSH Instruction immediately before every
		MONITOR instruction when the monitored location may
		have been recently written.

Now that looks like the very same issue on these westmere EX
machines.

These false wakeups can be observed already before the idle changes
and now they are just more prominent.

Adding that clflush() unconditionally fixes the issue at least on
Boris machine.

Mike, can you retest on that 8 socket monstrum, please?

So it looks like the idle power regression is actually a software
change which exhibits a hardware "regression".

So much for proper validated advertising which promises core power 0W
at idle for these beasts :)

Thanks,

	tglx
---
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 92d1206..50299ad 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -376,7 +376,7 @@ static int intel_idle(struct cpuidle_device *dev,
 		clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu);
 
 	if (!current_set_polling_and_test()) {
-
+		clflush(&current_thread_info()->flags);
 		__monitor((void *)&current_thread_info()->flags, 0, 0);
 		smp_mb();
 		if (!need_resched())
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ