Message-Id: <1282262618.2675.24.camel@sbsiddha-MOBL3.sc.intel.com>
Date:	Thu, 19 Aug 2010 17:03:38 -0700
From:	Suresh Siddha <suresh.b.siddha@...el.com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	"mingo@...e.hu" <mingo@...e.hu>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"chris@...stnet.net" <chris@...stnet.net>,
	"debian00@...ceadsl.fr" <debian00@...ceadsl.fr>,
	"hpa@...or.com" <hpa@...or.com>,
	"jonathan.protzenko@...il.com" <jonathan.protzenko@...il.com>,
	"mans@...sr.com" <mans@...sr.com>,
	"psastudio@...l.ru" <psastudio@...l.ru>,
	"rjw@...k.pl" <rjw@...k.pl>,
	"stephan.eicher@....de" <stephan.eicher@....de>,
	"sxxe@....de" <sxxe@....de>,
	"thomas@...hlinux.org" <thomas@...hlinux.org>,
	"venki@...gle.com" <venki@...gle.com>,
	"wonghow@...il.com" <wonghow@...il.com>,
	"stable@...nel.org" <stable@...nel.org>, tglx <tglx@...utronix.de>
Subject: Re: [patch 1/3] sched: init rt_avg stat whenever rq comes online

On Thu, 2010-08-19 at 01:53 -0700, Peter Zijlstra wrote:
> ARGH, please kill all SMM support for future CPUs ;-)
> 
> Are the TSCs still sync'ed though? 

Yes.

> If so, we can still compute an offset
> and continue with things, although it requires something like:
> 
>   local_irq_save(flags);
>   __get_cpu_var(cyc2ns_offset) = 0;
>   offset = cyc2ns_suspend - sched_clock();
>   local_irq_restore(flags);
> 
>   for_each_possible_cpu(i)
>     per_cpu(cyc2ns_offset, i) = offset;
> 
> Which would take the funny offset into account and make it resume at
> where we left off.
> 
> If they got out of sync, we need to flip sched_clock_stable and work on
> getting the sched_clock.c code to be monotonic over such a flip.
> 
> > So such large values of TSC (leading to a very big difference between
> > rq->clock and rq->age_stamp) won't be correctly handled by
> > scale_rt_power() either.
> 
> Still, we need to fix the clock, not fudge the users.

Ok. I have appended a patch doing this. It seems to fix the scheduler
performance issue triggered by suspend/resume. Can you please Ack it?

Thomas/Peter/Ingo: can you please pick this up if you have no other
objections. Thanks.
---

From: Suresh Siddha <suresh.b.siddha@...el.com>
Subject: x86, tsc: recompute cyc2ns_offset's during resume from sleep states

The TSC gets reset across suspend/resume (even on CPUs with an invariant TSC,
which runs at a constant rate across ACPI P-, C- and T-states). On some systems
the BIOS also seems to reinitialize the TSC to an arbitrarily large value
(still synchronized across CPUs) during resume.

This can leave the scheduler's rq->clock (sched_clock_cpu()) less than
rq->age_stamp (introduced in 2.6.32). scale_rt_power() then returns a very
large value, and the resulting large group power set by update_group_power()
causes improper load balancing between busy and idle CPUs after suspend/resume.

As a result, multi-threaded workloads (such as a kernel compilation) ran slower
after a suspend/resume cycle on Core i5 laptops.

Fix this by recomputing each CPU's cyc2ns_offset during resume, so that
sched_clock() continues from the point where it left off at suspend.

Reported-by: Florian Pritz <flo@...n.at>
Signed-off-by: Suresh Siddha <suresh.b.siddha@...el.com>
Cc: stable@...nel.org [2.6.32+]
---
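Note (below the --- line, so not part of the commit message): the failure
described above comes down to unsigned 64-bit subtraction. The following is a
minimal userspace sketch, not the kernel code. elapsed_ns() and
clock_went_backwards() are hypothetical names that stand in for the
"rq->clock - rq->age_stamp" term that scale_rt_power() relies on.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the elapsed-time term used by the scheduler.
 * Both arguments are unsigned, so the result wraps modulo 2^64 whenever
 * clock is smaller than age_stamp.
 */
uint64_t elapsed_ns(uint64_t clock, uint64_t age_stamp)
{
	return clock - age_stamp;
}

/*
 * Hypothetical helper: reinterpreting the difference as signed shows
 * whether the clock moved backwards relative to the stamp.
 */
int clock_went_backwards(uint64_t clock, uint64_t age_stamp)
{
	return (int64_t)(clock - age_stamp) < 0;
}
```

With clock = 40 and age_stamp = 100, elapsed_ns() yields 2^64 - 60 instead of
-60, which is the kind of very large value that then feeds into the group
power calculation.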

diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h
index c042729..1ca132f 100644
--- a/arch/x86/include/asm/tsc.h
+++ b/arch/x86/include/asm/tsc.h
@@ -59,5 +59,7 @@ extern void check_tsc_sync_source(int cpu);
 extern void check_tsc_sync_target(void);
 
 extern int notsc_setup(char *);
+extern void save_sched_clock_state(void);
+extern void restore_sched_clock_state(void);
 
 #endif /* _ASM_X86_TSC_H */
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index ce8e502..d632934 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -626,6 +626,44 @@ static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu)
 	local_irq_restore(flags);
 }
 
+static unsigned long long cyc2ns_suspend;
+
+void save_sched_clock_state(void)
+{
+	if (!sched_clock_stable)
+		return;
+
+	cyc2ns_suspend = sched_clock();
+}
+
+/*
+ * Even on processors with invariant TSC, TSC gets reset in some of the
+ * ACPI system sleep states. And in some systems BIOS seems to reinit TSC to
+ * an arbitrary value (still sync'd across cpu's) during resume from such
+ * sleep states. To cope with this, recompute the cyc2ns_offset for each cpu
+ * so that sched_clock() continues from the point where it was left off
+ * during suspend.
+ */
+void restore_sched_clock_state(void)
+{
+	unsigned long long offset;
+	unsigned long flags;
+	int cpu;
+
+	if (!sched_clock_stable)
+		return;
+
+	local_irq_save(flags);
+
+	__get_cpu_var(cyc2ns_offset) = 0;
+	offset = cyc2ns_suspend - sched_clock();
+
+	for_each_possible_cpu(cpu)
+		per_cpu(cyc2ns_offset, cpu) = offset;
+
+	local_irq_restore(flags);
+}
+
 #ifdef CONFIG_CPU_FREQ
 
 /* Frequency scaling support. Adjust the TSC based timer when the cpu frequency
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index e7e8c5f..87bb35e 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -113,6 +113,7 @@ static void __save_processor_state(struct saved_context *ctxt)
 void save_processor_state(void)
 {
 	__save_processor_state(&saved_context);
+	save_sched_clock_state();
 }
 #ifdef CONFIG_X86_32
 EXPORT_SYMBOL(save_processor_state);
@@ -229,6 +230,7 @@ static void __restore_processor_state(struct saved_context *ctxt)
 void restore_processor_state(void)
 {
 	__restore_processor_state(&saved_context);
+	restore_sched_clock_state();
 }
 #ifdef CONFIG_X86_32
 EXPORT_SYMBOL(restore_processor_state);
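
As an illustration of the arithmetic in restore_sched_clock_state(), here is a
single-CPU userspace model. It is only a sketch under stated assumptions: it
assumes sched_clock() behaves like "TSC converted to nanoseconds plus
cyc2ns_offset", and fake_tsc_ns and the model_* functions are hypothetical
names that do not exist in the kernel.

```c
#include <stdint.h>

uint64_t fake_tsc_ns;     /* hypothetical: TSC value already scaled to ns */
uint64_t cyc2ns_offset;   /* models the per-cpu offset, for one CPU only */
uint64_t cyc2ns_suspend;  /* clock value recorded at suspend time */

/* Model of sched_clock(): scaled counter plus the current offset. */
uint64_t model_sched_clock(void)
{
	return fake_tsc_ns + cyc2ns_offset;
}

/* Model of save_sched_clock_state(): remember where the clock stood. */
void model_save(void)
{
	cyc2ns_suspend = model_sched_clock();
}

/*
 * Model of restore_sched_clock_state(): zero the offset so the clock
 * reads the raw counter, then pick the offset that maps the new counter
 * value back onto the saved clock value. Unsigned wraparound makes this
 * work whether the counter was reset to zero or to a large value.
 */
void model_restore(void)
{
	cyc2ns_offset = 0;
	cyc2ns_offset = cyc2ns_suspend - model_sched_clock();
}
```

If the model clock reads 5000 at model_save() and the counter then jumps to
some arbitrary large value, model_sched_clock() reads 5000 again right after
model_restore() and advances from there as the counter ticks.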

