lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-Id: <1199723621.26343.15.camel@localhost.localdomain>
Date:	Mon, 07 Jan 2008 11:33:41 -0500
From:	Steven Rostedt <rostedt@...dmis.org>
To:	LKML <linux-kernel@...r.kernel.org>
Cc:	Venkatesh Pallipadi <venkatesh.pallipadi@...el.com>,
	Adam Belay <abelay@...ell.com>, Ingo Molnar <mingo@...e.hu>,
	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Hang on bootup when installing cpu idle menu

Hi,

I've notice this hang on bootup every so often. Sometimes it hangs for a
second and sometimes it can hang for hours. Always at the same location.
This is a soft hang. That is, interrupts are still enabled, and I'm able
to perform a sysrq-t.

Here's the problem child:

Call Trace:
 [<c062cd67>] schedule_timeout+0x75/0x93
 [<c062cd9f>] schedule_timeout_uninterruptible+0x1a/0x1c
 [<c043ca12>] msleep+0x15/0x1b
 [<c0407726>] cpu_idle_wait+0x80/0xe0
 [<c05bbfa7>] cpuidle_uninstall_idle_handler+0x28/0x2a
 [<c05bc148>] cpuidle_switch_governor+0x20/0xd9
 [<c05bc2eb>] cpuidle_register_governor+0x6d/0xa6
 [<c076fb04>] init_menu+0x12/0x14
 [<c074e535>] kernel_init+0x154/0x2d1
 [<c0409dc7>] kernel_thread_helper+0x7/0x10


This happens after it installed cpu_idle_ladder, and later while
installing cpu_idle_menu it hangs. But it's not the menu that is the
problem. But it looks like the cpu_idle_wait is.

In cpu_idle from process_32.c we have:

		while (!need_resched()) {
			void (*idle)(void);

			if (__get_cpu_var(cpu_idle_state))
				__get_cpu_var(cpu_idle_state) = 0;

			check_pgt_cache();
			rmb();
			idle = pm_idle;

			if (!idle)
				idle = default_idle;

			if (cpu_is_offline(cpu))
				play_dead();

			__get_cpu_var(irq_stat).idle_timestamp = jiffies;
			idle();
		}

and in cpu_idle_wait:

	set_cpus_allowed(current, cpumask_of_cpu(this_cpu));
	put_cpu();

	cpus_clear(map);
	for_each_online_cpu(cpu) {
		per_cpu(cpu_idle_state, cpu) = 1;
		cpu_set(cpu, map);
	}

	__get_cpu_var(cpu_idle_state) = 0;

	wmb();
	do {
		ssleep(1);
		for_each_online_cpu(cpu) {
			if (cpu_isset(cpu, map) && !per_cpu(cpu_idle_state, cpu))
				cpu_clear(cpu, map);
		}
		cpus_and(map, map, cpu_online_map);
	} while (!cpus_empty(map));


Basically, we set cpu_idle_state to one for each online cpu. Clear the
cpu that is currently running. Then keep sleeping and wait till each cpu
passes the cpu_idle and clears the cpu_idle_state per_cpu variable.

The problem I see here, especially in boot up, is that it doesn't check
to see if a cpu is already idle.  So if a CPU is idle and NO_HZ is set
(yes I have that set), we can shut down an online CPU if there's no
interrupts to run nor any threads to run. Which is the case on bootup.

I first thought about adding an "in_idle" per_cpu variable to check, but
then I'm thinking that this can cause other races. Because this is used
in the idle governor code, and that code can also be a module, we might
remove the idle code when CPUS are still using it.  So perhaps a simple
smp_call_function to an empty function might be good enough. Something
like this?

---
 arch/x86/kernel/process_32.c |    8 ++++++++
 1 file changed, 8 insertions(+)

Index: linux-compile-i386.git/arch/x86/kernel/process_32.c
===================================================================
--- linux-compile-i386.git.orig/arch/x86/kernel/process_32.c	2008-01-03 17:25:46.000000000 -0500
+++ linux-compile-i386.git/arch/x86/kernel/process_32.c	2008-01-07 11:15:39.000000000 -0500
@@ -204,6 +204,10 @@ void cpu_idle(void)
 	}
 }
 
+static void do_nothing(void *unused)
+{
+}
+
 void cpu_idle_wait(void)
 {
 	unsigned int cpu, this_cpu = get_cpu();
@@ -221,6 +225,10 @@ void cpu_idle_wait(void)
 	__get_cpu_var(cpu_idle_state) = 0;
 
 	wmb();
+
+	/* make sure all CPUs are awake */
+	smp_call_function(do_nothing, NULL, 0, 0);
+
 	do {
 		ssleep(1);
 		for_each_online_cpu(cpu) {



Thoughts?

Disclaimer: I've been seeing this when working with the mcount patches.
So I haven't seen it on a vanilla git repo. But... I haven't been
booting vanilla git repos, and this hang doesn't look related to mcount.
Another problem is that this hang only happens once every 100 boots or
so.

-- Steve


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ