Message-Id: <1336559102-28103-6-git-send-email-imammedo@redhat.com>
Date:	Wed,  9 May 2012 12:25:02 +0200
From:	Igor Mammedov <imammedo@...hat.com>
To:	linux-kernel@...r.kernel.org
Cc:	rob@...dley.net, tglx@...utronix.de, mingo@...hat.com,
	hpa@...or.com, x86@...nel.org, luto@....edu,
	suresh.b.siddha@...el.com, avi@...hat.com, imammedo@...hat.com,
	a.p.zijlstra@...llo.nl, johnstul@...ibm.com, arjan@...ux.intel.com,
	linux-doc@...r.kernel.org
Subject: [PATCH 5/5] Do not mark cpu as not present if we failed to boot it

This allows the CPU to be booted later, if possible.

v2:
Introduce the failed_cpu_boots_limit command-line parameter.

At startup, udev might try to online a CPU even if it failed to boot
the first time, and would then loop on a CPU that refuses to boot.
So disable the CPU once failed_cpu_boots_limit is reached, to prevent
udev from spinning on onlining a persistently faulty CPU.
Guest kernels on overcommitted hosts can use this parameter to set
the limit to an acceptable number of CPU online failures.

Signed-off-by: Igor Mammedov <imammedo@...hat.com>
---
 Documentation/kernel-parameters.txt |    6 +++++
 arch/x86/kernel/smpboot.c           |   36 +++++++++++++++++++++++++++++++++-
 2 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index c1601e5..6b9bbbc 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -825,6 +825,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			Format: <interval>,<probability>,<space>,<times>
 			See also Documentation/fault-injection/.
 
+	failed_cpu_boots_limit=	[SMP,X86]
+			Number of failed boot attempts tolerated for an
+			unresponsive or stuck CPU. Once the limit is reached,
+			the kernel disables the CPU and marks it not present.
+			Default: 0
+
 	floppy=		[HW]
 			See Documentation/blockdev/floppy.txt.
 
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index af63cab..2d72a8a 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -136,6 +136,28 @@ EXPORT_PER_CPU_SYMBOL(cpu_info);
 
 atomic_t init_deasserted;
 
+static int failed_cpu_boots_limit = 0;
+static int cpu_boot_error_nr[NR_CPUS];
+
+static int parse_failed_cpu_boots(char *str)
+{
+	int err;
+
+	if (!str)
+		return -EINVAL;
+
+	/* the limit is a signed int, so parse it with kstrtoint() */
+	err = kstrtoint(str, 0, &failed_cpu_boots_limit);
+	if (err)
+		return -EINVAL;
+
+	printk(KERN_NOTICE "Limit CPU failed boot attempts: %d\n",
+			failed_cpu_boots_limit);
+
+	return 0;
+}
+__setup("failed_cpu_boots_limit=", parse_failed_cpu_boots);
+
 /*
  * Report back to the Boot Processor.
  * Running on AP.
@@ -810,8 +832,18 @@ do_rest:
 		/* was set by cpu_init() */
 		cpumask_clear_cpu(cpu, cpu_initialized_mask);
 
-		set_cpu_present(cpu, false);
-		per_cpu(x86_cpu_to_apicid, cpu) = BAD_APICID;
+		/* was set by smp_callin() */
+		cpumask_clear_cpu(cpu, cpu_callin_mask);
+
+		/* disable CPU if it's failed to boot N times in a row */
+		if (cpu_boot_error_nr[cpu]++ > failed_cpu_boots_limit) {
+			set_cpu_present(cpu, false);
+			per_cpu(x86_cpu_to_apicid, cpu) = BAD_APICID;
+			pr_err("CPU%d: repeatedly fails to boot, disabling.\n",
+				cpu);
+		}
+	} else {
+		cpu_boot_error_nr[cpu] = 0;
 	}
 
 	/* mark "stuck" area as not stuck */
-- 
1.7.1

