Message-ID: <20111206084230.GC30062@elte.hu>
Date:	Tue, 6 Dec 2011 09:42:30 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	Fenghua Yu <fenghua.yu@...el.com>
Cc:	Thomas Gleixner <tglx@...utronix.de>,
	H Peter Anvin <hpa@...or.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Tony Luck <tony.luck@...el.com>,
	Arjan van de Ven <arjan.van.de.ven@...el.com>,
	Suresh B Siddha <suresh.b.siddha@...el.com>,
	Len Brown <len.brown@...el.com>,
	Randy Dunlap <rdunlap@...otime.net>,
	"Srivatsa S. Bhat" <srivatsa.bhat@...ux.vnet.ibm.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
	Peter Zijlstra <peterz@...radead.org>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	linux-pm <linux-pm@...r.kernel.org>, x86 <x86@...nel.org>
Subject: Re: [PATCH v4 0/7] x86: BSP or CPU0 online/offline


* Fenghua Yu <fenghua.yu@...el.com> wrote:

> From: Fenghua Yu <fenghua.yu@...el.com>
> 
> BSP or CPU0 has been the last obstacle to CPU hotplug on x86. 
> This patch set implements BSP online and offline and removes 
> this obstacle to CPU hotplug.
> 
> RAS needs the feature. If socket0 needs to be hotplugged for 
> any reason (any thread on socket0 is bad, shared cache issue, 
> uncore issue, etc), CPU0 is required to be offline or hot 
> replaced to keep the system run.
> 
> v4: Add __read_mostly for bsp_hotpluggable variable. Add my 
> email address in cpu-hotplug.txt document. A wording change in 
> comment.
> 
> v3: Register a pm notifier to check if CPU0 is online before 
> hibernate/suspend. Small wording changes in document and print 
> info.
> 
> v2: Add locking changes between cpu hotplug and 
> hibernate/suspend. Change PIC irq bound to CPU0 detection.
> 
> Fenghua Yu (7):
>   x86/topology.c: Support functions for BSP online/offline
>   x86/common.c: Init BSP data during BSP online
>   x86/mtrr/main.c: Ask the first online CPU to save mtrr
>   x86/smpboot.c: Don't offline BSP if any irq can not be migrated out
>     of it
>   Documentations/cpu-hotplug.tx, kernel-parameters.txt: Add x86 CPU0
>     online/offline feature
>   x86/i387.c: Thread xstate is initialized only on BSP once
>   x86/power/cpu.c: Don't hibernate/suspend if CPU0 is offline
> 
>  Documentation/cpu-hotplug.txt       |   19 +++++++++++++++
>  Documentation/kernel-parameters.txt |   13 ++++++++++
>  arch/x86/include/asm/processor.h    |    1 +
>  arch/x86/kernel/cpu/common.c        |   13 ++++++++--
>  arch/x86/kernel/cpu/mtrr/main.c     |    9 +++++-
>  arch/x86/kernel/i387.c              |    9 ++++++-
>  arch/x86/kernel/smpboot.c           |   43 ++++++++++++++++++++++++++++-----
>  arch/x86/kernel/topology.c          |   24 +++++++++++++-----
>  arch/x86/power/cpu.c                |   44 +++++++++++++++++++++++++++++++++++
>  9 files changed, 155 insertions(+), 20 deletions(-)

This is an interesting but totally scary feature to me! :-)

One aspect that makes it essentially undebuggable is that right 
now it needs a boot parameter to even activate. Few people will 
test it this way.

So at minimum I'd suggest adding a new Kconfig switch, a.k.a. 
CONFIG_BOOTPARAM_HOTPLUG_BOOT_CPU (disabled by default), which 
alpha-testers, randconfig setups and distributions with suicidal 
tendencies could enable by default.
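
To illustrate the idea, here is a rough user-space sketch (not kernel code) of how a build-time default could combine with a boot-time override. The token names "bsp_hotplug" and "nobsp_hotplug" and the helper functions are hypothetical, chosen only for this example; the patch set's actual parameter name may differ.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Assumption: the build system defines this macro when the suggested
 * Kconfig option is set to y. It is left undefined here, i.e. "off".
 */
#ifdef CONFIG_BOOTPARAM_HOTPLUG_BOOT_CPU
#define BOOT_CPU_HOTPLUG_DEFAULT true
#else
#define BOOT_CPU_HOTPLUG_DEFAULT false
#endif

/* Return true if tok appears as a whole space-separated word in cmdline. */
static bool has_token(const char *cmdline, const char *tok)
{
	size_t n = strlen(tok);
	const char *p = cmdline;
	const char *start;

	while (*p) {
		while (*p == ' ')
			p++;
		start = p;
		while (*p && *p != ' ')
			p++;
		if (n > 0 && (size_t)(p - start) == n &&
		    strncmp(start, tok, n) == 0)
			return true;
	}
	return false;
}

/*
 * Start from the build-time default, then let an explicit command-line
 * token override it in either direction. Token names are hypothetical.
 */
static bool boot_cpu_hotpluggable(const char *cmdline)
{
	if (has_token(cmdline, "nobsp_hotplug"))
		return false;
	if (has_token(cmdline, "bsp_hotplug"))
		return true;
	return BOOT_CPU_HOTPLUG_DEFAULT;
}
```

With the option enabled at build time, the feature would be exercised on every boot without anyone having to remember a boot parameter, which is the point of the suggestion.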

Also, could you please enumerate all limitations that could 
possibly happen? The documentation has this list right now:

+1. Resume from hibernate/suspend depends on BSP. Hibernate/suspend will fail if
+BSP is offline and you need to online BSP before hibernate/suspend can continue.

This needs to be fixed in some other fashion than warning people 
in documentation that it would break.

Firstly, at minimum a suspend/hibernate attempt should fail in 
some deterministic fashion.
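
A minimal user-space sketch of what "deterministic" could look like, modelled loosely on a PM-notifier-style check. The event enum, the function name and the choice of -ENODEV as the error code are assumptions made for this example, not taken from the patch set.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event codes standing in for the kernel's PM notifier events. */
enum sleep_event {
	SLEEP_SUSPEND_PREPARE,
	SLEEP_HIBERNATE_PREPARE,
	SLEEP_POST
};

/*
 * Refuse to start suspend/hibernate while the boot CPU (index 0) is
 * offline. Returning a fixed error before anything is frozen means the
 * attempt fails the same way every time instead of breaking on resume.
 * The -ENODEV value is an assumption made for this sketch.
 */
static int check_boot_cpu_before_sleep(enum sleep_event ev,
				       const bool *cpu_online, size_t ncpus)
{
	if (ev != SLEEP_SUSPEND_PREPARE && ev != SLEEP_HIBERNATE_PREPARE)
		return 0;	/* only the "prepare" events are gated */

	if (ncpus == 0 || !cpu_online[0])
		return -ENODEV;	/* boot CPU offline: abort early */

	return 0;
}
```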

Secondly, and more importantly, is there *any* hardware in 
existence that has a BIOS that can suspend/resume successfully 
with BSP offlined? If such hardware exists then we need to 
support it properly - initially perhaps by whitelisting such 
systems.

Then, if demand for this picks up, some more intelligent method of 
cooperating with the firmware could be added: the firmware could 
actually signal to us whether it supports suspend/resume from 
other than the boot CPU.

+2. PIC interrupts also depend on BSP. BSP can't be removed if a PIC interrupt
+is detected.
+
+It's said poweroff/reboot may depend on BSP on some machines although I haven't
+seen any poweroff/reboot failure so far after BSP is offline on a few tested
+machines.

We need a debug feature for this: CONFIG_DEBUG_BOOT_CPU_OFF=y or 
such (disabled by default): this feature would offline the boot 
CPU as soon as possible, and boot up userspace with the boot CPU 
offlined.
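
As a rough illustration only, the decision such a debug option would make early in boot can be sketched in user-space C. The function name and the array-based online mask are hypothetical; the PIC condition mirrors limitation 2 quoted above, and the requirement that another CPU be online is an assumption of this sketch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical early-boot hook for a CONFIG_DEBUG_BOOT_CPU_OFF-style
 * option. cpu_online[] models the online mask; index 0 is the boot CPU.
 * Returns 0 if CPU0 was taken offline, or a negative errno if not.
 */
static int debug_offline_boot_cpu(bool *cpu_online, size_t ncpus,
				  bool pic_irq_bound_to_cpu0)
{
	bool other_online = false;
	size_t cpu;

	if (ncpus == 0 || !cpu_online[0])
		return -EINVAL;	/* nothing to offline */

	if (pic_irq_bound_to_cpu0)
		return -EBUSY;	/* a PIC interrupt still depends on CPU0 */

	for (cpu = 1; cpu < ncpus; cpu++) {
		if (cpu_online[cpu]) {
			other_online = true;
			break;
		}
	}
	if (!other_online)
		return -EBUSY;	/* never offline the last online CPU */

	cpu_online[0] = false;	/* userspace then boots without CPU0 */
	return 0;
}
```

Booting userspace in this state on every test run would flush out poweroff/reboot and other hidden boot-CPU dependencies far faster than waiting for failure reports.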

So these are the things we need to even begin considering such a 
patch-set for mainline.

Thanks,

	Ingo
