[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20130131120348.372374706@linutronix.de>
Date:	Thu, 31 Jan 2013 15:44:10 -0000
From:	Thomas Gleixner <tglx@...utronix.de>
To:	LKML <linux-kernel@...r.kernel.org>
Cc:	Ingo Molnar <mingo@...e.hu>, Peter Zijlstra <peterz@...radead.org>,
	Rusty Russell <rusty@...tcorp.com.au>,
	Paul McKenney <paulmck@...ux.vnet.ibm.com>,
	"Srivatsa S. Bhat" <srivatsa.bhat@...ux.vnet.ibm.com>,
	Arjan van de Veen <arjan@...radead.org>,
	Paul Turner <pjt@...gle.com>,
	Richard Weinberger <rw@...utronix.de>,
	Magnus Damm <magnus.damm@...il.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: [patch 00/40] CPU hotplug rework - episode I

The current CPU hotplug implementation has become an ever-growing
nightmare full of races and undocumented behaviour. The main issue of
the current hotplug scheme is the completely asymmetric
startup/teardown process. The hotplug notifiers are mostly
undocumented and the CPU_* actions in lots of implementations seem to
be chosen at random.

We had a long discussion in San Diego last year about reworking the
hotplug core into a fully symmetric state machine. After a few doomed
attempts to convert the existing code into a state machine, I finally
found a workable solution.

The following patch series implements a trivial array-based state
machine which replaces the existing steps in cpu_up/down; the
notifiers which must run on the hotplugged CPU are likewise converted
to a callback array. This documents the ordering of the callbacks
clearly and also makes the asymmetric behaviour very obvious.

This series converts the stop_machine thread to the smpboot
infrastructure, implements the core state machine and converts all
notifiers which have ordering constraints plus a randomly chosen bunch
of other notifiers to the state machine.

Runtime-installed callbacks are immediately executed by the core code
on, or on behalf of, all CPUs which have already reached the
corresponding state. A non-executing installer variant exists as well
to allow simple migration of the existing notifier maze.

The diffstat of the complete series is appended below.

 36 files changed, 1300 insertions(+), 1179 deletions(-)

We add slightly more code at this stage (225 lines alone in a header
file), but most of the conversions remove code and we have only
tackled about 30 of the 130+ instances. Even in the current conversion
state, the resulting text size already shrinks.

Known issues:
The current series has an unsolved section-mismatch issue with the
array callbacks which are already installed at compile time.

There is more work in the pipeline:

 - Convert all notifiers to the state machine callbacks

 - Analyze the asymmetric callbacks and fix them if possible, or at
   least document why they need to be asymmetric.

 - Unify the low level bringup across the architectures
   (e.g. synchronization between boot and hotplugged cpus, common
   setups, scheduler exposure, etc.)

At the end hotplug should run through an array of callbacks on both
sides with explicit core synchronization points. The ordering should
look like this:

CPUHP_OFFLINE                   // Start state.
CPUHP_PREP_<hardware>           // Kick CPU into life / let it die
CPUHP_PREP_<datastructures>     // Get datastructures set up / freed.
CPUHP_PREP_<threads>            // Create threads for cpu
CPUHP_SYNC			// Synchronization point
CPUHP_INIT_<hardware>		// Startup/teardown on the CPU (interrupts, timers ...)
CPUHP_SCHED_<stuff on CPU>      // Unpark/park per cpu local threads on the CPU.
CPUHP_ENABLE_<stuff_on_CPU>	// Enable/disable facilities
CPUHP_SYNC			// Synchronization point
CPUHP_SCHED                     // Expose/remove CPU from general scheduler.
CPUHP_ONLINE                    // Final state

All PREP states can fail and the corresponding teardown callbacks are
invoked in the same way as they are invoked on offlining.

The existing DOWN_PREPARE notifier has only two instances which
actually might prevent the CPU from going down: rcu_tree and
padata. We might need to keep them, but these can be explicitly
documented asymmetric states.

Quite a few of the ONLINE/DOWN_PREPARE notifiers are racy and need
proper inspection. All other valid users of ONLINE/DOWN_PREPARE
notifiers should be put into the CPUHP_ENABLE state block and be
executed on the hotplugged CPU. I have not seen a single instance
(except the scheduler) which needs to be executed before we remove the
CPU from the general scheduler itself.

This final design needs quite some massaging of the current scheduler
code, but last time I discussed this with scheduler folks it seemed to
be doable with a reasonable effort. Other than that I don't see any
(un)real showstoppers on the horizon.

Thanks,

	tglx
---
 arch/arm/kernel/perf_event_cpu.c              |   28 -
 arch/arm/vfp/vfpmodule.c                      |   29 -
 arch/blackfin/kernel/perf_event.c             |   25 -
 arch/powerpc/perf/core-book3s.c               |   29 -
 arch/s390/kernel/perf_cpum_cf.c               |   37 -
 arch/s390/kernel/vtime.c                      |   18 
 arch/sh/kernel/perf_event.c                   |   22 
 arch/x86/kernel/apic/x2apic_cluster.c         |   80 +--
 arch/x86/kernel/cpu/perf_event.c              |   78 +--
 arch/x86/kernel/cpu/perf_event_amd.c          |    6 
 arch/x86/kernel/cpu/perf_event_amd_ibs.c      |   54 --
 arch/x86/kernel/cpu/perf_event_intel.c        |    6 
 arch/x86/kernel/cpu/perf_event_intel_uncore.c |  109 +---
 arch/x86/kernel/tboot.c                       |   23 
 drivers/clocksource/arm_generic.c             |   40 -
 drivers/cpufreq/cpufreq_stats.c               |   55 --
 include/linux/cpu.h                           |   45 -
 include/linux/cpuhotplug.h                    |  207 ++++++++
 include/linux/perf_event.h                    |   21 
 include/linux/smpboot.h                       |    5 
 init/main.c                                   |   15 
 kernel/cpu.c                                  |  613 ++++++++++++++++++++++----
 kernel/events/core.c                          |   36 -
 kernel/hrtimer.c                              |   47 -
 kernel/profile.c                              |   92 +--
 kernel/rcutree.c                              |   95 +---
 kernel/sched/core.c                           |  251 ++++------
 kernel/sched/fair.c                           |   16 
 kernel/smp.c                                  |   50 --
 kernel/smpboot.c                              |   11 
 kernel/smpboot.h                              |    4 
 kernel/stop_machine.c                         |  154 ++----
 kernel/time/clockevents.c                     |   13 
 kernel/timer.c                                |   43 -
 kernel/workqueue.c                            |   80 +--
 virt/kvm/kvm_main.c                           |   42 -
 36 files changed, 1300 insertions(+), 1179 deletions(-)

