linux-kernel - Re: [PATCH] smpboot: Add smpboot state variables instead of reusing CPU hotplug states

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20151107071723.GH3879@linux.vnet.ibm.com>
Date:	Fri, 6 Nov 2015 23:17:23 -0800
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Daniel Wagner <daniel.wagner@...-carit.de>
Cc:	linux-kernel@...r.kernel.org, xen-devel@...ts.xenproject.org,
	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH] smpboot: Add smpboot state variables instead of reusing
 CPU hotplug states

On Thu, Oct 15, 2015 at 01:32:44PM +0200, Daniel Wagner wrote:
> The cpu hotplug state machine in smpboot.c is reusing the states from
> cpu.h. That is confusing when it comes to the CPU_DEAD_FROZEN usage.
> Paul explained to me that he was in need of an additional state
> for destinguishing between a CPU error states. For this he just
> picked CPU_DEAD_FROZEN.
> 
> 8038dad7e888581266c76df15d70ca457a3c5910 smpboot: Add common code for notification from dying CPU
> 2a442c9c6453d3d043dfd89f2e03a1deff8a6f06 x86: Use common outgoing-CPU-notification code
> 
> Instead of reusing the states, let's add new definition inside
> the smpboot.c file with explenation what those states
> mean. Thanks Paul for providing them.
> 
> Signed-off-by: Daniel Wagner <daniel.wagner@...-carit.de>

Apologies for the delay, I didn't realize that you were waiting on me.

Reviewed-by: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>

> Cc: Thomas Gleixner <tglx@...utronix.de>
> Cc: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
> Cc: Peter Zijlstra <peterz@...radead.org>
> Cc: xen-devel@...ts.xenproject.org
> Cc: linux-kernel@...r.kernel.org
> ---
>  arch/x86/xen/smp.c  |  4 +--
>  include/linux/cpu.h |  3 +-
>  kernel/smpboot.c    | 82 ++++++++++++++++++++++++++++++++++++++++-------------
>  3 files changed, 67 insertions(+), 22 deletions(-)
> 
> diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
> index 3f4ebf0..804bf5c 100644
> --- a/arch/x86/xen/smp.c
> +++ b/arch/x86/xen/smp.c
> @@ -495,7 +495,7 @@ static int xen_cpu_up(unsigned int cpu, struct task_struct *idle)
>  	rc = HYPERVISOR_vcpu_op(VCPUOP_up, cpu, NULL);
>  	BUG_ON(rc);
> 
> -	while (cpu_report_state(cpu) != CPU_ONLINE)
> +	while (!cpu_check_online(cpu))
>  		HYPERVISOR_sched_op(SCHEDOP_yield, NULL);
> 
>  	return 0;
> @@ -767,7 +767,7 @@ static int xen_hvm_cpu_up(unsigned int cpu, struct task_struct *tidle)
>  	 * This can happen if CPU was offlined earlier and
>  	 * offlining timed out in common_cpu_die().
>  	 */
> -	if (cpu_report_state(cpu) == CPU_DEAD_FROZEN) {
> +	if (cpu_check_timeout(cpu)) {
>  		xen_smp_intr_free(cpu);
>  		xen_uninit_lock_cpu(cpu);
>  	}
> diff --git a/include/linux/cpu.h b/include/linux/cpu.h
> index 23c30bd..f78ab46 100644
> --- a/include/linux/cpu.h
> +++ b/include/linux/cpu.h
> @@ -284,7 +284,8 @@ void arch_cpu_idle_dead(void);
> 
>  DECLARE_PER_CPU(bool, cpu_dead_idle);
> 
> -int cpu_report_state(int cpu);
> +int cpu_check_online(int cpu);
> +int cpu_check_timeout(int cpu);
>  int cpu_check_up_prepare(int cpu);
>  void cpu_set_state_online(int cpu);
>  #ifdef CONFIG_HOTPLUG_CPU
> diff --git a/kernel/smpboot.c b/kernel/smpboot.c
> index a818cbc..75e5724 100644
> --- a/kernel/smpboot.c
> +++ b/kernel/smpboot.c
> @@ -371,19 +371,63 @@ int smpboot_update_cpumask_percpu_thread(struct smp_hotplug_thread *plug_thread,
>  }
>  EXPORT_SYMBOL_GPL(smpboot_update_cpumask_percpu_thread);
> 
> +/* The CPU is offline, and its last offline operation was
> + * successful and proceeded normally.  (Or, alternatively, the
> + * CPU never has come online, as this is the initial state.)
> + */
> +#define CPUHP_POST_DEAD		0x01
> +
> +/* The CPU is in the process of coming online.
> + * Simple architectures can skip this state, and just invoke
> + * cpu_set_state_online() unconditionally instead.
> + */
> +#define CPUHP_UP_PREPARE	0x02
> +
> +/* The CPU is now online.  Simple architectures can skip this
> + * state, and just invoke cpu_wait_death() and cpu_report_death()
> + * unconditionally instead.
> + */
> +#define CPUHP_ONLINE		0x03
> +
> +/* The CPU has gone offline, so that it may now be safely
> + * powered off (or whatever the architecture needs to do to it).
> + */
> +#define CPUHP_DEAD		0x04
> +
> +/* The CPU did not go offline in a timely fashion, if at all,
> + * so it might need special processing at the next online (for
> + * example, simply refusing to bring it online).
> + */
> +#define CPUHP_BROKEN		0x05
> +
> +/* The CPU eventually did go offline, but not in a timely
> + * fashion.  If some sort of reset operation is required before it
> + * can be brought online, that reset operation needs to be carried
> + * out at online time.  (Or, again, the architecture might simply
> + * refuse to bring it online.)
> + */
> +#define CPUHP_TIMEOUT		0x06
> +
>  static DEFINE_PER_CPU(atomic_t, cpu_hotplug_state) = ATOMIC_INIT(CPU_POST_DEAD);
> 
>  /*
>   * Called to poll specified CPU's state, for example, when waiting for
>   * a CPU to come online.
>   */
> -int cpu_report_state(int cpu)
> +int cpu_check_online(int cpu)
> +{
> +	return atomic_read(&per_cpu(cpu_hotplug_state, cpu)) ==
> +			   CPUHP_ONLINE;
> +}
> +
> +int cpu_check_timeout(int cpu)
>  {
> -	return atomic_read(&per_cpu(cpu_hotplug_state, cpu));
> +	return atomic_read(&per_cpu(cpu_hotplug_state, cpu)) ==
> +			   CPUHP_TIMEOUT;
>  }
> 
>  /*
> - * If CPU has died properly, set its state to CPU_UP_PREPARE and
> + * If CPU has died properly, set its state to CPUHP_UP_PREPARE and
>   * return success.  Otherwise, return -EBUSY if the CPU died after
>   * cpu_wait_death() timed out.  And yet otherwise again, return -EAGAIN
>   * if cpu_wait_death() timed out and the CPU still hasn't gotten around
> @@ -397,19 +441,19 @@ int cpu_report_state(int cpu)
>  int cpu_check_up_prepare(int cpu)
>  {
>  	if (!IS_ENABLED(CONFIG_HOTPLUG_CPU)) {
> -		atomic_set(&per_cpu(cpu_hotplug_state, cpu), CPU_UP_PREPARE);
> +		atomic_set(&per_cpu(cpu_hotplug_state, cpu), CPUHP_UP_PREPARE);
>  		return 0;
>  	}
> 
>  	switch (atomic_read(&per_cpu(cpu_hotplug_state, cpu))) {
> 
> -	case CPU_POST_DEAD:
> +	case CPUHP_POST_DEAD:
> 
>  		/* The CPU died properly, so just start it up again. */
> -		atomic_set(&per_cpu(cpu_hotplug_state, cpu), CPU_UP_PREPARE);
> +		atomic_set(&per_cpu(cpu_hotplug_state, cpu), CPUHP_UP_PREPARE);
>  		return 0;
> 
> -	case CPU_DEAD_FROZEN:
> +	case CPUHP_TIMEOUT:
> 
>  		/*
>  		 * Timeout during CPU death, so let caller know.
> @@ -424,7 +468,7 @@ int cpu_check_up_prepare(int cpu)
>  		 */
>  		return -EBUSY;
> 
> -	case CPU_BROKEN:
> +	case CPUHP_BROKEN:
> 
>  		/*
>  		 * The most likely reason we got here is that there was
> @@ -452,7 +496,7 @@ int cpu_check_up_prepare(int cpu)
>   */
>  void cpu_set_state_online(int cpu)
>  {
> -	(void)atomic_xchg(&per_cpu(cpu_hotplug_state, cpu), CPU_ONLINE);
> +	(void)atomic_xchg(&per_cpu(cpu_hotplug_state, cpu), CPUHP_ONLINE);
>  }
> 
>  #ifdef CONFIG_HOTPLUG_CPU
> @@ -470,12 +514,12 @@ bool cpu_wait_death(unsigned int cpu, int seconds)
>  	might_sleep();
> 
>  	/* The outgoing CPU will normally get done quite quickly. */
> -	if (atomic_read(&per_cpu(cpu_hotplug_state, cpu)) == CPU_DEAD)
> +	if (atomic_read(&per_cpu(cpu_hotplug_state, cpu)) == CPUHP_DEAD)
>  		goto update_state;
>  	udelay(5);
> 
>  	/* But if the outgoing CPU dawdles, wait increasingly long times. */
> -	while (atomic_read(&per_cpu(cpu_hotplug_state, cpu)) != CPU_DEAD) {
> +	while (atomic_read(&per_cpu(cpu_hotplug_state, cpu)) != CPUHP_DEAD) {
>  		schedule_timeout_uninterruptible(sleep_jf);
>  		jf_left -= sleep_jf;
>  		if (jf_left <= 0)
> @@ -484,14 +528,14 @@ bool cpu_wait_death(unsigned int cpu, int seconds)
>  	}
>  update_state:
>  	oldstate = atomic_read(&per_cpu(cpu_hotplug_state, cpu));
> -	if (oldstate == CPU_DEAD) {
> +	if (oldstate == CPUHP_DEAD) {
>  		/* Outgoing CPU died normally, update state. */
>  		smp_mb(); /* atomic_read() before update. */
> -		atomic_set(&per_cpu(cpu_hotplug_state, cpu), CPU_POST_DEAD);
> +		atomic_set(&per_cpu(cpu_hotplug_state, cpu), CPUHP_POST_DEAD);
>  	} else {
>  		/* Outgoing CPU still hasn't died, set state accordingly. */
>  		if (atomic_cmpxchg(&per_cpu(cpu_hotplug_state, cpu),
> -				   oldstate, CPU_BROKEN) != oldstate)
> +				   oldstate, CPUHP_BROKEN) != oldstate)
>  			goto update_state;
>  		ret = false;
>  	}
> @@ -502,7 +546,7 @@ update_state:
>   * Called by the outgoing CPU to report its successful death.  Return
>   * false if this report follows the surviving CPU's timing out.
>   *
> - * A separate "CPU_DEAD_FROZEN" is used when the surviving CPU
> + * A separate "CPUHP_TIMEOUT" is used when the surviving CPU
>   * timed out.  This approach allows architectures to omit calls to
>   * cpu_check_up_prepare() and cpu_set_state_online() without defeating
>   * the next cpu_wait_death()'s polling loop.
> @@ -515,13 +559,13 @@ bool cpu_report_death(void)
> 
>  	do {
>  		oldstate = atomic_read(&per_cpu(cpu_hotplug_state, cpu));
> -		if (oldstate != CPU_BROKEN)
> -			newstate = CPU_DEAD;
> +		if (oldstate != CPUHP_BROKEN)
> +			newstate = CPUHP_DEAD;
>  		else
> -			newstate = CPU_DEAD_FROZEN;
> +			newstate = CPUHP_TIMEOUT;
>  	} while (atomic_cmpxchg(&per_cpu(cpu_hotplug_state, cpu),
>  				oldstate, newstate) != oldstate);
> -	return newstate == CPU_DEAD;
> +	return newstate == CPUHP_DEAD;
>  }
> 
>  #endif /* #ifdef CONFIG_HOTPLUG_CPU */
> -- 
> 2.4.3
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/