lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150720144619.GA9361@nazgul.tnic>
Date:	Mon, 20 Jul 2015 16:46:19 +0200
From:	Borislav Petkov <bp@...en8.de>
To:	Joerg Roedel <joro@...tes.org>
Cc:	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>,
	"H. Peter Anvin" <hpa@...or.com>, x86@...nel.org,
	linux-kernel@...r.kernel.org, Joerg Roedel <jroedel@...e.de>
Subject: Re: [PATCH] x86/smpboot: Check for cpu_active on cpu initialization

On Thu, Jul 16, 2015 at 11:17:17AM +0200, Joerg Roedel wrote:
> From: Joerg Roedel <jroedel@...e.de>
> 
> Currently the code to bring up secondary CPUs only checks
> for cpu_online before it proceeds with launching the per-cpu
> threads for the freshly booted remote CPU.
> 
> But the code to move these threads to the new CPU checks for
> cpu_active to do so. If this check fails the threads end up
> on the wrong CPU, causing warnings and bugs like:
> 
> 	WARNING: CPU: 0 PID: 1 at ../kernel/workqueue.c:4417 workqueue_cpu_up_callback
> 
> and/or:
> 
> 	kernel BUG at ../kernel/smpboot.c:135!
> 
> The reason is that the cpu_active bit for the new CPU
> becomes visible significantly later than the cpu_online bit.

I see

void set_cpu_online(unsigned int cpu, bool online)
{
	if (online) {
		cpumask_set_cpu(cpu, to_cpumask(cpu_online_bits));
		cpumask_set_cpu(cpu, to_cpumask(cpu_active_bits));
	} else {

which is called in start_secondary().

Do you mean that setting the bit in cpu_active_mask gets delayed soo
much? Because it comes right after setting the bit in cpu_online_mask.

> The reasons could be that the kernel runs in a KVM guest,
> where the vCPU thread gets preempted when the cpu_online bit
> is set, but with cpu_active still clear.
> 
> But this could also happen on bare-metal systems with lots
> of CPUs. We have observed this issue on an 88 core x86
> system on bare-metal.
> 
> To fix this issue, wait before the remote CPU is online
> *and* active before launching the per-cpu threads.
> 
> Signed-off-by: Joerg Roedel <jroedel@...e.de>
> ---
>  arch/x86/kernel/smpboot.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index d3010aa..30b7b8b 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1006,7 +1006,7 @@ int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
>  	check_tsc_sync_source(cpu);
>  	local_irq_restore(flags);
>  
> -	while (!cpu_online(cpu)) {
> +	while (!cpu_online(cpu) || !cpu_active(cpu)) {
>  		cpu_relax();
>  		touch_nmi_watchdog();

Maybe we should just swap the calls in set_cpu_online() instead? I.e.,


	if (online) {
		cpumask_set_cpu(cpu, to_cpumask(cpu_active_bits));
		cpumask_set_cpu(cpu, to_cpumask(cpu_online_bits));
	}

?

I see cpu_online() being called much more than cpu_active()...

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ