lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 6 Jun 2012 12:23:34 +0200 (CEST)
From:	Thomas Gleixner <tglx@...utronix.de>
To:	Chen Gong <gong.chen@...ux.intel.com>
cc:	LKML <linux-kernel@...r.kernel.org>, tony.luck@...el.com,
	bp@...64.org, x86@...nel.org, Peter Zijlstra <peterz@...radead.org>
Subject: Re: [patch 2/2] x86: mce: Implement cmci poll mode for intel
 machines

On Wed, 6 Jun 2012, Thomas Gleixner wrote:

> On Wed, 6 Jun 2012, Chen Gong wrote:
> > 于 2012/6/5 21:35, Thomas Gleixner 写道:
> > I add some print in timer callback, it shows:
> > 
> > smp_processor_id() = 0, mce_timer_fn data(CPU id) = 10
> > timer->function = ffffffff8102c200, timer pending = 1, CPU = 0
> > (add_timer_on, BUG!!!)
> 
> Sure. That's not a surprise. The timer function for cpu 10 is called
> on cpu 0. And the timer function does:
> 
>    struct timer_list *t = &__get_cpu_var(mce_timer);
> 
> which gets a pointer to the timer of cpu0. And that timer is
> pending. So yes, it's exploding for a good reason.
> 
> Though, this does not tell us how the timer of cpu10 gets on cpu0.
> 
> Did you do any cpu hotplug operations ?

There's a problem in the hotplug code.

        case CPU_DOWN_PREPARE:
	case CPU_DOWN_PREPARE_FROZEN:
                del_timer_sync(t);
                smp_call_function_single(cpu, mce_disable_cpu, &action, 1);
                break;

We delete the timer before we disable mce and cmci. So if the cmci
interrupt kicks the timer after del_timer_sync() and before
mce_disable_cpu() is called on the other core, then the timer is still
enqueued when the cpu goes down. After it's dead the timer is migrated
and then the above scenario happens.

Can you try the following just for a quick test ?

        case CPU_DOWN_PREPARE:
	case CPU_DOWN_PREPARE_FROZEN:
                del_timer_sync(t);
                smp_call_function_single(cpu, mce_disable_cpu, &action, 1);
+		del_timer_sync(t);
                break;

That's not a proper solution, for a proper solution the hotplug code
of mce needs an overhaul. It's patently ugly.

Thanks,

	tglx

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ