Message-ID: <20081215190509.GB5620@alberich.amd.com>
Date:	Mon, 15 Dec 2008 20:05:09 +0100
From:	Andreas Herrmann <andreas.herrmann3@....com>
To:	Andi Kleen <andi@...stfloor.org>
CC:	Ingo Molnar <mingo@...e.hu>, Thomas Gleixner <tglx@...utronix.de>,
	"H. Peter Anvin" <hpa@...or.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/2] x86: re-enable MCE on secondary CPUS after
	suspend/resume

On Fri, Dec 12, 2008 at 08:06:21PM +0100, Andi Kleen wrote:
> Andreas Herrmann <andreas.herrmann3@....com> writes:
> 
> > Impact: fix suspend/resume bug with MCE
> >
> > After suspend/resume MCx_CTL registers of secondary CPUs are cleared.
> > (At least that's what I've observed on several systems.)
> > Linux currently only re-initializes MCE on the boot CPU - see mce_resume().
> > Thus after suspend/resume we end up with a system where MCE is active
> > on the boot CPU but switched off on all other CPUs.
> >
> > By calling mce_init() whenever a CPU comes online this problem is
> > solved.
> 
> Can you double check that please?
> 
> Suspend/resume are supposed to hotunplug all CPUs except the BP and
> then re-online them on resume (with disable_nonboot_cpus()). The
> re-online initializes MCEs in the standard CPU bootup path.

For BP we have

/* On resume clear all MCE state. Don't want to see leftovers from the BIOS.
   Only one CPU is active at this time, the others get readded later using
   CPU hotplug. */
static int mce_resume(struct sys_device *dev)
{
        mce_init(NULL);
        return 0;
}

For APs, mcheck_init() is called on resume. But since the respective bit
for an AP is usually already set in "mce_cpus" after boot (which is correct,
I think), mcheck_init() returns early and does not call mce_init(), see:

void __cpuinit mcheck_init(struct cpuinfo_x86 *c)
{
        static cpumask_t mce_cpus = CPU_MASK_NONE;

        mce_cpu_quirks(c);

        if (mce_dont_init ||
            cpu_test_and_set(smp_processor_id(), mce_cpus) ||
            !mce_available(c))
  =>             return;

        mce_init(NULL);
        mce_cpu_features(c);
}

But we need to call mce_init() on the APs as well to clear all MCE state.
IMHO the best location to call mce_init() for APs is the CPU notifier.
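
To make the early return concrete, here is a small user-space model of the
behaviour described above. It is only a sketch, not kernel code: the
model_* helpers, the mce_enabled[] array (standing in for programmed
MCx_CTL registers) and the plain bitmask (standing in for the static
cpumask_t) are all hypothetical names, and it assumes that MCx_CTL is
cleared across suspend/resume, as observed.

```c
#include <assert.h>
#include <stdbool.h>

#define NCPUS 4

/* Hypothetical model state; none of these names exist in the kernel. */
static unsigned int mce_cpus;       /* models the static cpumask_t mce_cpus */
static bool mce_enabled[NCPUS];     /* models "MCx_CTL is programmed" per CPU */

/* Models cpu_test_and_set(): returns the old bit and sets it. */
static bool model_test_and_set(unsigned int cpu)
{
        bool was_set = (mce_cpus & (1u << cpu)) != 0;

        mce_cpus |= 1u << cpu;
        return was_set;
}

/* Models mce_init(): (re)programs the MCE registers of one CPU. */
static void model_mce_init(unsigned int cpu)
{
        assert(cpu < NCPUS);
        mce_enabled[cpu] = true;
}

/* Models mcheck_init(): initializes a CPU only the first time it is seen. */
static void model_mcheck_init(unsigned int cpu)
{
        if (model_test_and_set(cpu))
                return;         /* bit already set after boot: nothing happens */
        model_mce_init(cpu);
}

/* Initial boot: every CPU passes through mcheck_init() once. */
static void model_boot(void)
{
        mce_cpus = 0;
        for (unsigned int cpu = 0; cpu < NCPUS; cpu++) {
                mce_enabled[cpu] = false;
                model_mcheck_init(cpu);
        }
}

/*
 * Suspend/resume: register state is lost, but mce_cpus (kernel memory)
 * survives. init_on_online models calling mce_init() from a CPU notifier.
 */
static void model_suspend_resume(bool init_on_online)
{
        for (unsigned int cpu = 0; cpu < NCPUS; cpu++)
                mce_enabled[cpu] = false;

        model_mce_init(0);      /* mce_resume() on the boot CPU */

        for (unsigned int cpu = 1; cpu < NCPUS; cpu++) {
                model_mcheck_init(cpu);         /* returns early */
                if (init_on_online)
                        model_mce_init(cpu);    /* notifier-style fix */
        }
}

/* Counts CPUs whose modelled MCE state is active. */
static unsigned int model_count_enabled(void)
{
        unsigned int n = 0;

        for (unsigned int cpu = 0; cpu < NCPUS; cpu++)
                n += mce_enabled[cpu];
        return n;
}
```

In this model, model_suspend_resume(false) leaves MCE active on CPU 0 only,
while model_suspend_resume(true) restores it on all CPUs, which is the
effect the notifier approach is meant to have.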


Regards,

Andreas

