Message-ID: <20150618102520.GC1670@pd.tnic>
Date:	Thu, 18 Jun 2015 12:25:20 +0200
From:	Borislav Petkov <bp@...e.de>
To:	"Luck, Tony" <tony.luck@...el.com>
Cc:	"Wang, Rui Y" <rui.y.wang@...el.com>,
	"Chen, Gong" <gong.chen@...el.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: MCE Bug?

On Wed, Jun 17, 2015 at 11:53:53PM +0000, Luck, Tony wrote:
> > if you want to give those changes a run, I've uploaded them here:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git#tip-ras
> 
> Latest experiments show that sometimes checking keventd_up() before calling schedule_work()
> helps ... but mostly only when I fake some early logs from low-numbered CPUs.  I added some
> traces to the real case of a left-over fatal error and got this splat:

Hmm, and calling mce_log from __mcheck_cpu_init_generic() as you
suggested yesterday seems to work on this box here:

[    1.588713] smpboot: CPU0: Intel(R) Xeon(R) CPU E5-4620 0 @ 2.20GHz (fam: 06, model: 2d, stepping: 07)
[    1.592727] Performance Events: PEBS fmt1+, 16-deep LBR, SandyBridge events, full-w Broken BIOS detected, complain to your hardware vendor.
[    1.997344] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 38d is 330)
[    2.000146] Intel PMU driver.
[    2.001376] ... version:                3
[    2.002919] ... bit width:              48
[    2.004626] ... generic registers:      4
[    2.006137] ... value mask:             0000ffffffffffff
[    2.008064] ... max period:             0000ffffffffffff
[    2.010010] ... fixed-purpose events:   3
[    2.011528] ... event mask:             000000070000000f
[    2.017257] x86: Booting SMP configuration:
[    2.019232] .... node  #0, CPUs:          #1
[    2.033848] microcode: CPU1 microcode updated early to revision 0x710, date = 2013-06-17
[    2.038730] mce: [Hardware Error]: Machine check events logged
[    2.050735]    #2
[    2.050735] microcode: CPU2 microcode updated early to revision 0x710, date = 2013-06-17
[    2.056163] mce: [Hardware Error]: Machine check events logged
[    2.068133]    #3
[    2.068140] microcode: CPU3 microcode updated early to revision 0x710, date = 2013-06-17
[    2.07412.324641] microcode: CPU4 microcode updated early to revision 0x710, date = 2013-06-17
[    2.479404]    #5

Stuff gets logged just fine, no splats later.
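
For illustration only, the guard mentioned in the quoted text (check whether
the workqueue is up before calling schedule_work()) can be sketched in user
space roughly as below. This is not the kernel code and not the change in the
ras tree: every identifier (mock_mce_log(), mock_schedule_work(),
mock_workqueue_init(), wq_up, ...) is a hypothetical stand-in, and it assumes
that records logged early are simply kept in the buffer and reported once the
workqueue exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names below are hypothetical user-space stand-ins, not kernel APIs. */

#define MOCK_LOG_LEN 8

static bool wq_up;                  /* stand-in for what keventd_up() reports */
static int log_buf[MOCK_LOG_LEN];   /* stand-in for the MCE log buffer */
static size_t log_count;            /* records stored so far */
static size_t notified;             /* records already reported by the worker */
static unsigned int work_runs;      /* how many times the deferred work ran */

/* Stand-in for schedule_work(): only legal once the workqueue exists. */
static void mock_schedule_work(void)
{
	assert(wq_up);          /* calling this too early models the splat */
	notified = log_count;   /* the "worker" reports everything logged so far */
	work_runs++;
}

/*
 * Stand-in for mce_log(): always store the record, but defer the
 * notification if the workqueue is not up yet.
 * Returns false if the buffer is full.
 */
static bool mock_mce_log(int cpu)
{
	if (log_count == MOCK_LOG_LEN)
		return false;

	log_buf[log_count++] = cpu;

	if (wq_up)
		mock_schedule_work();

	return true;
}

/* Called once the workqueue comes up: report records logged before that. */
static void mock_workqueue_init(void)
{
	wq_up = true;

	if (notified < log_count)
		mock_schedule_work();
}
```

In this sketch, calling mock_mce_log() for CPUs 1 and 2 before
mock_workqueue_init() only fills the buffer; the first run of the deferred
work happens in mock_workqueue_init() and reports both records at once.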

Hmmm, more staring...

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.