Message-ID: <20120524104830.GB27063@aftab.osrc.amd.com>
Date:	Thu, 24 May 2012 12:48:30 +0200
From:	Borislav Petkov <bp@...64.org>
To:	Thomas Gleixner <tglx@...utronix.de>
Cc:	Chen Gong <gong.chen@...ux.intel.com>,
	"Luck, Tony" <tony.luck@...el.com>,
	"x86@...nel.org" <x86@...nel.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH] x86: auto poll/interrupt mode switch for CMC to stop CMC
 storm

On Thu, May 24, 2012 at 12:01:13PM +0200, Thomas Gleixner wrote:
> Aside of that machine_check_poll is called from other places as
> well. So looking at mce_start_timer() which is surprisingly the timer
> callback:
> 
> The poll timer rate is self adjusting to intervals down to HZ/100. So
> when you get into a state where the timer rate becomes lower than HZ/5
> we'll trigger that CMCI storm in software and queue work even on
> machines which do not support CMCI or have it disabled. Brilliant,
> isn't it?

Yes, I'm thrilled just by staring at this :-).

> So that rate check belongs into intel_threshold_interrupt() and wants an
> Intel-specific callback in mce_start_timer() to undo it.

So AFAICT mce_start_timer() sets the polling rate of machine_check_poll(),
i.e. we normally poll the MCA registers for errors every 5 minutes. This
is for correctable errors which don't raise an #MC exception but only get
logged.

That's why, for example, when you boot your box you see

"Machine check events logged."

in dmesg at timestamp 299.xxx when the hw has either had an MCE causing
it to reboot or has experienced a correctable error during boot.

Oh, I see it now, this thing reconfigures the mce_timer which we use for
the above.
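
To illustrate the self-adjusting interval discussed above, here is a
minimal userspace sketch in C. It assumes the interval is halved whenever
a poll finds an event and doubled otherwise, clamped between HZ/100 and
the 5-minute default. The HZ value and all names (next_poll_interval,
halvings_until_below) are hypothetical and chosen for illustration; this
is not the kernel's code.

```c
#include <stdio.h>

#define HZ                 1000  /* assumed tick rate, for illustration only */
#define CHECK_INTERVAL_SEC 300   /* the 5-minute default poll period */

/*
 * Hypothetical helper: compute the next poll interval in ticks.
 * Assumption: halve the interval when the last poll saw an event,
 * double it otherwise, and clamp to [HZ/100, CHECK_INTERVAL_SEC * HZ].
 */
static long next_poll_interval(long cur, int event_seen)
{
    const long floor_ticks = HZ / 100;
    const long ceil_ticks  = (long)CHECK_INTERVAL_SEC * HZ;
    long next;

    if (event_seen) {
        next = cur / 2;
        if (next < floor_ticks)
            next = floor_ticks;
    } else {
        next = cur * 2;
        if (next > ceil_ticks)
            next = ceil_ticks;
    }
    return next;
}

/*
 * Hypothetical helper: count how many consecutive polls with events it
 * takes for the interval to drop below 'threshold' ticks, starting from
 * the default period. Returns -1 if the floor prevents reaching it.
 */
static int halvings_until_below(long threshold)
{
    long iv = (long)CHECK_INTERVAL_SEC * HZ;
    int n = 0;

    while (iv >= threshold) {
        long next = next_poll_interval(iv, 1);
        if (next == iv)
            return -1;  /* stuck at the floor, threshold unreachable */
        iv = next;
        n++;
    }
    return n;
}
```

Under these assumptions, a run of polls that each find an event walks the
shared timer from the 5-minute period down past HZ/5 purely in software,
regardless of whether the CPU supports CMCI. That is why a rate check
placed in the generic timer callback can fire on machines that have no
CMCI at all.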

Ok, I'm no timer guy but can we use the same timer for two different
things? This looks pretty fishy. I assumed the CMCI thing adds another,
CMCI-only timer for its purposes.

Thomas, what is the proper design here?

Thanks.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551