linux-kernel - Re: [PATCH 0/3] Fix MCE handling for AMD multi-node processors

Message-ID: <20150107170654.GG3984@pd.tnic>
Date:	Wed, 7 Jan 2015 18:06:54 +0100
From:	Borislav Petkov <bp@...en8.de>
To:	Aravind Gopalakrishnan <aravind.gopalakrishnan@....com>
Cc:	tglx@...utronix.de, mingo@...hat.com, hpa@...or.com,
	tony.luck@...el.com, dougthompson@...ssion.com,
	mchehab@....samsung.com, x86@...nel.org,
	linux-kernel@...r.kernel.org, linux-edac@...r.kernel.org,
	dave.hansen@...ux.intel.com, mgorman@...e.de, bp@...e.de,
	riel@...hat.com, jacob.w.shin@...il.com
Subject: Re: [PATCH 0/3] Fix MCE handling for AMD multi-node processors

On Tue, Jan 06, 2015 at 05:54:15PM -0600, Aravind Gopalakrishnan wrote:
> Hi Boris,
> It seems my earlier understanding of hardware behavior was not completely
> right. Here are some clarifications I have received after some internal
> discussion: when D18F3x44[NBMstToMstCpuEn] is set, the interrupt is also
> routed to the NBC.

Good :)

> This was not immediately clear to me from the description for the field in
> the BKDG. The BKDG states that errors are reported to the NBC and also that
> the status, addr, and ctl MSRs for MC4 are only accessible from the NBC.
> I took this to mean that the error info is written to the NBC MSRs while
> the #MC could be generated from the non-NBC.
> 
> Now, given that setting NBMstToMstCpuEn ensures #MC is generated only on the
> NBC for MC4 errors, we don't have a problem to solve in the #MC handler code.
> So, we can discard patch2 of the series.
> 
> But we still need to change the error injection interfaces in mce_amd_inj:
> mce_amd_inj triggers a #MC on the CPU number that the user specifies in
> debugfs. For any error other than MC4 errors, this is fine. But we should
> really be triggering #MC only on the NBC for MC4 errors.

Why?

As you said yourself, the errors get reported on the NBC. Where they get
*triggered* is a different story.

We do injection as it is described in "2.15.2 Error Injection and
Simulation" in F15h BKDG, for example. Reporting of the thusly injected
bank4 error goes to the NBC.
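
To make the trigger-vs-report distinction concrete, here is a toy model in C.
It is only a sketch and not kernel code: node_base_core() and reporting_cpu()
are hypothetical helpers, and it assumes that cores are numbered contiguously
per node and that the NBC is the lowest-numbered core of its node.

```c
#include <assert.h>

#define MCE_BANK_NB 4	/* MC4, the northbridge bank discussed in this thread */

/*
 * Hypothetical helper: lowest-numbered core of the node that owns @cpu.
 * Assumes contiguous per-node core numbering (an illustration-only
 * simplification, not how the kernel looks this up).
 */
static unsigned int node_base_core(unsigned int cpu, unsigned int cores_per_node)
{
	assert(cores_per_node > 0);
	return (cpu / cores_per_node) * cores_per_node;
}

/*
 * Hypothetical helper: CPU on which an injected error ends up being
 * reported. Bank 4 errors are reported on the NBC regardless of where
 * the injection was triggered; all other banks report on the CPU that
 * triggered them.
 */
static unsigned int reporting_cpu(unsigned int bank, unsigned int trigger_cpu,
				  unsigned int cores_per_node)
{
	if (bank == MCE_BANK_NB)
		return node_base_core(trigger_cpu, cores_per_node);

	return trigger_cpu;
}
```

With four cores per node in this model, a bank 4 injection triggered on CPU 5
is reported on CPU 4, while a bank 0 injection triggered on CPU 5 stays on
CPU 5.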

I don't see the need to fix anything in the code as it is.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.