lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150324083039.GA11525@pd.tnic>
Date:	Tue, 24 Mar 2015 09:30:39 +0100
From:	Borislav Petkov <bp@...en8.de>
To:	Aravind Gopalakrishnan <Aravind.Gopalakrishnan@....com>
Cc:	tglx@...utronix.de, mingo@...hat.com, hpa@...or.com,
	tony.luck@...el.com, slaoub@...il.com, luto@...capital.net,
	x86@...nel.org, linux-kernel@...r.kernel.org,
	linux-edac@...r.kernel.org
Subject: Re: [PATCH V3 1/2] x86, mce, severities: Add AMD severities function

On Mon, Mar 23, 2015 at 10:42:52AM -0500, Aravind Gopalakrishnan wrote:
> Add a severities function that caters to AMD processors.
> This allows us to do some vendor specific work within the
> function if necessary.
> 
> Also, introduce a vendor flag bitfield which contains vendor
> specific flags. The severities code uses this to define error
> scope based on the prescence of the flags field.
> 
> This is based off of work by Boris Petkov.
> 
> Testing details:
> Tested the patch for any regressions on
> Fam10h, Model 9h (Greyhound)
> Fam15h: Models 0h-0fh (Orochi), 30h-3fh (Kaveri) and 60h-6fh (Carrizo),
> Fam16h Model 00h-0fh (Kabini)
> 
> Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@....com>
> ---
> Changes from V2:
>  - Rebase on top of latest tip
>  - Tested patch on more systems and updated commit message appropriately
> 
> Changes from V1:
>  - Test mce_flags.overflow_recov once instead of multiple times
> 
>  arch/x86/include/asm/mce.h                |  6 ++++
>  arch/x86/kernel/cpu/mcheck/mce-severity.c | 53 +++++++++++++++++++++++++++++++
>  arch/x86/kernel/cpu/mcheck/mce.c          |  9 ++++++
>  3 files changed, 68 insertions(+)
> 
> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> index fd38a23..b574fbf 100644
> --- a/arch/x86/include/asm/mce.h
> +++ b/arch/x86/include/asm/mce.h
> @@ -116,6 +116,12 @@ struct mca_config {
>  	u32 rip_msr;
>  };
>  
> +struct mce_vendor_flags {
> +	__u64		overflow_recov	: 1, /* cpuid_ebx(80000007) */
> +			__reserved_0	: 63;
> +};
> +extern struct mce_vendor_flags mce_flags;
> +
>  extern struct mca_config mca_cfg;
>  extern void mce_register_decode_chain(struct notifier_block *nb);
>  extern void mce_unregister_decode_chain(struct notifier_block *nb);
> diff --git a/arch/x86/kernel/cpu/mcheck/mce-severity.c b/arch/x86/kernel/cpu/mcheck/mce-severity.c
> index 8bb4330..4f8f87d 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce-severity.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce-severity.c
> @@ -186,12 +186,65 @@ static int error_context(struct mce *m)
>  	return ((m->cs & 3) == 3) ? IN_USER : IN_KERNEL;
>  }
>  
> +/* keeping mce_severity_amd in sync with AMD error scope heirarchy table */

Which table do you mean?

I changed it to:

/*
 * See AMD Error Scope Hierarchy table in a newer BKDG. For example
 * 49125_15h_Models_30h-3Fh_BKDG.pdf, section "RAS Features"
 */

to explicitly name it.

> +static int mce_severity_amd(struct mce *m, enum context ctx)
> +{
> +	enum context ctx = error_context(m);

arch/x86/kernel/cpu/mcheck/mce-severity.c: In function ‘mce_severity_amd’:
arch/x86/kernel/cpu/mcheck/mce-severity.c:192:15: error: ‘ctx’ redeclared as different kind of symbol
  enum context ctx = error_context(m);
               ^
arch/x86/kernel/cpu/mcheck/mce-severity.c:190:57: note: previous definition of ‘ctx’ was here
 static int mce_severity_amd(struct mce *m, enum context ctx)
                                                         ^
make[4]: *** [arch/x86/kernel/cpu/mcheck/mce-severity.o] Error 1
make[3]: *** [arch/x86/kernel/cpu/mcheck] Error 2
make[2]: *** [arch/x86/kernel/cpu] Error 2
make[1]: *** [arch/x86/kernel] Error 2
make: *** [arch/x86] Error 2
make: *** Waiting for unfinished jobs....

I fixed it up.

I've committed this:

---
From: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@....com>
Date: Mon, 23 Mar 2015 10:42:52 -0500
Subject: [PATCH] x86/mce: Add an AMD severities-grading function

Add a severities function that caters to AMD processors. This allows us
to do some vendor-specific work within the function if necessary.

Also, introduce a vendor flag bitfield for vendor-specific settings. The
severities code uses this to define error scope based on the prescence
of the flags field.

This is based off of work by Boris Petkov.

Testing details:
Fam10h, Model 9h (Greyhound)
Fam15h: Models 0h-0fh (Orochi), 30h-3fh (Kaveri) and 60h-6fh (Carrizo),
Fam16h Model 00h-0fh (Kabini)

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@....com>
Acked-by: Tony Luck <tony.luck@...el.com>
Cc: Thomas Gleixner <tglx@...utronix.de>
Cc: Ingo Molnar <mingo@...nel.org>
Cc: H. Peter Anvin <hpa@...or.com>
Cc: Andy Lutomirski <luto@...capital.net>
Cc: linux-edac@...r.kernel.org
Link: http://lkml.kernel.org/r/1427125373-2918-2-git-send-email-Aravind.Gopalakrishnan@amd.com
[ Fixup build, clean up comments. ]
Signed-off-by: Borislav Petkov <bp@...e.de>
---
 arch/x86/include/asm/mce.h                |  6 ++++
 arch/x86/kernel/cpu/mcheck/mce-severity.c | 56 +++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/mcheck/mce.c          |  9 +++++
 3 files changed, 71 insertions(+)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index fd38a23e729f..b574fbf62d39 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -116,6 +116,12 @@ struct mca_config {
 	u32 rip_msr;
 };
 
+struct mce_vendor_flags {
+	__u64		overflow_recov	: 1, /* cpuid_ebx(80000007) */
+			__reserved_0	: 63;
+};
+extern struct mce_vendor_flags mce_flags;
+
 extern struct mca_config mca_cfg;
 extern void mce_register_decode_chain(struct notifier_block *nb);
 extern void mce_unregister_decode_chain(struct notifier_block *nb);
diff --git a/arch/x86/kernel/cpu/mcheck/mce-severity.c b/arch/x86/kernel/cpu/mcheck/mce-severity.c
index 8bb433043a7f..e16f3f201e06 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-severity.c
+++ b/arch/x86/kernel/cpu/mcheck/mce-severity.c
@@ -186,12 +186,68 @@ static int error_context(struct mce *m)
 	return ((m->cs & 3) == 3) ? IN_USER : IN_KERNEL;
 }
 
+/*
+ * See AMD Error Scope Hierarchy table in a newer BKDG. For example
+ * 49125_15h_Models_30h-3Fh_BKDG.pdf, section "RAS Features"
+ */
+static int mce_severity_amd(struct mce *m, enum context ctx)
+{
+	/* Processor Context Corrupt, no need to fumble too much, die! */
+	if (m->status & MCI_STATUS_PCC)
+		return MCE_PANIC_SEVERITY;
+
+	if (m->status & MCI_STATUS_UC) {
+
+		/*
+		 * On older systems where overflow_recov flag is not present, we
+		 * should simply panic if an error overflow occurs. If
+		 * overflow_recov flag is present and set, then software can try
+		 * to at least kill process to prolong system operation.
+		 */
+		if (mce_flags.overflow_recov) {
+			/* software can try to contain */
+			if (!(m->mcgstatus & MCG_STATUS_RIPV))
+				if (ctx == IN_KERNEL)
+					return MCE_PANIC_SEVERITY;
+
+				/* kill current process */
+				return MCE_AR_SEVERITY;
+		} else {
+			/* at least one error was not logged */
+			if (m->status & MCI_STATUS_OVER)
+				return MCE_PANIC_SEVERITY;
+		}
+
+		/*
+		 * For any other case, return MCE_UC_SEVERITY so that we log the
+		 * error and exit #MC handler.
+		 */
+		return MCE_UC_SEVERITY;
+	}
+
+	/*
+	 * deferred error: poll handler catches these and adds to mce_ring so
+	 * memory-failure can take recovery actions.
+	 */
+	if (m->status & MCI_STATUS_DEFERRED)
+		return MCE_DEFERRED_SEVERITY;
+
+	/*
+	 * corrected error: poll handler catches these and passes responsibility
+	 * of decoding the error to EDAC
+	 */
+	return MCE_KEEP_SEVERITY;
+}
+
 int mce_severity(struct mce *m, int tolerant, char **msg, bool is_excp)
 {
 	enum exception excp = (is_excp ? EXCP_CONTEXT : NO_EXCP);
 	enum context ctx = error_context(m);
 	struct severity *s;
 
+	if (m->cpuvendor == X86_VENDOR_AMD)
+		return mce_severity_amd(m, ctx);
+
 	for (s = severities;; s++) {
 		if ((m->status & s->mask) != s->result)
 			continue;
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 8548b714a16b..1189f1150a19 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -64,6 +64,7 @@ static DEFINE_MUTEX(mce_chrdev_read_mutex);
 DEFINE_PER_CPU(unsigned, mce_exception_count);
 
 struct mce_bank *mce_banks __read_mostly;
+struct mce_vendor_flags mce_flags __read_mostly;
 
 struct mca_config mca_cfg __read_mostly = {
 	.bootlog  = -1,
@@ -1535,6 +1536,13 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
 			mce_banks[0].ctl = 0;
 
 		/*
+		 * overflow_recov is supported for F15h Models 00h-0fh
+		 * even though we don't have a CPUID bit for it.
+		 */
+		if (c->x86 == 0x15 && c->x86_model <= 0xf)
+			mce_flags.overflow_recov = 1;
+
+		/*
 		 * Turn off MC4_MISC thresholding banks on those models since
 		 * they're not supported there.
 		 */
@@ -1633,6 +1641,7 @@ static void __mcheck_cpu_init_vendor(struct cpuinfo_x86 *c)
 		break;
 	case X86_VENDOR_AMD:
 		mce_amd_feature_init(c);
+		mce_flags.overflow_recov = cpuid_ebx(0x80000007) & 0x1;
 		break;
 	default:
 		break;
-- 
2.3.3


-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ