Message-ID: <1407678135.9689.4.camel@debian>
Date:	Sun, 10 Aug 2014 21:42:15 +0800
From:	Chen Yucong <slaoub@...il.com>
To:	Tony Luck <tony.luck@...il.com>
Cc:	"linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: kill the current thread if MCG_STATUS_RIPV is not set

Hi Tony Luck,

According to the Intel SDM Vol. 3A, Section 15.9.3.2, a
Recoverable-not-continuable SRAR error (RIPV=0, EIPV=x) covers the
following two cases:
  - IA32_MCG_STATUS.RIPV=0, IA32_MCG_STATUS.EIPV=0, or
  - IA32_MCG_STATUS.RIPV=0, IA32_MCG_STATUS.EIPV=1.
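The two cases differ only in the EIPV bit. A minimal sketch of that
classification, assuming the SDM bit positions for IA32_MCG_STATUS
(RIPV is bit 0, EIPV is bit 1); the enum and classify_mcg_status() are
hypothetical names used only for illustration:

```c
#include <stdint.h>

/* IA32_MCG_STATUS bit positions as given in the SDM. */
#define MCG_STATUS_RIPV (1ULL << 0) /* restart IP valid */
#define MCG_STATUS_EIPV (1ULL << 1) /* error IP valid */

/* Hypothetical labels for the cases discussed above. */
enum srar_case {
	SRAR_RESTARTABLE,    /* RIPV=1: interrupted context can be restarted */
	SRAR_NO_IP,          /* RIPV=0, EIPV=0: first case */
	SRAR_ERROR_IP_VALID, /* RIPV=0, EIPV=1: second case */
};

/* Classify a raw MCG_STATUS value by its RIPV and EIPV bits only. */
static enum srar_case classify_mcg_status(uint64_t mcgstatus)
{
	if (mcgstatus & MCG_STATUS_RIPV)
		return SRAR_RESTARTABLE;
	if (mcgstatus & MCG_STATUS_EIPV)
		return SRAR_ERROR_IP_VALID;
	return SRAR_NO_IP;
}
```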

For the first case, the MCE handler directly panics the kernel
according to this entry in severities[]:

/* Neither return not error IP -- no chance to recover -> PANIC */
MCESEV(
       PANIC, "Neither restart nor error IP",
       MCGMASK(MCG_STATUS_RIPV|MCG_STATUS_EIPV, 0)
       ),
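The entry above only fires when both RIPV and EIPV are clear. A
simplified sketch of that table-driven match, assuming (as in the
kernel's table walk) that the first entry whose masked MCG_STATUS bits
equal the expected value wins. The struct layout, enum values and
match_rule() are illustrative stand-ins rather than the kernel's
definitions, and only the MCG_STATUS part of the match is modelled:

```c
#include <stddef.h>
#include <stdint.h>

#define MCG_STATUS_RIPV (1ULL << 0)
#define MCG_STATUS_EIPV (1ULL << 1)

/* Simplified stand-ins for the kernel's severity levels. */
enum sev { SEV_NO = 0, SEV_PANIC = 2 };

struct sev_rule {
	enum sev sev;
	const char *msg;
	uint64_t mcgmask; /* which MCG_STATUS bits to look at */
	uint64_t mcgres;  /* required value of those bits */
};

/* Same shape as the kernel macro: select bits x, require them to equal y. */
#define MCGMASK(x, y) .mcgmask = (x), .mcgres = (y)

static const struct sev_rule rules[] = {
	{ .sev = SEV_PANIC, .msg = "Neither restart nor error IP",
	  MCGMASK(MCG_STATUS_RIPV | MCG_STATUS_EIPV, 0) },
	/* Catch-all: an empty mask always matches. */
	{ .sev = SEV_NO, .msg = "No match", MCGMASK(0, 0) },
};

/* Walk the table in order; the first matching rule wins. */
static const struct sev_rule *match_rule(uint64_t mcgstatus)
{
	for (size_t i = 0; i < sizeof(rules) / sizeof(rules[0]); i++)
		if ((mcgstatus & rules[i].mcgmask) == rules[i].mcgres)
			return &rules[i];
	return NULL;
}
```

With this table, an MCG_STATUS value with only EIPV set (the second
case) does not hit the PANIC entry and falls through to later entries.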

For the second case, according to SDM Vol. 3A, Section 15.9.3.2, the
MCE handler should directly kill the current thread:

The current executing thread cannot be continued. System software must
terminate the interrupted stream of execution and provide a new stream
of execution on return from the machine check handler for the affected
logical processor.

In practice, however, the MCE handler does not kill the current thread.
Instead, it defers further handling by setting TIF_MCE_NOTIFY, which
leads to memory_failure() being invoked.
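The branch order behind this behaviour, as seen in the lines the patch
removes from do_machine_check(), can be reduced to a small decision
function. This sketch assumes cfg->tolerant < 3 and no_way_out is
false, and collapses the two conditions into booleans; old_action()
and the enum are hypothetical names:

```c
#include <stdbool.h>

/* Hypothetical outcome labels; the kernel performs these actions directly. */
enum mce_action { ACT_NONE, ACT_NOTIFY, ACT_SIGBUS };

/* Pre-patch order: the AR-severity check is tested before kill_it. */
static enum mce_action old_action(bool worst_is_ar, bool kill_it)
{
	if (worst_is_ar)
		return ACT_NOTIFY;  /* mce_save_info() + TIF_MCE_NOTIFY */
	else if (kill_it)
		return ACT_SIGBUS;  /* force_sig(SIGBUS, current) */
	return ACT_NONE;
}
```

When both conditions hold, the AR branch wins, so kill_it never
reaches force_sig() in that case.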

I may be confused by the gap between the documentation and the source
code, but perhaps a small fix is needed.

thx!
cyc


Signed-off-by: Chen Yucong <slaoub@...il.com>
---
 arch/x86/kernel/cpu/mcheck/mce.c |   14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index bd9ccda..3394494 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1055,9 +1055,12 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 
 	/*
 	 * When no restart IP might need to kill or panic.
-	 * Assume the worst for now, but if we find the
-	 * severity is MCE_AR_SEVERITY we have other options.
+	 * This indicates that the error is detected at the instruction
+	 * pointer saved on the stack for this machine check exception
+	 * and restarting execution with the interrupted context is not
+	 * possible (Intel SDM vol. 3A, 15.9.3.2).
 	 */
+
 	if (!(m.mcgstatus & MCG_STATUS_RIPV))
 		kill_it = 1;
 
@@ -1154,12 +1157,13 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 	if (cfg->tolerant < 3) {
 		if (no_way_out)
 			mce_panic("Fatal machine check on current CPU", &m, msg);
-		if (worst == MCE_AR_SEVERITY) {
+
+		if (kill_it) {
+			force_sig(SIGBUS, current);
+		} else if (worst == MCE_AR_SEVERITY) {
 			/* schedule action before return to userland */
 			mce_save_info(m.addr, m.mcgstatus & MCG_STATUS_RIPV);
 			set_thread_flag(TIF_MCE_NOTIFY);
-		} else if (kill_it) {
-			force_sig(SIGBUS, current);
 		}
 	}
 
-- 
1.7.10.4
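Reduced to the same kind of decision function (assuming cfg->tolerant
< 3 and no_way_out is false; new_action() and the enum are
hypothetical names), the patched branch order checks kill_it first:

```c
#include <stdbool.h>

/* Hypothetical outcome labels; the kernel performs these actions directly. */
enum mce_action { ACT_NONE, ACT_NOTIFY, ACT_SIGBUS };

/* Patched order: kill_it (RIPV clear) is tested before AR severity. */
static enum mce_action new_action(bool worst_is_ar, bool kill_it)
{
	if (kill_it)
		return ACT_SIGBUS;  /* force_sig(SIGBUS, current) */
	else if (worst_is_ar)
		return ACT_NOTIFY;  /* mce_save_info() + TIF_MCE_NOTIFY */
	return ACT_NONE;
}
```

Compared with the removed lines, only the case where both
worst == MCE_AR_SEVERITY and kill_it hold changes outcome: it now gets
SIGBUS instead of TIF_MCE_NOTIFY, which also means mce_save_info() is
skipped and memory_failure() is no longer reached through that path.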


