linux-kernel - Re: [PATCH 1/2] x86/mce: Only restart instruction after machine check recovery if it is safe

Message-ID: <4FACBD9B.8070407@linux.intel.com>
Date:	Fri, 11 May 2012 15:19:55 +0800
From:	Chen Gong <gong.chen@...ux.intel.com>
To:	Tony Luck <tony.luck@...el.com>
CC:	linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...e.hu>,
	Borislav Petkov <bp@...64.org>,
	"Huang, Ying" <ying.huang@...el.com>,
	Hidetoshi Seto <seto.hidetoshi@...fujitsu.com>
Subject: Re: [PATCH 1/2] x86/mce: Only restart instruction after machine check
 recovery if it is safe

On 2012/5/11 2:01, Tony Luck wrote:
> Section 15.3.1.2 of the software developer manual has this to say
> about the RIPV bit in the IA32_MCG_STATUS register:
> 
> RIPV (restart IP valid) flag, bit 0 — Indicates (when set) that
> program execution can be restarted reliably at the instruction
> pointed to by the instruction pointer pushed on the stack when the
> machine-check exception is generated.  When clear, the program
> cannot be reliably restarted at the pushed instruction pointer.
> 
> We need to save the state of this bit in do_machine_check() and use
> it in mce_notify_process() to force a signal, even if
> memory_failure() says it made a complete recovery (e.g. replaced
> a clean LRU page).
> 
> Signed-off-by: Tony Luck <tony.luck@...el.com>
> ---
>  arch/x86/kernel/cpu/mcheck/mce.c |    9 ++++++---
>  1 files changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 66e1c51..3b8ebdc 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -947,9 +947,10 @@ struct mce_info {
>  	atomic_t		inuse;
>  	struct task_struct	*t;
>  	__u64			paddr;
> +	int			restartable;
>  } mce_info[MCE_INFO_MAX];
>  
> -static void mce_save_info(__u64 addr)
> +static void mce_save_info(__u64 addr, int c)
>  {
>  	struct mce_info *mi;
>  
> @@ -957,6 +958,7 @@ static void mce_save_info(__u64 addr)
>  		if (atomic_cmpxchg(&mi->inuse, 0, 1) == 0) {
>  			mi->t = current;
>  			mi->paddr = addr;
> +			mi->restartable = c;
>  			return;
>  		}
>  	}
> @@ -1136,7 +1138,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>  			mce_panic("Fatal machine check on current CPU", &m, msg);
>  		if (worst == MCE_AR_SEVERITY) {
>  			/* schedule action before return to userland */
> -			mce_save_info(m.addr);
> +			mce_save_info(m.addr, m.mcgstatus & MCG_STATUS_RIPV);
>  			set_thread_flag(TIF_MCE_NOTIFY);
>  		} else if (kill_it) {
>  			force_sig(SIGBUS, current);
> @@ -1185,7 +1187,8 @@ void mce_notify_process(void)
>  
>  	pr_err("Uncorrected hardware memory error in user-access at %llx",
>  		 mi->paddr);
> -	if (memory_failure(pfn, MCE_VECTOR, MF_ACTION_REQUIRED) < 0) {
> +	if (memory_failure(pfn, MCE_VECTOR, MF_ACTION_REQUIRED) < 0 ||
> +			   mi->restartable == 0) {
>  		pr_err("Memory error not recovered");
>  		force_sig(SIGBUS, current);
>  	}
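A minimal user-space sketch of the logic the patch adds, assuming (as the quoted manual text says) that RIPV is bit 0 of IA32_MCG_STATUS. The struct mce_record type and the save_info() and must_signal() helpers are hypothetical stand-ins for mce_info, mce_save_info() and the test in mce_notify_process(); the recovery_result parameter plays the role of memory_failure()'s return value. This is an illustration, not kernel code.

```c
#include <stdint.h>

/* RIPV (restart IP valid) is bit 0 of IA32_MCG_STATUS; defined locally here. */
#define MCG_STATUS_RIPV (1ULL << 0)

/* Hypothetical stand-in for one entry of the kernel's mce_info[] array. */
struct mce_record {
    uint64_t paddr;   /* physical address of the poisoned memory */
    int restartable;  /* 1 if RIPV was set when the machine check fired */
};

/* Mirrors mce_save_info(addr, c): remember the address and the RIPV state. */
static void save_info(struct mce_record *mi, uint64_t addr, uint64_t mcgstatus)
{
    mi->paddr = addr;
    /* Normalise to 0 or 1 so the result fits in an int whatever the bit. */
    mi->restartable = (mcgstatus & MCG_STATUS_RIPV) != 0;
}

/*
 * Returns 1 if the task must receive SIGBUS, 0 if it may resume at the
 * saved instruction pointer. recovery_result < 0 means recovery failed.
 * Even a successful recovery is not enough when RIPV was clear.
 */
static int must_signal(const struct mce_record *mi, int recovery_result)
{
    return recovery_result < 0 || mi->restartable == 0;
}
```

The patch itself assigns the masked value directly to an int; that works because RIPV is bit 0, so the value is already 0 or 1. The `!= 0` here just avoids depending on the bit position.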

How about using the following condition to reduce the execution time?
if (mi->restartable == 0 ||
    memory_failure(pfn, MCE_VECTOR, MF_ACTION_REQUIRED) < 0)

Since the instruction cannot be restarted in that case anyway, can the
recovery operation be skipped?
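The suggested reordering relies on C's short-circuit evaluation of `||`: the right operand is evaluated only when the left operand is false. A small sketch comparing the two orders, where fake_memory_failure() is a hypothetical stand-in for memory_failure() that counts its calls and always reports success:

```c
#include <stdbool.h>

/* Number of times the hypothetical recovery routine has run. */
static int recovery_calls;

/* Hypothetical stand-in for memory_failure(); always reports success (0). */
static int fake_memory_failure(void)
{
    recovery_calls++;
    return 0;
}

/* Order used in the patch: attempt recovery first, then check RIPV. */
static bool signal_patch_order(int restartable)
{
    return fake_memory_failure() < 0 || restartable == 0;
}

/*
 * Order suggested in the reply: check RIPV first. When restartable == 0
 * the || short-circuits and fake_memory_failure() is never called.
 */
static bool signal_suggested_order(int restartable)
{
    return restartable == 0 || fake_memory_failure() < 0;
}
```

With restartable == 0 both versions decide to signal, but only the patch's order runs the recovery routine. The reorder therefore changes behaviour as well as cost: whatever memory_failure() would have done for the page is skipped whenever RIPV is clear.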