[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Ycma/pwpb0prSM2N@zn.tnic>
Date: Mon, 27 Dec 2021 11:52:46 +0100
From: Borislav Petkov <bp@...en8.de>
To: Peter Zijlstra <peterz@...radead.org>,
"Luck, Tony" <tony.luck@...el.com>
Cc: x86@...nel.org, linux-kernel@...r.kernel.org,
patches@...ts.linux.dev, Youquan Song <youquan.song@...el.com>
Subject: Re: [PATCH v2] x86/mce: Reduce number of machine checks taken during
recovery
On Fri, Dec 24, 2021 at 12:50:01PM +0100, Peter Zijlstra wrote:
> That's inaccurate, fixup_exception() (event if it's spelled correctly)
> does not unconditionally set the trap number in RAX, that's only done by
> ex_handler_fault() (or ex_handler_sgx()), which means all flows into
> this function must pass through: EX_TYPE_FAULT, EX_TYPE_FAULT_MCE or
> EX_TYPE_COPY.
Yeah, they all flow through the copy thing, see _ASM_EXTABLE_CPY()
everywhere in that file.
> Boris might fix up your comment if he applies I suppose..
I did with the following changes so that tip branches merging can work
relatively easily and so that I don't pull in the whole x86/core pile
into ras/core.
The automerge conflict resolve with tip/master is after the patch below
and it looks ok to me too.
Thoughts?
---
>From 0db3d12d3f1e8ee87a2f1cc6049a3a7c0e1f5e6c Mon Sep 17 00:00:00 2001
From: Youquan Song <youquan.song@...el.com>
Date: Thu, 23 Dec 2021 12:07:01 -0800
Subject: [PATCH] x86/mce: Reduce number of machine checks taken during
recovery
When any of the copy functions in arch/x86/lib/copy_user_64.S take a
fault, the fixup code copies the remaining byte count from %ecx to %edx
and unconditionally jumps to .Lcopy_user_handle_tail to continue the
copy in case any more bytes can be copied.
If the fault was #PF this may copy more bytes (because the page fault
handler might have fixed the fault). But when the fault is a machine
check the original copy code will have copied all the way to the poisoned
cache line. So .Lcopy_user_handle_tail will just take another machine
check for no good reason.
Every code path to .Lcopy_user_handle_tail comes from an exception fixup
path, so add a check there to check the trap type (in %eax) and simply
return the count of remaining bytes if the trap was a machine check.
Doing this reduces the number of machine checks taken during synthetic
tests from four to three.
As well as reducing the number of machine checks, this also allows
Skylake generation Xeons to recover some cases that currently fail. The
is because REP; MOVSB is only recoverable when source and destination
are well aligned and the byte count is large. That useless call to
.Lcopy_user_handle_tail may violate one or more of these conditions and
generate a fatal machine check.
[ Tony: Add more details to commit message. ]
[ bp: Fixup comment.
Also, another tip patchset which is adding straight-line speculation
mitigation changes the "ret" instruction to an all-caps macro "RET".
But, since gas is case-insensitive, use "RET" in the newly added asm block
already in order to simplify tip branch merging on its way upstream.
]
Signed-off-by: Youquan Song <youquan.song@...el.com>
Signed-off-by: Tony Luck <tony.luck@...el.com>
Signed-off-by: Borislav Petkov <bp@...e.de>
Link: https://lore.kernel.org/r/YcTW5dh8yTGucDd+@agluck-desk2.amr.corp.intel.com
---
arch/x86/lib/copy_user_64.S | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/arch/x86/lib/copy_user_64.S b/arch/x86/lib/copy_user_64.S
index 2797e630b9b1..e73b76e5bc09 100644
--- a/arch/x86/lib/copy_user_64.S
+++ b/arch/x86/lib/copy_user_64.S
@@ -225,6 +225,7 @@ EXPORT_SYMBOL(copy_user_enhanced_fast_string)
* Don't try to copy the tail if machine check happened
*
* Input:
+ * eax trap number written by ex_handler_copy()
* rdi destination
* rsi source
* rdx count
@@ -233,12 +234,20 @@ EXPORT_SYMBOL(copy_user_enhanced_fast_string)
* eax uncopied bytes or 0 if successful.
*/
SYM_CODE_START_LOCAL(.Lcopy_user_handle_tail)
+ cmp $X86_TRAP_MC,%eax
+ je 3f
+
movl %edx,%ecx
1: rep movsb
2: mov %ecx,%eax
ASM_CLAC
ret
+3:
+ movl %edx,%eax
+ ASM_CLAC
+ RET
+
_ASM_EXTABLE_CPY(1b, 2b)
SYM_CODE_END(.Lcopy_user_handle_tail)
--
2.29.2
automerge resolve with tip/master:
commit a236a52e9c4b2869211aa1bb515856e625c30ac4 (HEAD -> refs/heads/test)
Merge: be59580168ba 0db3d12d3f1e
Author: Borislav Petkov <bp@...e.de>
Date: Mon Dec 27 11:46:58 2021 +0100
Merge branch 'tip-ras-core' into test
diff --cc arch/x86/lib/copy_user_64.S
index e6ac38587b40,e73b76e5bc09..d4a0494e618b
--- a/arch/x86/lib/copy_user_64.S
+++ b/arch/x86/lib/copy_user_64.S
@@@ -224,14 -241,14 +228,19 @@@ SYM_CODE_START_LOCAL(.Lcopy_user_handle
1: rep movsb
2: mov %ecx,%eax
ASM_CLAC
- ret
+ RET
+ 3:
+ movl %edx,%eax
+ ASM_CLAC
+ RET
+
_ASM_EXTABLE_CPY(1b, 2b)
+
+.Lcopy_user_handle_align:
+ addl %ecx,%edx /* ecx is zerorest also */
+ jmp .Lcopy_user_handle_tail
+
SYM_CODE_END(.Lcopy_user_handle_tail)
/*
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
Powered by blists - more mailing lists