Message-ID: <20161202192050.l5l3rcwems6hptub@pd.tnic>
Date: Fri, 2 Dec 2016 20:20:50 +0100
From: Borislav Petkov <bp@...en8.de>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Borislav Petkov <bp@...nel.org>, Andy Lutomirski <luto@...nel.org>,
Peter Anvin <hpa@...or.com>,
the arch/x86 maintainers <x86@...nel.org>,
One Thousand Gnomes <gnomes@...rguk.ukuu.org.uk>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Brian Gerst <brgerst@...il.com>,
Matthew Whitehead <tedheadster@...il.com>,
Henrique de Moraes Holschuh <hmh@....eng.br>,
Peter Zijlstra <peterz@...radead.org>,
Andrew Cooper <andrew.cooper3@...rix.com>
Subject: Re: [PATCH v2 5/6] x86/xen: Add a Xen-specific sync_core()
implementation
On Fri, Dec 02, 2016 at 11:03:50AM -0800, Linus Torvalds wrote:
> I'd really rather just mark it noinline with a comment. That way the
> return from the function acts as the control flow change.
Something like below?
It boots in a guest but that doesn't mean anything.
> 'sync_core()' doesn't help for other CPUs anyway, you need to do the
> cross-call IPI. So worrying about other CPUs is *not* a valid reason
> to keep a "sync_core()" call.
Yeah, no, I'm not crazy about it either - I was just sanity-checking all
call sites of apply_alternatives(). But as you say, we'd have much bigger
problems if other CPUs were to walk in there on us.
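
For completeness, serializing the *other* CPUs would need the IPI you
mention - roughly what text_poke_bp() already does. A minimal sketch, not
actual kernel code (sync_all_cpus() is a made-up name):

#include <linux/smp.h>
#include <asm/processor.h>

/* Runs on each CPU via IPI; sync_core() serializes only the CPU executing it. */
static void do_sync_core(void *info)
{
	sync_core();
}

/* Illustrative helper: IPI all online CPUs and wait until each has serialized. */
static void sync_all_cpus(void)
{
	on_each_cpu(do_sync_core, NULL, 1);
}
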
> Seriously, the only reason I can see for "sync_core()" really is:
>
> - some deep non-serialized MSR access or similar (ie things like
> firmware loading etc really might want it, and a machine check might
> want it)
Yah, we do it in the #MC handler - apparently we need it there - and in
the microcode loader, to tickle the version of the currently applied
microcode out of the MSR.
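
The sequence there is roughly the following - a sketch of the Intel
flavor, not the exact loader code (read_ucode_rev() is a made-up name):
clear MSR_IA32_UCODE_REV, execute a serializing CPUID - which is exactly
what sync_core() does - and only then read the MSR back:

#include <asm/msr.h>
#include <asm/processor.h>

/* Rough sketch - not the exact loader code. */
static u32 read_ucode_rev(void)
{
	u64 rev;

	native_wrmsrl(MSR_IA32_UCODE_REV, 0);	/* clear the revision MSR */
	sync_core();				/* CPUID, the serializing insn */
	rdmsrl(MSR_IA32_UCODE_REV, rev);	/* now reports the applied revision */

	return (u32)(rev >> 32);		/* Intel keeps it in the high 32 bits */
}
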
> The issues with modifying code while another CPU may be just about to
> access it is a separate issue. And as noted, "sync_core()" is not
> sufficient for that, you have to do a whole careful dance with
> single-byte debug instruction writes and then a final cross-call.
>
> See the whole "text_poke_bp()" and "text_poke()" for *that* whole
> dance. That's a much more complex thing than the normal
> apply_alternatives().
Yeah.
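
An outline of that dance, for anyone following along - heavily simplified
and from memory, not the real code; the real thing is text_poke_bp() plus
poke_int3_handler(). Assume this sits in arch/x86/kernel/alternative.c so
text_poke() and the do_sync_core() IPI wrapper are in scope:

/*
 * Heavily simplified - the real text_poke_bp() also stashes a handler
 * address so poke_int3_handler() can redirect any CPU which hits the
 * temporary breakpoint while patching is still in progress.
 */
static void text_poke_bp_outline(void *addr, const void *opcode, size_t len)
{
	unsigned char int3 = 0xcc;

	/* 1) Arm the breakpoint over the first opcode byte, sync all CPUs. */
	text_poke(addr, &int3, 1);
	on_each_cpu(do_sync_core, NULL, 1);

	/* 2) Patch everything after the first byte, sync again. */
	if (len > 1) {
		text_poke((char *)addr + 1, (const char *)opcode + 1, len - 1);
		on_each_cpu(do_sync_core, NULL, 1);
	}

	/* 3) Replace the INT3 with the final first byte, sync one last time. */
	text_poke(addr, opcode, 1);
	on_each_cpu(do_sync_core, NULL, 1);
}
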
---
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 5cb272a7a5a3..b1d0c35e6dcb 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -346,7 +346,6 @@ static void __init_or_module optimize_nops(struct alt_instr *a, u8 *instr)
 
 	local_irq_save(flags);
 	add_nops(instr + (a->instrlen - a->padlen), a->padlen);
-	sync_core();
 	local_irq_restore(flags);
 
 	DUMP_BYTES(instr, a->instrlen, "%p: [%d:%d) optimized NOPs: ",
@@ -359,9 +358,12 @@ static void __init_or_module optimize_nops(struct alt_instr *a, u8 *instr)
  * This implies that asymmetric systems where APs have less capabilities than
  * the boot processor are not handled. Tough. Make sure you disable such
  * features by hand.
+ *
+ * Marked "noinline" to cause control flow change and thus insn cache
+ * to refetch changed I$ lines.
  */
-void __init_or_module apply_alternatives(struct alt_instr *start,
-					 struct alt_instr *end)
+void __init_or_module noinline apply_alternatives(struct alt_instr *start,
+						  struct alt_instr *end)
 {
 	struct alt_instr *a;
 	u8 *instr, *replacement;
@@ -667,7 +669,6 @@ void *__init_or_module text_poke_early(void *addr, const void *opcode,
 	unsigned long flags;
 	local_irq_save(flags);
 	memcpy(addr, opcode, len);
-	sync_core();
 	local_irq_restore(flags);
 	/* Could also do a CLFLUSH here to speed up CPU recovery; but
 	   that causes hangs on some VIA CPUs. */
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.