Message-ID: <20100225153305.GC12635@Krystal>
Date: Thu, 25 Feb 2010 10:33:05 -0500
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Masami Hiramatsu <mhiramat@...hat.com>
Cc: Ingo Molnar <mingo@...e.hu>,
Frederic Weisbecker <fweisbec@...il.com>,
Ananth N Mavinakayanahalli <ananth@...ibm.com>,
lkml <linux-kernel@...r.kernel.org>,
systemtap <systemtap@...rces.redhat.com>,
DLE <dle-develop@...ts.sourceforge.net>,
Jim Keniston <jkenisto@...ibm.com>,
Srikar Dronamraju <srikar@...ux.vnet.ibm.com>,
Christoph Hellwig <hch@...radead.org>,
Steven Rostedt <rostedt@...dmis.org>,
"H. Peter Anvin" <hpa@...or.com>,
Anders Kaseorg <andersk@...lice.com>,
Tim Abbott <tabbott@...lice.com>,
Andi Kleen <andi@...stfloor.org>,
Jason Baron <jbaron@...hat.com>
Subject: Re: [PATCH -tip v3&10 07/18] x86: Add text_poke_smp for SMP cross modifying code
* Masami Hiramatsu (mhiramat@...hat.com) wrote:
> Add generic text_poke_smp for SMP which uses stop_machine()
> to synchronize modifying code.
> This stop_machine() method is officially described in section "7.1.3
> Handling Self- and Cross-Modifying Code" of Intel's Software
> Developer's Manual, Volume 3A.
>
> Since stop_machine() can't protect code against NMI/MCE, this
> function cannot modify those handlers. Also, this function is
> intended for modifying a single multibyte instruction. Modifying a
> multibyte sequence that spans multiple instructions needs a separate
> trap & detour mechanism.
>
> This code originally comes from the stop_machine() version of
> immediate values. Thanks Jason and Mathieu!
>
> Signed-off-by: Masami Hiramatsu <mhiramat@...hat.com>
> Cc: Mathieu Desnoyers <compudj@...stal.dyndns.org>
> Cc: Ananth N Mavinakayanahalli <ananth@...ibm.com>
> Cc: Ingo Molnar <mingo@...e.hu>
> Cc: Jim Keniston <jkenisto@...ibm.com>
> Cc: Srikar Dronamraju <srikar@...ux.vnet.ibm.com>
> Cc: Christoph Hellwig <hch@...radead.org>
> Cc: Steven Rostedt <rostedt@...dmis.org>
> Cc: Frederic Weisbecker <fweisbec@...il.com>
> Cc: H. Peter Anvin <hpa@...or.com>
> Cc: Anders Kaseorg <andersk@...lice.com>
> Cc: Tim Abbott <tabbott@...lice.com>
> Cc: Andi Kleen <andi@...stfloor.org>
> Cc: Jason Baron <jbaron@...hat.com>
> ---
>
> arch/x86/include/asm/alternative.h | 4 ++
> arch/x86/kernel/alternative.c | 60 ++++++++++++++++++++++++++++++++++++
> 2 files changed, 63 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
> index f1e253c..b09ec55 100644
> --- a/arch/x86/include/asm/alternative.h
> +++ b/arch/x86/include/asm/alternative.h
> @@ -165,10 +165,12 @@ static inline void apply_paravirt(struct paravirt_patch_site *start,
> * invalid instruction possible) or if the instructions are changed from a
> * consistent state to another consistent state atomically.
> * More care must be taken when modifying code in the SMP case because of
> - * Intel's errata.
> + * Intel's errata. text_poke_smp() takes care of those errata, but still
> + * doesn't support modifying NMI/MCE handler code.
> * On the local CPU you need to be protected again NMI or MCE handlers seeing an
> * inconsistent instruction while you patch.
> */
> extern void *text_poke(void *addr, const void *opcode, size_t len);
> +extern void *text_poke_smp(void *addr, const void *opcode, size_t len);
>
> #endif /* _ASM_X86_ALTERNATIVE_H */
> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index e6ea034..635e4f4 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -7,6 +7,7 @@
> #include <linux/mm.h>
> #include <linux/vmalloc.h>
> #include <linux/memory.h>
> +#include <linux/stop_machine.h>
> #include <asm/alternative.h>
> #include <asm/sections.h>
> #include <asm/pgtable.h>
> @@ -572,3 +573,62 @@ void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
> local_irq_restore(flags);
> return addr;
> }
> +
> +/*
> + * Cross-modifying kernel text with stop_machine().
> + * This code originally comes from immediate values.
> + */
> +static atomic_t stop_machine_first;
> +static int wrote_text;
> +
> +struct text_poke_params {
> + void *addr;
> + const void *opcode;
> + size_t len;
> +};
> +
> +static int __kprobes stop_machine_text_poke(void *data)
> +{
> + struct text_poke_params *tpp = data;
> +
> + if (atomic_dec_and_test(&stop_machine_first)) {
> + text_poke(tpp->addr, tpp->opcode, tpp->len);
> + smp_wmb(); /* Make sure other cpus see that this has run */
> + wrote_text = 1;
> + } else {
> + while (!wrote_text)
> + smp_rmb();
> + sync_core();
Hrm, there is a problem in there. The loop issues smp_rmb() only while
wrote_text is still false: once wrote_text becomes true, no smp_mb() is
performed, and a cpu that finds wrote_text already set on its first check
never runs the loop body at all. So cpus in the "else" branch may never
issue any memory barrier. I'd rather do:
+static volatile int wrote_text;
...
+static int __kprobes stop_machine_text_poke(void *data)
+{
+ struct text_poke_params *tpp = data;
+
+ if (atomic_dec_and_test(&stop_machine_first)) {
+ text_poke(tpp->addr, tpp->opcode, tpp->len);
+ smp_wmb(); /* order text_poke stores before store to wrote_text */
+ wrote_text = 1;
+ } else {
+ while (!wrote_text)
+ cpu_relax();
+ smp_mb(); /* order wrote_text load before following execution */
+ }
If you don't like the "volatile int" definition of wrote_text, then we
should probably use the ACCESS_ONCE() macro instead.
Thanks,
Mathieu
> + }
> +
> + flush_icache_range((unsigned long)tpp->addr,
> + (unsigned long)tpp->addr + tpp->len);
> + return 0;
> +}
> +
> +/**
> + * text_poke_smp - Update instructions on a live kernel on SMP
> + * @addr: address to modify
> + * @opcode: source of the copy
> + * @len: length to copy
> + *
> + * Modify a multi-byte instruction by using stop_machine() on SMP. This
> + * allows the user to poke/set multi-byte text on SMP. Only modifying
> + * non-NMI/MCE code should be allowed, since stop_machine() does _not_
> + * protect code against NMI and MCE.
> + *
> + * Note: Must be called under get_online_cpus() and text_mutex.
> + */
> +void *__kprobes text_poke_smp(void *addr, const void *opcode, size_t len)
> +{
> + struct text_poke_params tpp;
> +
> + tpp.addr = addr;
> + tpp.opcode = opcode;
> + tpp.len = len;
> + atomic_set(&stop_machine_first, 1);
> + wrote_text = 0;
> + stop_machine(stop_machine_text_poke, (void *)&tpp, NULL);
> + return addr;
> +}
> +
>
>
> --
> Masami Hiramatsu
>
> Software Engineer
> Hitachi Computer Products (America), Inc.
> Software Solutions Division
>
> e-mail: mhiramat@...hat.com
>
--
Mathieu Desnoyers
Operating System Efficiency Consultant
EfficiOS Inc.
http://www.efficios.com