Message-ID: <20070812185501.GA4480@Krystal>
Date: Sun, 12 Aug 2007 14:55:01 -0400
From: Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
To: Arjan van de Ven <arjan@...radead.org>
Cc: akpm@...ux-foundation.org, linux-kernel@...r.kernel.org
Subject: Re: [patch 0/8] Immediate Values
* Arjan van de Ven (arjan@...radead.org) wrote:
>
> On Sun, 2007-08-12 at 11:07 -0400, Mathieu Desnoyers wrote:
> > Hi Andrew,
> >
> > Here is the latest version of immediate values. It applies on 2.6.23-rc2-mm2 in
> > this order:
>
>
> I have a concern; you seem to be patching potentially "live" code....
>
> there are basically two options
> 1) you run the risk of triple faulting (patching an instruction while
> some other core/cpu may be decoding it may cause a triple fault)
> 2) you do an IPI to all other cpus and prevent them from executing any
> code except a small loop during the patching... this is expensive.
>
> To be honest, neither sound very attractive to me ;(
>
Yup, the concern is appropriate. That's why I dealt with it in the
"Immediate Values - i386 Optimization" patch. (I guess your concern
is specific to i386, x86_64 and ia64.)

I have currently implemented only the i386 optimization, but x86_64 and
ia64 should be similar.

The triple fault in question is discussed in Intel's "Centrino Duo
Processor Technology Specification Update", erratum AH33: "Unsynchronized
Cross-Modifying Code Operations Can Cause Unexpected Instruction
Execution Results". (If you are referring to something else, please
tell me.)

I discuss the algorithm I use thoroughly in the patch comments; it is
neither of the two options you point out.
Quoting the patch:
"Overall design
The algorithm proposed by Intel applies not so well in kernel context: it
would imply disabling interrupts and looping on every CPUs while modifying
the code and would not support instrumentation of code called from interrupt
sources that cannot be disabled.
Therefore, we use a different algorithm to respect Intel's erratum (see the
quoted discussion above). We make sure that no CPU sees an out-of-date copy
of a pre-fetched instruction by 1 - using a breakpoint, which skips the
instruction that is going to be modified, 2 - issuing an IPI to every CPU to
execute a sync_core(), to make sure that even when the breakpoint is removed,
no cpu could possibly still have the out-of-date copy of the instruction,
modify the now unused 2nd byte of the instruction, and then put back the
original 1st byte of the instruction.
It has exactly the same intent as the algorithm proposed by Intel, but
it has less side-effects, scales better and supports NMI, SMI and MCE."
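
To make the ordering of the steps easier to follow, here is a minimal
user-space sketch of the sequence described in the quoted comment. It is
not the code from the patch: patch_imm32_sketch(), sync_all_cpus_stub()
and sync_points are hypothetical names, the "instruction" is just a byte
buffer, and the IPI that runs sync_core() on every CPU is replaced by a
stub that only counts calls. The real implementation may well use more
synchronization points than the single one shown here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BREAKPOINT_OPCODE 0xCC  /* x86 int3 */
#define MOVL_LEN 5              /* 1 opcode byte + 4 immediate bytes */

/* Hypothetical stand-in for "IPI every CPU to execute sync_core()".
 * In this sketch it only counts how often it was called. */
int sync_points;

void sync_all_cpus_stub(void)
{
    sync_points++;
}

/* Replace the 32-bit immediate of a 5-byte movl stored in insn[],
 * following the order given in the quoted patch comment. */
void patch_imm32_sketch(uint8_t insn[MOVL_LEN], uint32_t new_imm)
{
    uint8_t saved_opcode = insn[0];

    /* 1: put a breakpoint on the first byte, so a CPU reaching this
     *    address traps instead of decoding a half-updated instruction. */
    insn[0] = BREAKPOINT_OPCODE;

    /* 2: make sure no CPU still holds a pre-fetched copy of the old
     *    instruction (IPI + sync_core() in the kernel, a stub here). */
    sync_all_cpus_stub();

    /* 3: rewrite the bytes after the opcode, which are now unused. */
    memcpy(&insn[1], &new_imm, sizeof(new_imm));

    /* 4: put back the original first byte, removing the breakpoint. */
    insn[0] = saved_opcode;
}
```

The point of the sketch is only the ordering: the immediate bytes are
touched strictly between the breakpoint being written and being removed.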

Moreover, just to be cautious, I align the instructions to be modified
so that their updates are atomic. A 5-byte movl is therefore always
placed one byte before a 4-byte boundary, so that the 4-byte immediate
value following the 1-byte opcode is itself aligned on a 4-byte
boundary.
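
The alignment arithmetic can be sketched as follows. This is only an
illustration, not how the patch lays out the instructions:
imm32_pad_bytes() is a hypothetical helper that returns how many padding
bytes would be needed in front of a 1-byte opcode at a given address so
that the 4-byte immediate after it starts on a 4-byte boundary.

```c
#include <assert.h>
#include <stdint.h>

/* Number of padding bytes to place before a 1-byte opcode located at
 * 'addr' so that (opcode address + 1), i.e. the start of the 4-byte
 * immediate, is a multiple of 4. Hypothetical helper for illustration. */
unsigned int imm32_pad_bytes(uintptr_t addr)
{
    /* The opcode must end up at an address congruent to 3 modulo 4. */
    return (unsigned int)((3u - (addr & 3u)) & 3u);
}
```

For example, an opcode that would otherwise start on a 4-byte boundary
needs 3 bytes of padding, while one already sitting one byte before a
boundary needs none.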
Mathieu
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68