Message-ID: <20080814011340.GA30038@Krystal>
Date: Wed, 13 Aug 2008 21:13:40 -0400
From: Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
To: "H. Peter Anvin" <hpa@...or.com>
Cc: Andi Kleen <andi@...stfloor.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Ingo Molnar <mingo@...e.hu>,
Steven Rostedt <rostedt@...dmis.org>,
Steven Rostedt <srostedt@...hat.com>,
Jeremy Fitzhardinge <jeremy@...p.org>,
LKML <linux-kernel@...r.kernel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Peter Zijlstra <peterz@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
David Miller <davem@...emloft.net>,
Roland McGrath <roland@...hat.com>,
Ulrich Drepper <drepper@...hat.com>,
Rusty Russell <rusty@...tcorp.com.au>,
Gregory Haskins <ghaskins@...ell.com>,
Arnaldo Carvalho de Melo <acme@...hat.com>,
"Luis Claudio R. Goncalves" <lclaudio@...g.org>,
Clark Williams <williams@...hat.com>,
Christoph Lameter <cl@...ux-foundation.org>
Subject: Re: [RFC PATCH] x86 alternatives : fix LOCK_PREFIX race with
preemptible kernel and CPU hotplug
* H. Peter Anvin (hpa@...or.com) wrote:
> Mathieu Desnoyers wrote:
>> If a kernel thread is preempted in single-cpu mode right after the NOP
>> (nop
>> about to be turned into a lock prefix), then we CPU hotplug a CPU, and
>> then the
>> thread is scheduled back again, a SMP-unsafe atomic operation will be used
>> on
>> shared SMP variables, leading to corruption. No corruption would happen in
>> the
>> reverse case : going from SMP to UP is ok because we split a bit
>> instruction
>> into tiny pieces, which does not present this condition.
>> Changing the 0x90 (single-byte nop) currently used into a 0x3E DS segment
>> override prefix should fix this issue. Since the default of the atomic
>> instructions is to use the DS segment anyway, it should not affect the
>> behavior.
>
> I believe this should be okay. In 32-bit mode some of the security and
> hypervisor frameworks want to set segment limits, but I don't believe they
> ever would set DS and SS inconsistently, or that we'd handle a #GP versus
> an #SS differently (segment violations on the stack segment are #SS, not
> #GP.) To be 100% sure we'd have to pick apart the modr/m byte to figure
> out what the base register is but I think that's total overkill.
>
I guess some testing of this patch under a virtualized Linux would not
hurt. Does anyone have a setup ready? The test case is simple: run a kernel
on a multi-CPU virtual guest, then, from /sys, take the secondary CPUs
offline:
export NR_CPUS=...
for a in `seq 1 $NR_CPUS`; do echo 0 > ./devices/system/cpu/cpu$a/online;done
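Since the race only shows up when going from UP back to SMP, the guest has to bring the CPUs online again after offlining them. Here is a minimal sketch of a helper that does both directions. The function name and the `sysfs_root` parameter are my own (hypothetical); the only thing taken from the kernel is the `devices/system/cpu/cpuN/online` control file. cpu0 is skipped, as in the loop above.

```python
from pathlib import Path


def set_secondary_cpus(online, sysfs_root="/sys"):
    """Write 1 (online) or 0 (offline) to every cpuN/online file except cpu0.

    Returns the names of the CPUs that were touched. `sysfs_root` is only a
    parameter so the helper can be pointed at a scratch directory.
    """
    touched = []
    cpu_dir = Path(sysfs_root) / "devices" / "system" / "cpu"
    for ctl in sorted(cpu_dir.glob("cpu[0-9]*/online")):
        if ctl.parent.name == "cpu0":
            continue  # keep the boot CPU running
        ctl.write_text("1" if online else "0")
        touched.append(ctl.parent.name)
    return touched
```

Run as root inside the guest, calling `set_secondary_cpus(False)` followed by `set_secondary_cpus(True)` in a loop, while some threads hammer on atomic operations, should exercise the UP -> SMP switch repeatedly.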
> I have a vague notion that DS: prefixes came with a penalty on older CPUs,
> so we may want to do this only when CPU hotplug is enabled, to avoid
> penalizing older embedded systems.
>
> -hpa
Reading the "Intel Architecture Optimizations Manual" for older Intel CPUs:
http://developer.intel.com/design/pentium/manuals/242816.htm
Chapter 3.7 Prefixed Opcodes
The penalty for instructions carrying a prefix other than 0x0f, 0x66 or
0x67 seems to be 1 added clock cycle to detect the prefix when it cannot
be paired.
Since we are choosing between the existing 0x90 nop followed by the
atomic instruction and this prefix applied to the atomic instruction,
we have to consider the cost of this nop as well. From the same manual,
the NOP is decoded into 1 micro-op.
Unless these architectures (386SX/DX, 486, Pentium Pro, Pentium MMX,
Pentium II) can execute more than 1 micro-op per cycle, I doubt the DS
prefix would cause any degradation compared to the 0x90 nop. And this
would free the lower stages of the pipeline from executing this NOP
micro-op.
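To make the two UP encodings concrete, here is a toy model of the patch site. The byte values are the real ones (0x90 nop, 0x3E DS prefix, 0xF0 LOCK prefix, 0F C1 /r for xadd), but the choice of xadd %eax,(%ecx) as the patched instruction, the function names and the "decoder" are only mine for illustration; it understands nothing beyond these few bytes.

```python
NOP, DS_PREFIX, LOCK_PREFIX = 0x90, 0x3E, 0xF0
XADD = bytes([0x0F, 0xC1, 0x01])    # xadd %eax,(%ecx)


def instruction_starts(code):
    """Offsets of instruction boundaries, i.e. where a preempted thread can resume."""
    starts, i = [], 0
    while i < len(code):
        starts.append(i)
        if code[i] == NOP:
            i += 1                  # the nop is an instruction of its own
            continue
        while code[i] in (DS_PREFIX, LOCK_PREFIX):
            i += 1                  # a prefix belongs to the opcode that follows
        assert code[i:i + 3] == XADD, "toy decoder: only xadd is known"
        i += 3
    return starts


def patch_to_smp(code, site=0):
    """Mimic the UP -> SMP alternative: overwrite the byte at `site` with LOCK."""
    patched = bytearray(code)
    patched[site] = LOCK_PREFIX
    return bytes(patched)


up_nop = bytes([NOP]) + XADD        # current UP form: 90 0f c1 01
up_ds = bytes([DS_PREFIX]) + XADD   # proposed UP form: 3e 0f c1 01

print(instruction_starts(up_nop))   # [0, 1]
print(instruction_starts(up_ds))    # [0]
```

With the nop form, offset 1 is a valid resume point, so a thread preempted there comes back to a bare xadd even after the byte at offset 0 has become 0xF0. With the DS form the only boundary is offset 0, so the thread executes whichever prefix is in place when it resumes.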
I guess some quick performance tests with the modules I provide on my
website (URL in the patch header) could confirm or refute this.
Actually, I just dusted off an old Pentium II; here are the results. The
DS prefix shows no measurable overhead or degradation compared to the
1-byte nop.
NR_TESTS 10000000
test empty cycles : 200833494
test test 1-byte nop xadd cycles : 340000130
test test DS override prefix xadd cycles : 340000126 *
test test LOCK xadd cycles : 530000078
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 5
model name : Pentium II (Deschutes)
stepping : 2
cpu MHz : 350.867
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 mmx fxsr
bogomips : 690.17
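For reference, the totals above can be turned into a per-iteration cost. This is only my arithmetic on the numbers already posted, assuming the "empty" run is the loop overhead and can be subtracted as a baseline; the variable names are made up for the example.

```python
NR_TESTS = 10_000_000

# Total cycle counts copied from the results above.
cycles = {
    "empty": 200_833_494,
    "1-byte nop xadd": 340_000_130,
    "DS override prefix xadd": 340_000_126,
    "LOCK xadd": 530_000_078,
}

baseline = cycles["empty"]  # assumption: pure loop overhead
per_iter = {
    name: (total - baseline) / NR_TESTS
    for name, total in cycles.items()
    if name != "empty"
}

for name, cost in per_iter.items():
    print(f"{name:24s} {cost:6.2f} cycles/iteration")
```

This gives about 13.92 cycles per iteration for both the nop and the DS prefix variants (the totals differ by 4 cycles over 10 million iterations) and about 32.92 for the LOCK variant.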
Mathieu
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68