Message-Id: <20071119.175116.55102316.davem@davemloft.net>
Date: Mon, 19 Nov 2007 17:51:16 -0800 (PST)
From: David Miller <davem@...emloft.net>
To: clameter@....com
Cc: ak@...e.de, akpm@...ux-foundation.org, travis@....com,
mathieu.desnoyers@...ymtl.ca, linux-kernel@...r.kernel.org
Subject: Re: [rfc 00/45] [RFC] CPU ops and a rework of per cpu data
handling on x86_64
From: clameter@....com
Date: Mon, 19 Nov 2007 17:11:32 -0800
> Before:
>
>     mov %gs:0x8,%rdx             Get smp_processor_id
>     mov tableoffset,%rax         Get table base
>     incq varoffset(%rax,%rdx,1)  Perform the operation with a complex
>                                  lookup adding the var offset
>
> An interrupt or a reschedule can move the execution thread to another
> processor if interrupts or preemption are not disabled. The variable of
> the wrong processor may then be updated in a racy way.
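
For illustration, a minimal sketch of the racy pattern being described,
with made-up names and the per cpu API of that era; this is not code
from the patch series:

    #include <linux/percpu.h>
    #include <linux/smp.h>

    DEFINE_PER_CPU(long, example_counter);

    /* Broken on purpose: illustrates the race described above. */
    static void racy_inc(void)
    {
            int cpu = smp_processor_id();

            /*
             * A preemption or interrupt here can migrate this thread to
             * another cpu, so the increment below may then update the
             * old cpu's counter, racing with that cpu's own updates.
             */
            per_cpu(example_counter, cpu)++;
    }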
>
> After:
>
>     incq %gs:varoffset(%rip)
>
> A single instruction that is safe against interrupts and against the
> execution thread being moved to another processor. It reliably operates
> on the current processor's data area.
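
For illustration, a minimal sketch of how gcc inline assembly could emit
such a single %gs-relative instruction on x86_64; the variable name is
made up and this is not the actual implementation from the patches:

    /* Sketch only: assumes the per cpu area base is loaded into %gs. */
    static long pcpu_counter;

    static inline void pcpu_inc(void)
    {
            /*
             * "+m" makes gcc emit pcpu_counter(%rip) as the operand;
             * the %gs: prefix redirects the access into the per cpu
             * area, giving: incq %gs:pcpu_counter(%rip)
             */
            asm("incq %%gs:%0" : "+m" (pcpu_counter));
    }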
>
> Other platforms can also combine address relocation with an atomic op
> on a memory location. Exploiting the atomicity of instructions vs.
> interrupts is therefore possible there as well and will reduce the cpu
> op processing overhead.
>
> E.g. on IA64 we have a per cpu virtual mapping of the per cpu area. If
> we add an offset to the per cpu area variable address then we can
> guarantee that we always hit the per cpu area local to the processor.
>
> Other platforms (SPARC?) have registers that can be used to form
> addresses. If the cpu area address is kept in one of those registers
> then atomic per cpu modifications can be generated for those platforms
> in the same way.

Although we have a per-cpu area base in a fixed global register
for addressing, the above isn't beneficial on sparc64 because
the atomic is much slower than doing:

    local_irq_disable();
    nonatomic_percpu_memory_op();
    local_irq_enable();

local_irq_{disable,enable}() together is about 18 cycles.
Just the cmpxchg() part of the atomic sequence is at least
32 cycles and requires a loop:

    while (1) {
            x = ld(&v);
            if (cmpxchg(&v, x, op(x)) == x)
                    break;
    }

which bloats up the atomic version even more.
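
A minimal sketch of that cheaper irq-disable pattern, with a made-up
per cpu variable and the __get_cpu_var() API of that era:

    #include <linux/percpu.h>
    #include <linux/irqflags.h>

    DEFINE_PER_CPU(long, example_stat);

    static void example_stat_inc(void)
    {
            /*
             * ~18 cycles for the disable/enable pair on sparc64 per
             * the numbers above, vs. 32+ cycles for a cmpxchg loop.
             */
            local_irq_disable();
            __get_cpu_var(example_stat)++;  /* plain, non-atomic update */
            local_irq_enable();
    }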