lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 19 Nov 2007 17:51:16 -0800 (PST)
From:	David Miller <davem@...emloft.net>
To:	clameter@....com
Cc:	ak@...e.de, akpm@...ux-foundation.org, travis@....com,
	mathieu.desnoyers@...ymtl.ca, linux-kernel@...r.kernel.org
Subject: Re: [rfc 00/45] [RFC] CPU ops and a rework of per cpu data
 handling on x86_64

From: clameter@....com
Date: Mon, 19 Nov 2007 17:11:32 -0800

> Before:
> 
> mov    %gs:0x8,%rdx		Get smp_processor_id
> mov    tableoffset,%rax		Get table base
> incq   varoffset(%rax,%rdx,1)	Perform the operation with a complex lookup
> 				adding the var offset
> 
> An interrupt or a reschedule action can move the execution thread to another
> processor if interrupt or preempt is not disabled. Then the variable of
> the wrong processor may be updated in a racy way.
> 
> After:
> 
> incq   %gs:varoffset(%rip)
> 
> Single instruction that is safe from interrupts or moving of the execution
> thread. It will reliably operate on the current processors data area.
> 
> Other platforms can also perform address relocation plus atomic ops on
> a memory location. Exploiting of the atomicity of instructions vs interrupts
> is therefore possible and will reduce the cpu op processing overhead.
> 
> F.e on IA64 we have per cpu virtual mapping of the per cpu area. If
> we add an offset to the per cpu area variable address then we can guarantee
> that we always hit the per cpu areas local to a processor.
> 
> Other platforms (SPARC?) have registers that can be used to form addresses.
> If the cpu area address is in one of those then atomic per cpu modifications
> can be generated for those platforms in the same way.

Although we have a per-cpu area base in a fixed global register
for addressing, the above isn't beneficial on sparc64 because
the atomic is much slower than doing a:

	local_irq_disable();
	nonatomic_percpu_memory_op();
	local_irq_enable();

local_irq_{disable,enable}() together is about 18 cycles.
Just the cmpxchg() part of the atomic sequence is at least
32 cycles and requires a loop:

	while (1) {
		x = ld();
		if (cmpxchg(x, op(x)))
			break;
	}

which bloats up the atomic version even more.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ