Date:	Fri, 18 Dec 2009 08:36:14 -0800 (PST)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Jan Beulich <JBeulich@...ell.com>
cc:	mingo@...e.hu, tglx@...utronix.de, hpa@...or.com,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Nick Piggin <npiggin@...e.de>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] x86: slightly shorten __ticket_spin_trylock() (v3)



On Fri, 18 Dec 2009, Jan Beulich wrote:
>
> -static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
> +static __always_inline u8 __ticket_spin_trylock(arch_spinlock_t *lock)
>  {
>  	int tmp, new;
>  
> @@ -87,8 +87,7 @@ static __always_inline int __ticket_spin
>  		     "jne 1f\n\t"
>  		     LOCK_PREFIX "cmpxchgw %w1,%2\n\t"
>  		     "1:"
> -		     "sete %b1\n\t"
> -		     "movzbl %b1,%0\n\t"
> +		     "sete %b0\n\t"
>  		     : "=&a" (tmp), "=&q" (new), "+m" (lock->slock)
>  		     :
>  		     : "memory", "cc");
> @@ -127,7 +126,7 @@ static __always_inline void __ticket_spi

Btw, I looked at that earlier, and it's still pretty sub-optimal.

There are two problems. First, nobody actually uses the low-level code 
directly; it goes through a 

	if (__ticket_spin_trylock(lock)) {
		..
		return 1;
	}
	return 0;

logic, which means that regardless of what the asm does, the end result 
will still be something like this _after_ the asm:

        1:sete %al      # tmp

# 0 "" 2
#NO_APP
        testb   %al, %al        # tmp
        leave
        setne   %al     #, tmp66

I.e., you have that "sete" (in the asm) then being followed by testb/setne 
again (from the C code wrappers).
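
For readers who don't have the ticket encoding in their head, here's a 
rough C sketch of what that asm sequence computes. It assumes the 
NR_CPUS < 256 layout of this era (low byte of slock = ticket currently 
being served, next byte = next ticket to hand out); the function name is 
made up and the plain store stands in for the locked cmpxchgw:

	/* illustrative sketch only - not the kernel code, and not atomic */
	static inline int ticket_trylock_sketch(unsigned int *slock)
	{
		unsigned int tmp = *slock;

		/* cmpb %h0,%b0 / jne 1f: ticket bytes differ -> lock is
		 * held, fail with ZF clear, so "sete" yields 0 */
		if ((tmp & 0xff) != ((tmp >> 8) & 0xff))
			return 0;

		/* cmpxchgw %w2,%1: bump the "next" ticket byte
		 * (tmp + 0x100), claiming the lock; a plain store here */
		*slock = tmp + 0x100;
		return 1;	/* "sete": cmpxchg matched, ZF set */
	}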

The other problem is that the compiler could actually generate better code 
if you leave it to it, so doing

        int tmp, new;

        /* do the load and the ticket increment in C, visible to gcc */
        tmp = ACCESS_ONCE(lock->slock);
        new = tmp + 0x100;
        asm volatile("cmpb %h0,%b0\n\t"	/* ticket bytes equal -> free */
                     "jne 1f\n\t"
                     LOCK_PREFIX "cmpxchgw %w2,%1\n\t"
                     "1:"
                     "sete %b0\n\t"	/* result lands directly in %al */
                     : "=a" (tmp), "+m" (lock->slock)
                     : "r" (new), "0" (tmp)	/* "0": share %eax; "r": any reg */
                     : "memory", "cc");

        return tmp;

actually seems to result in better code:

	_raw_spin_trylock:
	        pushq   %rbp    #
	        movl    (%rdi), %eax    #* lock, D.17152
	        movq    %rsp, %rbp      #,
	        leal    256(%rax), %edx #, new
	#APP
	# 86 "/home/torvalds/v2.6/linux/arch/x86/include/asm/spinlock.h" 1
	        cmpb %ah,%al    # tmp
	        jne 1f
	        .section .smp_locks,"a"
	 .balign 8
	 .quad 661f
	.previous
	661:
	        lock; cmpxchgw %dx,(%rdi)       # new,* lock
	        1:sete %al      # tmp
	
	# 0 "" 2
	#NO_APP

Look - no "movzwl" at all at the beginning, because it turns out that the 
spinlock is an "unsigned int", and the "movl" that loads the value pairs just 
fine with the "leal" that the compiler can do too.  And we didn't 
artificially constrain the second register to a byte register either (but 
the compiler still picked %dx, of course).
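
The trick that lets gcc do that is the "0" matching constraint: it says 
the input lives in the same register as output operand 0, which "=a" pins 
to %eax, so the load of lock->slock lands straight in %eax. A tiny 
standalone illustration (x86 only, everything here is made up):

	#include <stdio.h>

	int main(void)
	{
		unsigned int v = 41;

		/* "0" ties the input to output operand 0; "=a" pins that
		 * operand to %eax, so the incl happens in place there */
		asm("incl %0" : "=a" (v) : "0" (v));

		printf("%u\n", v);	/* prints 42 */
		return 0;
	}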

I dunno. I still don't get the feeling that any of this actually 
_matters_.

(Btw, the above isn't actually tested - I just edited the inline asm and 
looked at what gcc generated; I didn't _run_ any of it.)
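
That kind of inspection can be reproduced with something like 

	gcc -O2 -S -fverbose-asm spinlock_test.c -o -

where -fverbose-asm produces the "# tmp"-style operand comments seen in 
the output above; the filename is made up.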

			Linus
