Date:	Tue, 1 Sep 2015 11:24:22 +0200
From:	Ingo Molnar <mingo@...nel.org>
To:	"Michael S. Tsirkin" <mst@...hat.com>
Cc:	"H. Peter Anvin" <hpa@...or.com>, linux-kernel@...r.kernel.org,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>, x86@...nel.org,
	Rusty Russell <rusty@...tcorp.com.au>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Uros Bizjak <ubizjak@...il.com>
Subject: Re: [PATCH 1/2] x86/bitops: implement __test_bit


* Michael S. Tsirkin <mst@...hat.com> wrote:

> I applied this patch on top of mine:

Yeah, looks similar to the one I sent.

> -static inline int __variable_test_bit(long nr, const unsigned long *addr)
> -{
> -	int oldbit;
> -
> -	asm volatile("bt %2,%1\n\t"
> -		     "sbb %0,%0"
> -		     : "=r" (oldbit)
> -		     : "m" (*addr), "Ir" (nr));
> -
> -	return oldbit;
> -}

> And the code size went up:
> 
>    134836    2997    8372  146205   23b1d arch/x86/kvm/kvm-intel.ko  ->
>    134846    2997    8372  146215   23b27 arch/x86/kvm/kvm-intel.ko     
> 
>    342690   47640     441  390771   5f673 arch/x86/kvm/kvm.ko ->
>    342738   47640     441  390819   5f6a3 arch/x86/kvm/kvm.ko   
> 
> I tried removing  __always_inline, this had no effect.

But code size isn't the only factor.

Uros Bizjak pointed out that the reason GCC does not use the "BT reg,mem" 
instruction is that it's highly suboptimal even on recent microarchitectures. 
Sandy Bridge is listed as having a 10-cycle latency (!) for this instruction:

   http://www.agner.org/optimize/instruction_tables.pdf

This instruction has had bad latency going back to Pentium 4 CPUs.

... so unless something changed in this area with Skylake, I think using the 
kernel's __variable_test_bit() code is a bad choice, and looking at kernel 
size only is misleading.

It makes sense for atomics, but not for unlocked access.
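
For illustration, here is a rough sketch of what the non-asm variant of an 
unlocked bit test could look like in plain C. The names sketch_test_bit(), 
SKETCH_BITS_PER_LONG and sketch_test_bit_selfcheck() are made up for this 
example and are not the kernel's. It also assumes a non-negative bit number 
and returns 0/1, whereas the asm version takes a signed 'long nr' and its 
"sbb" yields 0/-1. The compiler is then free to pick whatever instruction 
sequence it considers best for the target instead of being forced into the 
memory-operand form of BT:

```c
#include <assert.h>
#include <limits.h>

/* Bits in one unsigned long; local stand-in for the kernel's BITS_PER_LONG. */
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical plain-C, non-atomic variable-bit test (illustrative name).
 * Assumption: nr is non-negative. Returns 1 if the bit is set, else 0.
 */
static inline int sketch_test_bit(unsigned long nr, const unsigned long *addr)
{
	/* Pick the word holding the bit, shift the bit down, mask it. */
	return (int)((addr[nr / SKETCH_BITS_PER_LONG] >>
		      (nr % SKETCH_BITS_PER_LONG)) & 1UL);
}

/* Small self-check: set bit 3 and the lowest bit of the second word. */
void sketch_test_bit_selfcheck(void)
{
	unsigned long map[2] = { 0, 0 };

	map[0] |= 1UL << 3;
	map[1] |= 1UL;

	assert(sketch_test_bit(3, map) == 1);
	assert(sketch_test_bit(4, map) == 0);
	assert(sketch_test_bit(SKETCH_BITS_PER_LONG, map) == 1);
	assert(sketch_test_bit(SKETCH_BITS_PER_LONG + 1, map) == 0);
}
```

Whether this ends up faster than the asm version on a given CPU would have 
to be measured; the sketch only shows the shape of the alternative.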

Thanks,

	Ingo
