linux-kernel - Re: [PATCH 1/2] x86/bitops: implement __test

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150901155824-mutt-send-email-mst@redhat.com>
Date:	Tue, 1 Sep 2015 18:03:27 +0300
From:	"Michael S. Tsirkin" <mst@...hat.com>
To:	Ingo Molnar <mingo@...nel.org>
Cc:	"H. Peter Anvin" <hpa@...or.com>, linux-kernel@...r.kernel.org,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>, x86@...nel.org,
	Rusty Russell <rusty@...tcorp.com.au>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Uros Bizjak <ubizjak@...il.com>
Subject: Re: [PATCH 1/2] x86/bitops: implement __test_bit

On Tue, Sep 01, 2015 at 01:39:42PM +0200, Ingo Molnar wrote:
> 
> * Michael S. Tsirkin <mst@...hat.com> wrote:
> 
> > On Tue, Sep 01, 2015 at 11:24:22AM +0200, Ingo Molnar wrote:
> > > 
> > > * Michael S. Tsirkin <mst@...hat.com> wrote:
> > > 
> > > > I applied this patch on top of mine:
> > > 
> > > Yeah, looks similar to the one I sent.
> > > 
> > > > -static inline int __variable_test_bit(long nr, const unsigned long *addr)
> > > > -{
> > > > -	int oldbit;
> > > > -
> > > > -	asm volatile("bt %2,%1\n\t"
> > > > -		     "sbb %0,%0"
> > > > -		     : "=r" (oldbit)
> > > > -		     : "m" (*addr), "Ir" (nr));
> > > > -
> > > > -	return oldbit;
> > > > -}
> > > 
> > > > And the code size went up:
> > > > 
> > > >    134836    2997    8372  146205   23b1d arch/x86/kvm/kvm-intel.ko  ->
> > > >    134846    2997    8372  146215   23b27 arch/x86/kvm/kvm-intel.ko     
> > > > 
> > > >    342690   47640     441  390771   5f673 arch/x86/kvm/kvm.ko ->
> > > >    342738   47640     441  390819   5f6a3 arch/x86/kvm/kvm.ko   
> > > > 
> > > > I tried removing  __always_inline, this had no effect.
> > > 
> > > But code size isn't the only factor.
> > > 
> > > Uros Bizjak pointed out that the reason GCC does not use the "BT reg,mem" 
> > > instruction is that it's highly suboptimal even on recent microarchitectures, 
> > > Sandy Bridge is listed as having a 10 cycles latency (!) for this instruction:
> > > 
> > >    http://www.agner.org/optimize/instruction_tables.pdf
> > > 
> > > this instruction had bad latency going back to Pentium 4 CPUs.
> > > 
> > > ... so unless something changed in this area with Skylake I think using the 
> > > __variable_test_bit() code of the kernel is a bad choice and looking at kernel 
> > > size only is misleading.
> > > 
> > > It makes sense for atomics, but not for unlocked access.
> > > 
> > > Thanks,
> > > 
> > > 	Ingo
> > 
> > Hmm - so do you take back the ack?
> 
> I have no strong feelings either way, it simply strikes me as misguided to 
> explicitly optimize for something that is listed as a high overhead instruction.
> 
> Assuming it really is high overhead:
> 
> > I compared this:
> > int main(int argc, char **argv)
> > {
> > 
> >         long long int i;
> >         const unsigned long addr = 0;
> >         for (i = 0; i < 1000000000ull; ++i) {
> >                 asm volatile("");
> >                 if (__variable_test_bit(1, &addr))
> >                 asm volatile("");
> >         }
> >         return 0;
> > }
> > 
> > with the __constant_test_bit variant.
> > 
> > __constant_test_bit code does appear to be slower on an i7 processor.


Ouch. I meant the reverse:

 [mst@...in test]$ diff a.c b.c
 31c31
 <               if (__variable_test_bit(1, &addr))
 ---
 >               if (__constant_test_bit(1, &addr))

[mst@...in test]$ gcc -Wall -O2 a.c; time ./a.out

real    0m0.532s
user    0m0.531s
sys     0m0.000s
[mst@...in test]$ gcc -Wall -O2 b.c; time ./a.out

real    0m0.517s
user    0m0.517s
sys     0m0.000s


So __constant_test_bit is faster even though it's using more
instructions
$ gcc -Wall -O2 a.c; -objdump -ld ./a.out


08048414 <main>:
main():
 8048414:	8d 4c 24 04          	lea    0x4(%esp),%ecx
 8048418:	83 e4 f8             	and    $0xfffffff8,%esp
 804841b:	ff 71 fc             	pushl  -0x4(%ecx)
 804841e:	55                   	push   %ebp
 804841f:	89 e5                	mov    %esp,%ebp
 8048421:	51                   	push   %ecx
 8048422:	83 ec 14             	sub    $0x14,%esp
 8048425:	c7 45 ec 00 00 00 00 	movl   $0x0,-0x14(%ebp)
 804842c:	c7 45 f0 00 00 00 00 	movl   $0x0,-0x10(%ebp)
 8048433:	c7 45 f4 00 00 00 00 	movl   $0x0,-0xc(%ebp)
 804843a:	eb 1a                	jmp    8048456 <main+0x42>
 804843c:	8d 45 ec             	lea    -0x14(%ebp),%eax
 804843f:	50                   	push   %eax
 8048440:	6a 01                	push   $0x1
 8048442:	e8 b4 ff ff ff       	call   80483fb <__variable_test_bit>
 8048447:	83 c4 08             	add    $0x8,%esp
 804844a:	85 c0                	test   %eax,%eax
 804844c:	74 00                	je     804844e <main+0x3a>
 804844e:	83 45 f0 01          	addl   $0x1,-0x10(%ebp)
 8048452:	83 55 f4 00          	adcl   $0x0,-0xc(%ebp)
 8048456:	8b 45 f0             	mov    -0x10(%ebp),%eax
 8048459:	8b 55 f4             	mov    -0xc(%ebp),%edx
 804845c:	83 fa 00             	cmp    $0x0,%edx
 804845f:	72 db                	jb     804843c <main+0x28>
 8048461:	83 fa 00             	cmp    $0x0,%edx
 8048464:	77 07                	ja     804846d <main+0x59>
 8048466:	3d ff c9 9a 3b       	cmp    $0x3b9ac9ff,%eax
 804846b:	76 cf                	jbe    804843c <main+0x28>
 804846d:	b8 00 00 00 00       	mov    $0x0,%eax
 8048472:	8b 4d fc             	mov    -0x4(%ebp),%ecx
 8048475:	c9                   	leave  
 8048476:	8d 61 fc             	lea    -0x4(%ecx),%esp
 8048479:	c3                   	ret    
 804847a:	66 90                	xchg   %ax,%ax
 804847c:	66 90                	xchg   %ax,%ax
 804847e:	66 90                	xchg   %ax,%ax

$ gcc -Wall -O2 b.c; -objdump -ld ./a.out

080483fb <main>:
main():
 80483fb:	8d 4c 24 04          	lea    0x4(%esp),%ecx
 80483ff:	83 e4 f8             	and    $0xfffffff8,%esp
 8048402:	ff 71 fc             	pushl  -0x4(%ecx)
 8048405:	55                   	push   %ebp
 8048406:	89 e5                	mov    %esp,%ebp
 8048408:	51                   	push   %ecx
 8048409:	83 ec 24             	sub    $0x24,%esp
 804840c:	c7 45 e4 00 00 00 00 	movl   $0x0,-0x1c(%ebp)
 8048413:	c7 45 f0 00 00 00 00 	movl   $0x0,-0x10(%ebp)
 804841a:	c7 45 f4 00 00 00 00 	movl   $0x0,-0xc(%ebp)
 8048421:	eb 44                	jmp    8048467 <main+0x6c>
 8048423:	c7 45 ec 01 00 00 00 	movl   $0x1,-0x14(%ebp)
 804842a:	8d 45 e4             	lea    -0x1c(%ebp),%eax
 804842d:	89 45 e8             	mov    %eax,-0x18(%ebp)
 8048430:	8b 45 ec             	mov    -0x14(%ebp),%eax
 8048433:	c1 f8 05             	sar    $0x5,%eax
 8048436:	8d 14 85 00 00 00 00 	lea    0x0(,%eax,4),%edx
 804843d:	8b 45 e8             	mov    -0x18(%ebp),%eax
 8048440:	01 d0                	add    %edx,%eax
 8048442:	8b 10                	mov    (%eax),%edx
 8048444:	8b 45 ec             	mov    -0x14(%ebp),%eax
 8048447:	83 e0 1f             	and    $0x1f,%eax
 804844a:	89 c1                	mov    %eax,%ecx
 804844c:	d3 ea                	shr    %cl,%edx
 804844e:	89 d0                	mov    %edx,%eax
 8048450:	83 e0 01             	and    $0x1,%eax
 8048453:	85 c0                	test   %eax,%eax
 8048455:	0f 95 c0             	setne  %al
 8048458:	0f b6 c0             	movzbl %al,%eax
 804845b:	85 c0                	test   %eax,%eax
 804845d:	74 00                	je     804845f <main+0x64>
 804845f:	83 45 f0 01          	addl   $0x1,-0x10(%ebp)
 8048463:	83 55 f4 00          	adcl   $0x0,-0xc(%ebp)
 8048467:	8b 45 f0             	mov    -0x10(%ebp),%eax
 804846a:	8b 55 f4             	mov    -0xc(%ebp),%edx
 804846d:	83 fa 00             	cmp    $0x0,%edx
 8048470:	72 b1                	jb     8048423 <main+0x28>
 8048472:	83 fa 00             	cmp    $0x0,%edx
 8048475:	77 07                	ja     804847e <main+0x83>
 8048477:	3d ff c9 9a 3b       	cmp    $0x3b9ac9ff,%eax
 804847c:	76 a5                	jbe    8048423 <main+0x28>
 804847e:	b8 00 00 00 00       	mov    $0x0,%eax
 8048483:	83 c4 24             	add    $0x24,%esp
 8048486:	59                   	pop    %ecx
 8048487:	5d                   	pop    %ebp
 8048488:	8d 61 fc             	lea    -0x4(%ecx),%esp
 804848b:	c3                   	ret    
 804848c:	66 90                	xchg   %ax,%ax
 804848e:	66 90                	xchg   %ax,%ax


-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/