
Message-ID: <7095.1328809567@redhat.com>
Date:	Thu, 09 Feb 2012 17:46:07 +0000
From:	David Howells <dhowells@...hat.com>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	dhowells@...hat.com, Eric Dumazet <eric.dumazet@...il.com>,
	adobriyan@...il.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] Reduce the number of expensive division instructions done by _parse_integer()

Linus Torvalds <torvalds@...ux-foundation.org> wrote:

> Looking at the code generated, the "val >> 60" thing actually does
> generate a shift, and at least on x86-64, the attached patch generates
> better code.

On fixed-size instruction arches, the runtime shift is probably the better
option, as simply loading a large 64-bit constant would likely take at
least four instructions - and might involve a shift anyway.  On the other
hand, it seems the compiler can optimise your suggestion fairly well.  In both
cases, on 32-bit arches the 64-bit arithmetic can be reduced to 32-bit
arithmetic on the MSW only.
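
For reference, the two forms being compared can be sketched in C roughly as
follows.  This is only an illustration, not the code from either patch: the
function names are made up here, and the mask value is taken from the movabs
constant in the x86_64 listing further down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name.  Shift form: true if any of the top four bits is set. */
static bool top_nibble_set_shift(uint64_t val)
{
	return (val >> 60) != 0;
}

/* Hypothetical name.  Mask form: same condition against a 64-bit constant. */
static bool top_nibble_set_mask(uint64_t val)
{
	return (val & (UINT64_C(0xf) << 60)) != 0;
}

/* Check that both forms agree on a few boundary values. */
static void check_forms_agree(void)
{
	static const uint64_t samples[] = {
		0, 1, UINT64_C(0x0fffffffffffffff),
		UINT64_C(0x1000000000000000), UINT64_MAX,
	};

	for (unsigned i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		assert(top_nibble_set_shift(samples[i]) ==
		       top_nibble_set_mask(samples[i]));
}
```

Both functions compute the same predicate, so the choice between them is
purely about what the compiler emits for each target.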

On x86_64 we have:

  400649:       48 89 d8                mov    %rbx,%rax
  40064c:       48 c1 e8 3c             shr    $0x3c,%rax
  400650:       48 85 c0                test   %rax,%rax
  400653:       75 52                   jne    4006a7 <_parse_integer+0xa7>

And on i386 we have:

 8048532:       8b 54 24 14             mov    0x14(%esp),%edx
 ...
 8048538:       89 c7                   mov    %eax,%edi
 804853a:       c1 ea 1c                shr    $0x1c,%edx
 804853d:       85 d2                   test   %edx,%edx
 804853f:       75 79                   jne    80485ba <_parse_integer+0xda>
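
The i386 listing above touches only the upper 32-bit word: shifting a 64-bit
value right by 60 gives the same result as shifting its upper word right by
28 (0x1c).  A rough sketch of that reduction, with hypothetical function
names and assuming the value is split into two 32-bit halves as on a 32-bit
target:

```c
#include <stdbool.h>
#include <stdint.h>

/* Full 64-bit test, for comparison. */
static bool top_nibble_set64(uint64_t val)
{
	return (val >> 60) != 0;
}

/* Same test using only the most significant 32-bit word (shift form). */
static bool top_nibble_set_msw_shift(uint64_t val)
{
	uint32_t msw = (uint32_t)(val >> 32);

	return (msw >> 28) != 0;
}

/* Same test using only the most significant 32-bit word (mask form). */
static bool top_nibble_set_msw_mask(uint64_t val)
{
	uint32_t msw = (uint32_t)(val >> 32);

	return (msw & UINT32_C(0xf0000000)) != 0;
}
```

The low word never affects the result, which is why neither form needs real
64-bit arithmetic on a 32-bit arch.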

With your code, we have on x86_64:

  40062d:       49 bf 00 00 00 00 00    movabs $0xf000000000000000,%r15
  400634:       00 00 f0 
  ...
  400659:       4c 85 fb                test   %r15,%rbx
  40065c:       75 59                   jne    4006b7 <_parse_integer+0xb7>

And on i386:

 804853c:       89 c7                   mov    %eax,%edi
 804853e:       f7 44 24 1c 00 00 00    testl  $0xf0000000,0x1c(%esp)
 8048545:       f0 
 8048546:       75 79                   jne    80485c1 <_parse_integer+0xe1>

But it will work too.  And I like the pointer indirection removal as well.

I'm not sure there's a lot to choose between them, though I prefer mine as I
think it produces slightly smaller code.

Want me to wrap these changes up with my patch description?

David