Message-ID: <20170627070252.GB27359@bombadil.infradead.org>
Date: Tue, 27 Jun 2017 00:02:52 -0700
From: Matthew Wilcox <willy@...radead.org>
To: Rasmus Villemoes <linux@...musvillemoes.dk>
Cc: linux-kernel@...r.kernel.org,
Andrew Morton <akpm@...ux-foundation.org>,
Martin Schwidefsky <schwidefsky@...ibm.com>,
Matthew Wilcox <mawilcox@...rosoft.com>
Subject: Re: [PATCH 2/3] Turn bitmap_set and bitmap_clear into memset when
possible
On Wed, Jun 07, 2017 at 11:16:37PM +0200, Rasmus Villemoes wrote:
> On Wed, Jun 07 2017, Matthew Wilcox <willy@...radead.org> wrote:
> > @@ -319,6 +319,9 @@ static __always_inline void bitmap_set(unsigned long *map, unsigned int start,
> > {
> > if (__builtin_constant_p(nbits) && nbits == 1)
> > __set_bit(start, map);
> > + else if (__builtin_constant_p(start & 7) && IS_ALIGNED(start, 8) &&
> > + __builtin_constant_p(nbits & 7) && IS_ALIGNED(nbits, 8))
> > + memset(map + start / 8, 0xff, nbits / 8);
> > else
>
> Isn't the pointer arithmetic wrong here? I think you need to cast map to
> (char*).
Good catch. I've added a test case to test_bitmap.c and it catches
the problem.
> Do you have an example of how the generated code changes, both in the
> case of actual constants and a case where gcc can see that start and
> nbits are byte-aligned without knowing their actual values?
Here's an example of start == 0 and nbits being clearly a multiple of 8,
courtesy of btrfs's extent-io-tests.c.
Before:
f1: 48 8b 5c 24 10 mov 0x10(%rsp),%rbx
f6: 4c 8b 3c 24 mov (%rsp),%r15
fa: 31 f6 xor %esi,%esi
fc: 44 8d 34 dd 00 00 00 lea 0x0(,%rbx,8),%r14d
103: 00
104: 4c 8d 2c dd 00 00 00 lea 0x0(,%rbx,8),%r13
10b: 00
10c: 4c 89 ff mov %r15,%rdi
10f: 44 89 f2 mov %r14d,%edx
112: e8 00 00 00 00 callq 117 <__test_eb_bitmaps+0x77>
113: R_X86_64_PC32 bitmap_set-0x4
117: 31 d2 xor %edx,%edx
119: 31 f6 xor %esi,%esi
11b: 4c 89 e9 mov %r13,%rcx
11e: 4c 89 e7 mov %r12,%rdi
121: e8 00 00 00 00 callq 126 <__test_eb_bitmaps+0x86>
122: R_X86_64_PC32 extent_buffer_bitmap_set-0x4
After:
f1: 4c 8b 7c 24 10 mov 0x10(%rsp),%r15
f6: be ff 00 00 00 mov $0xff,%esi
fb: 48 89 df mov %rbx,%rdi
fe: 4d 89 fe mov %r15,%r14
101: 4e 8d 2c fd 00 00 00 lea 0x0(,%r15,8),%r13
108: 00
109: 41 81 e6 ff ff ff 1f and $0x1fffffff,%r14d
110: 4c 89 f2 mov %r14,%rdx
113: e8 00 00 00 00 callq 118 <__test_eb_bitmaps+0x78>
114: R_X86_64_PC32 memset-0x4
118: 48 8b 2c 24 mov (%rsp),%rbp
11c: 31 d2 xor %edx,%edx
11e: 31 f6 xor %esi,%esi
120: 4c 89 e9 mov %r13,%rcx
123: 48 89 ef mov %rbp,%rdi
126: e8 00 00 00 00 callq 12b <__test_eb_bitmaps+0x8b>
127: R_X86_64_PC32 extent_buffer_bitmap_set-0x4
It's actually slightly less efficient in the caller (although obviously
memset() is going to execute faster than bitmap_set()).  That's partly
because x86 made an odd choice about the behaviour of 8-bit move
instructions (they leave the upper 24 bits of the register intact rather
than zeroing them, so gcc has to use a 32-bit move instruction to put
0xff into the second argument to memset()), and partly because gcc is
seeing us do:
unsigned long len;
len = len * 8 / 8;
so it's (very reasonably) transforming that into len &= 0x1fff'ffff'ffff'ffff;
That's partly a quirk of how this particular function is written.