lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 14 Nov 2017 23:24:17 +0100
From:   Rasmus Villemoes <linux@...musvillemoes.dk>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: bit tweaks [was: Re: [nfsd4] potentially hardware breaking
 regression in 4.14-rc and 4.13.11]

On 14 November 2017 at 00:54, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
> On Mon, Nov 13, 2017 at 3:30 PM, Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
>>
>> So let's just rewrite that mnt_flags conversion that way, justr to get
>> gcc to generate the obvious code.
>
> Oh wow. I tried to do the same thing in fs/namespace.c where it does
> the reverse bit translation, and gcc makes a _horrible_ mess of it and
> actually makes the code much worse because for some reason the pattern
> doesn't trigger.

[trimming cc list]

Can you be more specific? That's not what I see with gcc 7.1. I've
found two blocks where I replaced the if's with the ternary
expressions, one in clone_mnt, one in do_mount. In clone_mnt, gcc-7.1
seems to do (first instruction is the unconditional |= MNT_LOCK_ATIME)

    10d7:       0d 00 00 04 00          or     $0x40000,%eax
    10dc:       89 43 30                mov    %eax,0x30(%rbx)
    10df:       a8 02                   test   $0x2,%al
    10e1:       74 08                   je     10eb <clone_mnt+0x9b>
    10e3:       0d 00 00 20 00          or     $0x200000,%eax
    10e8:       89 43 30                mov    %eax,0x30(%rbx)
    10eb:       a8 01                   test   $0x1,%al
    10ed:       74 08                   je     10f7 <clone_mnt+0xa7>
    10ef:       0d 00 00 10 00          or     $0x100000,%eax
    10f4:       89 43 30                mov    %eax,0x30(%rbx)

and after patching

    10cc:       0d 00 00 04 00          or     $0x40000,%eax
    10d1:       89 c2                   mov    %eax,%edx
    10d3:       c1 e2 10                shl    $0x10,%edx
    10d6:       81 e2 00 00 40 00       and    $0x400000,%edx
    10dc:       09 d0                   or     %edx,%eax
    10de:       89 c2                   mov    %eax,%edx
    10e0:       c1 e2 14                shl    $0x14,%edx
    10e3:       81 e2 00 00 20 00       and    $0x200000,%edx
    10e9:       09 c2                   or     %eax,%edx

(with a final store of %eax to 0x30(%rbx)). Either way it's four
instructions per flag, but I assume the one without the branches is
preferable.

Similarly, in do_mount, before we have

    3429:       44 89 f8                mov    %r15d,%eax
    342c:       83 c8 01                or     $0x1,%eax
    342f:       40 f6 c5 02             test   $0x2,%bpl
    3433:       44 0f 45 f8             cmovne %eax,%r15d
    3437:       44 89 f8                mov    %r15d,%eax
    343a:       83 c8 02                or     $0x2,%eax
    343d:       40 f6 c5 04             test   $0x4,%bpl
    3441:       44 0f 45 f8             cmovne %eax,%r15d

but after patching it does something like

    3425:       4d 89 fe                mov    %r15,%r14
    3428:       48 c1 ea 07             shr    $0x7,%rdx
    342c:       49 d1 ee                shr    %r14
    342f:       89 d0                   mov    %edx,%eax
    3431:       c1 e1 05                shl    $0x5,%ecx
    3434:       83 e0 08                and    $0x8,%eax
    3437:       41 83 e6 07             and    $0x7,%r14d
    343b:       41 09 c6                or     %eax,%r14d
    343e:       89 d0                   mov    %edx,%eax

which actually makes use of MS_{NOSUID,NODEV,NOEXEC} all being 1 bit
off from their MNT_ counterparts (witness the shr %r14 and the and
with 0x7), so there we also cut 37 bytes according to bloat-o-meter.

Now, it does seem that older (and not that old in absolute terms)
compilers may generate worse code after the transformation - the
'replace with shift-mask-or' pattern doesn't exist until 7.1. E.g.
with 5.4 we have

$ scripts/bloat-o-meter fs/namespace.o.{0,1}-5.4
add/remove: 0/0 grow/shrink: 2/0 up/down: 50/0 (50)
function                                     old     new   delta
do_mount                                    3153    3195     +42
clone_mnt                                    768     776      +8

5.4 compiles the "if() ..." construction roughly as 7.1, i.e. with
cmovs, but the ternary expressions are transformed into this
abomination

    336f:       48 89 da                mov    %rbx,%rdx
    3372:       83 e2 04                and    $0x4,%edx
    3375:       48 83 fa 01             cmp    $0x1,%rdx
    3379:       19 d2                   sbb    %edx,%edx
    337b:       f7 d2                   not    %edx
    337d:       83 e2 02                and    $0x2,%edx
    3380:       09 d0                   or     %edx,%eax
    3382:       48 89 da                mov    %rbx,%rdx
    3385:       83 e2 08                and    $0x8,%edx
    3388:       48 83 fa 01             cmp    $0x1,%rdx
    338c:       19 d2                   sbb    %edx,%edx
    338e:       f7 d2                   not    %edx
    3390:       83 e2 04                and    $0x4,%edx
    3393:       09 d0                   or     %edx,%eax

Was it something like this you saw?

Rasmus

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ