[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKwiHFgXDxpQc279NUrO8X3E+UysBoVQ63s=y5oVu0jKysT7Rw@mail.gmail.com>
Date: Tue, 14 Nov 2017 23:24:17 +0100
From: Rasmus Villemoes <linux@...musvillemoes.dk>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: bit tweaks [was: Re: [nfsd4] potentially hardware breaking
regression in 4.14-rc and 4.13.11]
On 14 November 2017 at 00:54, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
> On Mon, Nov 13, 2017 at 3:30 PM, Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
>>
>> So let's just rewrite that mnt_flags conversion that way, justr to get
>> gcc to generate the obvious code.
>
> Oh wow. I tried to do the same thing in fs/namespace.c where it does
> the reverse bit translation, and gcc makes a _horrible_ mess of it and
> actually makes the code much worse because for some reason the pattern
> doesn't trigger.
[trimming cc list]
Can you be more specific? That's not what I see with gcc 7.1. I've
found two blocks where I replaced the if's with the ternary
expressions, one in clone_mnt, one in do_mount. In clone_mnt, gcc-7.1
seems to do (first instruction is the unconditional |= MNT_LOCK_ATIME)
10d7: 0d 00 00 04 00 or $0x40000,%eax
10dc: 89 43 30 mov %eax,0x30(%rbx)
10df: a8 02 test $0x2,%al
10e1: 74 08 je 10eb <clone_mnt+0x9b>
10e3: 0d 00 00 20 00 or $0x200000,%eax
10e8: 89 43 30 mov %eax,0x30(%rbx)
10eb: a8 01 test $0x1,%al
10ed: 74 08 je 10f7 <clone_mnt+0xa7>
10ef: 0d 00 00 10 00 or $0x100000,%eax
10f4: 89 43 30 mov %eax,0x30(%rbx)
and after patching
10cc: 0d 00 00 04 00 or $0x40000,%eax
10d1: 89 c2 mov %eax,%edx
10d3: c1 e2 10 shl $0x10,%edx
10d6: 81 e2 00 00 40 00 and $0x400000,%edx
10dc: 09 d0 or %edx,%eax
10de: 89 c2 mov %eax,%edx
10e0: c1 e2 14 shl $0x14,%edx
10e3: 81 e2 00 00 20 00 and $0x200000,%edx
10e9: 09 c2 or %eax,%edx
(with a final store of %eax to 0x30(%rbx)). Either way it's four
instructions per flag, but I assume the one without the branches is
preferable.
Similarly, in do_mount, before we have
3429: 44 89 f8 mov %r15d,%eax
342c: 83 c8 01 or $0x1,%eax
342f: 40 f6 c5 02 test $0x2,%bpl
3433: 44 0f 45 f8 cmovne %eax,%r15d
3437: 44 89 f8 mov %r15d,%eax
343a: 83 c8 02 or $0x2,%eax
343d: 40 f6 c5 04 test $0x4,%bpl
3441: 44 0f 45 f8 cmovne %eax,%r15d
but after patching it does something like
3425: 4d 89 fe mov %r15,%r14
3428: 48 c1 ea 07 shr $0x7,%rdx
342c: 49 d1 ee shr %r14
342f: 89 d0 mov %edx,%eax
3431: c1 e1 05 shl $0x5,%ecx
3434: 83 e0 08 and $0x8,%eax
3437: 41 83 e6 07 and $0x7,%r14d
343b: 41 09 c6 or %eax,%r14d
343e: 89 d0 mov %edx,%eax
which actually makes use of MS_{NOSUID,NODEV,NOEXEC} all being 1 bit
off from their MNT_ counterparts (witness the shr %r14 and the and
with 0x7), so there we also cut 37 bytes according to bloat-o-meter.
Now, it does seem that older (and not that old in absolute terms)
compilers may generate worse code after the transformation - the
'replace with shift-mask-or' pattern doesn't exist until 7.1. E.g.
with 5.4 we have
$ scripts/bloat-o-meter fs/namespace.o.{0,1}-5.4
add/remove: 0/0 grow/shrink: 2/0 up/down: 50/0 (50)
function old new delta
do_mount 3153 3195 +42
clone_mnt 768 776 +8
5.4 compiles the "if() ..." construction roughly as 7.1, i.e. with
cmovs, but the ternary expressions are transformed into this
abomination
336f: 48 89 da mov %rbx,%rdx
3372: 83 e2 04 and $0x4,%edx
3375: 48 83 fa 01 cmp $0x1,%rdx
3379: 19 d2 sbb %edx,%edx
337b: f7 d2 not %edx
337d: 83 e2 02 and $0x2,%edx
3380: 09 d0 or %edx,%eax
3382: 48 89 da mov %rbx,%rdx
3385: 83 e2 08 and $0x8,%edx
3388: 48 83 fa 01 cmp $0x1,%rdx
338c: 19 d2 sbb %edx,%edx
338e: f7 d2 not %edx
3390: 83 e2 04 and $0x4,%edx
3393: 09 d0 or %edx,%eax
Was it something like this you saw?
Rasmus
Powered by blists - more mailing lists