linux-kernel - [PATCH] Tell the world we gave up on pushing CC_OPTIMIZE_FOR

Message-Id: <1351856461-3662-1-git-send-email-kirr@mns.spb.ru>
Date:	Fri,  2 Nov 2012 15:41:01 +0400
From:	Kirill Smelkov <kirr@....spb.ru>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	linux-kernel@...r.kernel.org, Kirill Smelkov <kirr@....spb.ru>
Subject: [PATCH] Tell the world we gave up on pushing CC_OPTIMIZE_FOR_SIZE

 [continuing 281dc5c5 "Give up on pushing CC_OPTIMIZE_FOR_SIZE"]

Recently I've been hit hard, performance-wise, by CC_OPTIMIZE_FOR_SIZE=y
on x86. The problem turned out to be that with -Os gcc wants to inline
__builtin_memcpy, which x86 memcpy directly refers to,

    ---- 8< ---- arch/x86/include/asm/string_32.h
    #if (__GNUC__ >= 4)
    #define memcpy(t, f, n) __builtin_memcpy(t, f, n)

to "rep; movsb", which is several times slower than "rep; movsl".
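For readers less familiar with x86 string instructions: "rep; movsl" moves four bytes per iteration, while "rep; movsb" moves one. The portable C sketch below illustrates the dwords-first strategy (n / 4 four-byte moves, then the n % 4 leftover bytes). It is only an illustration of that idea; the helper name copy_dwords_then_bytes is hypothetical, and this is not the kernel's own inline-asm memcpy.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper illustrating the "dwords first, then leftover
 * bytes" strategy behind "rep; movsl": n / 4 four-byte moves followed by
 * n % 4 single-byte moves.  Not the kernel's inline-asm __memcpy.
 */
static void *copy_dwords_then_bytes(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	size_t words = n / 4;	/* what "rep; movsl" would move, 4 bytes each */
	size_t tail = n % 4;	/* 0..3 bytes left over for a byte copy */

	while (words--) {
		uint32_t w;

		/*
		 * Fixed-size 4-byte memcpy keeps this valid for unaligned
		 * buffers; compilers usually lower it to one load and store.
		 */
		memcpy(&w, s, sizeof(w));
		memcpy(d, &w, sizeof(w));
		d += 4;
		s += 4;
	}
	while (tail--)
		*d++ = *s++;
	return dst;
}
```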

For me this showed up in the vivi driver, where memcpy is used to copy
lines of colorbars, and this copying is one of the most significant parts
of the workload:

    ---- 8< ---- drivers/media/platform/vivi.c
    static void vivi_fillbuff(struct vivi_dev *dev, struct vivi_buffer *buf)
    {
            ...

            for (h = 0; h < hmax; h++)
                    memcpy(vbuf + h * wmax * dev->pixelsize,
                           dev->line + (dev->mv_count % wmax) * dev->pixelsize,
                           wmax * dev->pixelsize);
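The loop above issues one memcpy per row of the frame, each copying wmax * pixelsize bytes from a pre-rendered line. The sketch below reproduces that pattern outside the driver. fill_frame() and its flat parameter list are hypothetical stand-ins for the struct vivi_dev fields, and it assumes the line buffer holds at least 2 * wmax pixels so the window shifted by mv_count % wmax stays in bounds. With, say, wmax = 640 and pixelsize = 2 (values chosen only for illustration), every call copies 1280 bytes, so the per-byte cost of the copy instruction likely matters much more than the call overhead.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the vivi_fillbuff() loop: each of the hmax
 * rows is one memcpy of wmax * pixelsize bytes, taken from a pre-rendered
 * line starting mv_count % wmax pixels in.  Assumes `line` holds at least
 * 2 * wmax pixels.
 */
static void fill_frame(unsigned char *vbuf, const unsigned char *line,
		       unsigned wmax, unsigned hmax, unsigned pixelsize,
		       unsigned mv_count)
{
	size_t row_bytes = (size_t)wmax * pixelsize;
	const unsigned char *src = line + (size_t)(mv_count % wmax) * pixelsize;
	unsigned h;

	for (h = 0; h < hmax; h++)
		memcpy(vbuf + h * row_bytes, src, row_bytes);
}
```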

Gcc insists on using movsb even when it knows the dest and src alignment.
For example, with gcc-4.4, gcc-4.7 and yesterday's gcc trunk, for the
following function

    ---- 8< ----
    void doit(unsigned long *dst, unsigned long *src, unsigned n)
    {
        void *__d = __builtin_assume_aligned(dst, 4);
        void *__s = __builtin_assume_aligned(src, 4);

        __builtin_memcpy(__d, __s, n);
    }

it still wants to use movsb with -Os:

    00000000 <doit>:
       0:   55                      push   %ebp
       1:   89 e5                   mov    %esp,%ebp
       3:   57                      push   %edi
       4:   8b 4d 10                mov    0x10(%ebp),%ecx
       7:   56                      push   %esi
       8:   8b 7d 08                mov    0x8(%ebp),%edi
       b:   8b 75 0c                mov    0xc(%ebp),%esi
       e:   f3 a4                   rep movsb %ds:(%esi),%es:(%edi)
      10:   5e                      pop    %esi
      11:   5f                      pop    %edi
      12:   5d                      pop    %ebp
      13:   c3                      ret
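To make the listing concrete: under the i386 cdecl convention the three arguments sit at 0x8, 0xc and 0x10 off %ebp and are loaded into %edi, %esi and %ecx, and "rep movsb" then copies %ecx bytes one at a time, with both pointers moving upward because the calling convention requires the direction flag to be clear. A C rendering of that behavior, with a hypothetical function name and the register names kept as variable names, might look like this. Architecturally it is a byte-by-byte copy; how fast it actually runs depends on the CPU's string-instruction implementation.

```c
#include <stddef.h>

/*
 * Hypothetical C equivalent of doit() as compiled at -Os: "rep movsb"
 * with the direction flag clear copies %ecx bytes from (%esi) to (%edi),
 * one byte per iteration, incrementing both pointers.
 */
static void rep_movsb_equivalent(void *dst, const void *src, unsigned n)
{
	unsigned char *edi = dst;		/* loaded from 0x8(%ebp)  */
	const unsigned char *esi = src;		/* loaded from 0xc(%ebp)  */
	unsigned ecx = n;			/* loaded from 0x10(%ebp) */

	while (ecx--)
		*edi++ = *esi++;
}
```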

and it does so even if I change "n" to "4*n".

On the other hand, with -O2 it generates a call to memcpy, which at least
has "rep; movsl" inside, and things work several times faster.

So tell people not to enable CC_OPTIMIZE_FOR_SIZE by default.

Signed-off-by: Kirill Smelkov <kirr@....spb.ru>
---
 init/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/init/Kconfig b/init/Kconfig
index 6fdd6e3..6a448d5 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1119,7 +1119,7 @@ config CC_OPTIMIZE_FOR_SIZE
 	  Enabling this option will pass "-Os" instead of "-O2" to gcc
 	  resulting in a smaller kernel.
 
-	  If unsure, say Y.
+	  If unsure, say N.
 
 config SYSCTL
 	bool
-- 
1.8.0.316.g291341c