Message-ID: <20070329011838.6e832615@werewolf-wl>
Date:	Thu, 29 Mar 2007 01:18:38 +0200
From:	"J.A. Magallón" <jamagallon@....com>
To:	"Linux-Kernel, " <linux-kernel@...r.kernel.org>
Subject: Inlining can be _very_bad...

Hi all...

I am posting this here because it may be of direct interest for kernel
development (I recall many discussions about whether or not to inline...).

While testing other problems, I ran into this issue: the same short
and stupid loop took 3 to 5 times longer when it was written directly in
main() than when it was in an out-of-line function. The same bad thing
happens if the function is inlined.

The basic code is like this:

float	data[];

[inline] double one()
{
    double sum;
    sum = 0;
    for (i=0; i<SIZE; i++) sum += data[i];
    return sum;
}

int main()
{
    gettimeofday(&tv0,0);
    for (i=0; i<SIZE; i++)
        s0 += data[i];
    gettimeofday(&tv1,0);
    printf("T0: %6.2f ms\n",elap(tv0,tv1));
    gettimeofday(&tv0,0);
    s1 = one();
    gettimeofday(&tv1,0);
    printf("T1: %6.2f ms\n",elap(tv0,tv1));
}
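
For readers without the attachment, a self-contained sketch of the same measurement might look like the following. It is not the attached tst.c: the array size (far smaller than the 268435456 elements visible in the assembler), the fill value of 1.0f, the body of elap(), the run_benchmark() wrapper and the GCC/Clang noinline attribute are all assumptions made for illustration. Because s0 is a plain local here, this sketch is not guaranteed to reproduce the slowdown; that depends on details of tst.c that the excerpt does not show.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/time.h>

/* Assumption: a small size so the sketch runs quickly; the post used 268435456. */
#define SIZE (1u << 20)

static float data[SIZE];

/* Elapsed wall-clock time between two samples, in milliseconds (assumed helper). */
double elap(struct timeval t0, struct timeval t1)
{
    return (t1.tv_sec - t0.tv_sec) * 1000.0
         + (t1.tv_usec - t0.tv_usec) / 1000.0;
}

/* Out-of-line copy of the loop; noinline keeps the compiler from merging it
 * into the caller. Remove the attribute to let the compiler inline it. */
__attribute__((noinline)) double one(void)
{
    double sum = 0;
    for (size_t i = 0; i < SIZE; i++)
        sum += data[i];
    return sum;
}

/* Times the loop written in place (T0) and then through one() (T1).
 * Returns the common sum after checking that both variants agree. */
double run_benchmark(void)
{
    struct timeval tv0, tv1;
    double s0 = 0, s1;

    /* Assumption: fill with 1.0f so the expected sum equals SIZE. */
    for (size_t i = 0; i < SIZE; i++)
        data[i] = 1.0f;

    gettimeofday(&tv0, 0);
    for (size_t i = 0; i < SIZE; i++)
        s0 += data[i];
    gettimeofday(&tv1, 0);
    printf("T0: %6.2f ms\nS0: %.2f\n", elap(tv0, tv1), s0);

    gettimeofday(&tv0, 0);
    s1 = one();
    gettimeofday(&tv1, 0);
    printf("T1: %6.2f ms\nS1: %.2f\n", elap(tv0, tv1), s1);

    assert(s0 == s1);
    return s1;
}
```

Calling run_benchmark() from a one-line main() prints the same four-line T0/S0/T1/S1 report as the runs quoted in this message; the timings themselves depend on the machine and compiler flags.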

Timings when one() is not inlined (EM64T, 2.33GHz):

apolo:~/e4> tst
T0: 1145.12 ms
S0: 268435456.00
T1: 457.19 ms
S1: 268435456.00

With one() inlined:

apolo:~/e4> tst
T0: 1200.52 ms
S0: 268435456.00
T1: 1200.14 ms
S1: 268435456.00

Looking at the assembler, the non-inlined version does:

.L2:
    cvtss2sd    (%rdx,%rax,4), %xmm0
    incq    %rax
    cmpq    $268435456, %rax
    addsd   %xmm0, %xmm1
    jne .L2

and the inlined one does:

.L13:
    cvtss2sd    (%rdx,%rax,4), %xmm0
    incq    %rax
    cmpq    $268435456, %rax
    addsd   8(%rsp), %xmm0
    movsd   %xmm0, 8(%rsp)
    jne .L13

It looks like it is updating the stack on every iteration... This is
-march=opteron code; the -march=pentium4 output is similar. Same behaviour
with gcc3 and gcc4.
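
The difference between the two listings can be imitated in isolation. In the sketch below, sum_in_memory() uses a volatile accumulator purely to force a load and a store on every iteration, which resembles the addsd/movsd pair on 8(%rsp); sum_in_register() uses an ordinary local that the compiler is free to keep in a register, as in the .L2 loop. Both function names are hypothetical, and the volatile qualifier is only a way to provoke the pattern: it is not a claim about what tst.c actually does or why gcc spills there.

```c
#include <assert.h>
#include <stddef.h>

/* Accumulator forced to live in memory: every iteration reloads acc,
 * adds to it and writes it back. volatile is used only to provoke this. */
double sum_in_memory(const float *v, size_t n)
{
    assert(v != NULL || n == 0);
    volatile double acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += v[i];
    return acc;
}

/* Ordinary local accumulator: the compiler may keep it in a register
 * for the whole loop and needs no store until the function returns. */
double sum_in_register(const float *v, size_t n)
{
    assert(v != NULL || n == 0);
    double acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += v[i];
    return acc;
}
```

Compiling both with gcc -O2 -S and comparing the inner loops is a quick way to see whether a given compiler version keeps the sum in a register. If a loop in real code shows the memory round-trip, copying the running sum into a fresh local for the duration of the loop and storing it once afterwards is a common workaround to try.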

tst.c and Makefile attached.

Nice, isn't it? Please show me where my mistake is...

--
J.A. Magallon <jamagallon()ono!com>     \               Software is like sex:
                                         \         It's better when it's free
Mandriva Linux release 2007.1 (Cooker) for i586
Linux 2.6.20-jam06 (gcc 4.1.2 20070302 (prerelease) (4.1.2-1mdv2007.1)) #1 SMP PREEMPT

Download attachment "Makefile" of type "application/octet-stream" (307 bytes)

View attachment "tst.c" of type "text/x-csrc" (898 bytes)
