Message-ID: <1290160467.3034.33.camel@edumazet-laptop>
Date: Fri, 19 Nov 2010 10:54:27 +0100
From: Eric Dumazet <eric.dumazet@...il.com>
To: Changli Gao <xiaosuo@...il.com>
Cc: "David S. Miller" <davem@...emloft.net>,
Hagen Paul Pfeifer <hagen@...u.net>, netdev@...r.kernel.org
Subject: Re: [PATCH net-next-2.6] filter: cleanup codes[] init
On Friday, 19 November 2010 at 16:38 +0800, Changli Gao wrote:
> I compared the asm code of sk_run_filter.
> As you see, an additional 'dec %edx' instruction is inserted.
> sk_chk_filter() only runs once, so I think we can afford the 'dec'
> instruction and the 'dirty' code there, but sk_run_filter() runs much
> more often, and this additional dec instruction isn't affordable.
>
Maybe on your setup. By the way, the
    u32 f_k = fentry->k;
that David added in commit 57fe93b374a6b871
was much more of a problem on arches without enough registers.
x86_32 for example: the compiler uses a register (%esi with my gcc-4.5.1) to
store f_k and, more importantly, the A register is now stored on the stack
instead of in a cpu register.
With my compilers:
    gcc version 4.4.5 (Ubuntu/Linaro 4.4.4-14ubuntu5), 64bit
    gcc-4.5.1 (self compiled), 32bit
the resulting code was the same before and after the patch.
Most probably you have "CONFIG_CC_OPTIMIZE_FOR_SIZE=y", which
unfortunately is known to generate poor-looking code.

64bit code:
 39b:   49 8d 14 c6             lea    (%r14,%rax,8),%rdx
 39f:   66 83 3a 2d             cmpw   $0x2d,(%rdx)
 3a3:   8b 42 04                mov    0x4(%rdx),%eax        // f_k = fentry->k;
 3a6:   76 28                   jbe    3d0 <sk_run_filter+0x70>
 3d0:   0f b7 0a                movzwl (%rdx),%ecx
 3d3:   ff 24 cd 00 00 00 00    jmpq   *0x0(,%rcx,8)
32bit code:
 2e0:   8d 04 df                lea    (%edi,%ebx,8),%eax
 2e3:   66 83 38 2d             cmpw   $0x2d,(%eax)
 2e7:   8b 70 04                mov    0x4(%eax),%esi        // f_k = fentry->k;
 2ea:   76 1c                   jbe    308 <sk_run_filter+0x58>
 308:   0f b7 10                movzwl (%eax),%edx
 30b:   ff 24 95 38 00 00 00    jmp    *0x38(,%edx,4)
DIV_X instruction (32bit):
 480:   8b 45 a4                mov    -0x5c(%ebp),%eax
 483:   85 c0                   test   %eax,%eax
 485:   0f 84 9d fe ff ff       je     328 <sk_run_filter+0x78>
 48b:   8b 45 ac                mov    -0x54(%ebp),%eax      // A
 48e:   31 d2                   xor    %edx,%edx
 490:   f7 75 a4                divl   -0x5c(%ebp)
 493:   89 45 ac                mov    %eax,-0x54(%ebp)      // A
 496:   e9 85 fe ff ff          jmp    320 <sk_run_filter+0x70>
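
For readers who do not want to decode the listing, here is a rough C
equivalent of what that DIV_X sequence does. It is only a sketch: struct regs
and alu_div_x() are made-up names, not kernel code, and I am assuming that
-0x5c(%ebp) holds X and that the je target is the "return 0" path.

```c
#include <stdint.h>

/* Hypothetical stand-in for the interpreter state. In the listing both
 * values live on the stack: A at -0x54(%ebp), X (assumed) at -0x5c(%ebp). */
struct regs {
    uint32_t A; /* accumulator */
    uint32_t X; /* index register */
};

/* Returns 0 when the filter has to bail out (division by zero),
 * 1 when A was updated and the interpreter can continue. */
static int alu_div_x(struct regs *r)
{
    if (r->X == 0)      /* mov X,%eax ; test %eax,%eax ; je ... */
        return 0;
    r->A /= r->X;       /* mov A,%eax ; xor %edx,%edx ; divl X ; mov %eax,A */
    return 1;
}
```

The point of the listing is the two memory accesses to A around the divl:
with A kept in a cpu register, both would disappear.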
I believe we should revert the u32 f_k = fentry->k; part.
fentry->k is as fast as f_k if f_k is stored on the stack, and it avoids one
instruction if fentry->k is not needed.
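
To make the two variants concrete, here is a minimal sketch of a toy
interpreter written both ways. The opcodes, struct insn and the two run_*()
functions are hypothetical and much simpler than sk_run_filter(); the sketch
only shows where the load of k sits in the source, and says nothing about
what a given compiler will generate from it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified instruction set; not struct sock_filter. */
enum op { OP_LD_IMM, OP_ADD_K, OP_TAX, OP_ADD_X, OP_RET_A };

struct insn {
    uint16_t code;
    uint32_t k;
};

/* Variant 1: k is read up front for every instruction. */
static uint32_t run_hoisted(const struct insn *prog, size_t len)
{
    uint32_t A = 0, X = 0;

    for (size_t pc = 0; pc < len; pc++) {
        const struct insn *fentry = &prog[pc];
        uint32_t f_k = fentry->k;   /* read even for OP_TAX / OP_ADD_X */

        switch (fentry->code) {
        case OP_LD_IMM: A = f_k;  break;
        case OP_ADD_K:  A += f_k; break;
        case OP_TAX:    X = A;    break;
        case OP_ADD_X:  A += X;   break;
        case OP_RET_A:  return A;
        default:        return 0;
        }
    }
    return 0;
}

/* Variant 2: k is read only by the cases that use it. */
static uint32_t run_on_demand(const struct insn *prog, size_t len)
{
    uint32_t A = 0, X = 0;

    for (size_t pc = 0; pc < len; pc++) {
        const struct insn *fentry = &prog[pc];

        switch (fentry->code) {
        case OP_LD_IMM: A = fentry->k;  break;
        case OP_ADD_K:  A += fentry->k; break;
        case OP_TAX:    X = A;          break;
        case OP_ADD_X:  A += X;         break;
        case OP_RET_A:  return A;
        default:        return 0;
        }
    }
    return 0;
}
```

Both functions compute the same result for any program; the only difference
is whether the source asks for k unconditionally or per opcode.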