Message-ID: <1290160467.3034.33.camel@edumazet-laptop>
Date:	Fri, 19 Nov 2010 10:54:27 +0100
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Changli Gao <xiaosuo@...il.com>
Cc:	"David S. Miller" <davem@...emloft.net>,
	Hagen Paul Pfeifer <hagen@...u.net>, netdev@...r.kernel.org
Subject: Re: [PATCH net-next-2.6] filter: cleanup codes[] init

On Friday, 19 November 2010 at 16:38 +0800, Changli Gao wrote:

> I compared the asm code of sk_run_filter.

> As you see, an additional 'dec %edx' instruction is inserted.
> sk_chk_filter() only runs once, so I think we can afford the 'dec
> instruction' and 'dirty' code, but sk_run_filter() runs much more
> often, so this additional dec instruction isn't affordable.
> 

Maybe on your setup. By the way, the 

u32 f_k = fentry->k;

that David added in commit 57fe93b374a6b871

was much more of a problem on arches without enough registers.

On x86_32, for example, the compiler uses a register (%esi with my
gcc-4.5.1) to store f_k, and, more importantly, the A register is now
stored on the stack instead of in a cpu register.



With my compilers,

 gcc version 4.4.5 (Ubuntu/Linaro 4.4.4-14ubuntu5), 64bit
 gcc-4.5.1 (self compiled), 32bit

the resulting code was the same before and after the patch.

Most probably you have "CONFIG_CC_OPTIMIZE_FOR_SIZE=y", which
unfortunately is known to generate poor-looking code.

64bit code:

 39b:	49 8d 14 c6          	lea    (%r14,%rax,8),%rdx
 39f:	66 83 3a 2d          	cmpw   $0x2d,(%rdx)
 3a3:	8b 42 04             	mov    0x4(%rdx),%eax             // f_k = fentry->k;
 3a6:	76 28                	jbe    3d0 <sk_run_filter+0x70>

 3d0:	0f b7 0a             	movzwl (%rdx),%ecx
 3d3:	ff 24 cd 00 00 00 00 	jmpq   *0x0(,%rcx,8)


32bit code:

 2e0:	8d 04 df             	lea    (%edi,%ebx,8),%eax
 2e3:	66 83 38 2d          	cmpw   $0x2d,(%eax)
 2e7:	8b 70 04             	mov    0x4(%eax),%esi  // f_k = fentry->k;
 2ea:	76 1c                	jbe    308 <sk_run_filter+0x58>

 308:	0f b7 10             	movzwl (%eax),%edx
 30b:	ff 24 95 38 00 00 00 	jmp    *0x38(,%edx,4)


DIV_X instruction:
 480:	8b 45 a4             	mov    -0x5c(%ebp),%eax
 483:	85 c0                	test   %eax,%eax
 485:	0f 84 9d fe ff ff    	je     328 <sk_run_filter+0x78>
 48b:	8b 45 ac             	mov    -0x54(%ebp),%eax   // A
 48e:	31 d2                	xor    %edx,%edx
 490:	f7 75 a4             	divl   -0x5c(%ebp)
 493:	89 45 ac             	mov    %eax,-0x54(%ebp)  // A
 496:	e9 85 fe ff ff       	jmp    320 <sk_run_filter+0x70>


I believe we should revert the "u32 f_k = fentry->k;" part.

fentry->k is as fast as f_k when it ends up on the stack anyway, and it
avoids one instruction whenever fentry->k is not needed.



