Message-ID: <1417721526.5386.39.camel@localhost>
Date: Thu, 04 Dec 2014 20:32:06 +0100
From: Hannes Frederic Sowa <hannes@...essinduktion.org>
To: Jay Vosburgh <jay.vosburgh@...onical.com>
Cc: netdev@...r.kernel.org, Herbert Xu <herbert@...dor.apana.org.au>,
Thomas Graf <tgraf@...g.ch>,
Daniel Borkmann <dborkman@...hat.com>,
Eric Dumazet <eric.dumazet@...il.com>
Subject: Re: [PATCH net-next] arch_fast_hash: avoid indirect function calls and implement hash in asm

Hi Jay,
On Thu, 2014-12-04 at 11:27 -0800, Jay Vosburgh wrote:
> Hannes Frederic Sowa <hannes@...essinduktion.org> wrote:
>
> >By default the arch_fast_hash hashing function pointers are initialized
> >to jhash(2). If during boot-up a CPU with SSE4.2 is detected they get
> >updated to the CRC32 ones. This dispatching scheme incurs a function
> >pointer lookup and indirect call for every hashing operation.
> >
> >To keep the number of clobbered registers short the hashing primitives
> >are implemented in assembler. This makes it easier to do the dispatch
> >by alternative_call.
>
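
(For context, the function-pointer scheme described above looks roughly
like the sketch below. This is illustrative only: the initcall, the
intel_crc4_2_hash declaration, and the function names stand in for the
real setup code, which may be named differently.)

#include <linux/types.h>
#include <linux/init.h>
#include <linux/jhash.h>
#include <asm/cpufeature.h>

/* assumed prototype for the SSE4.2 CRC32 implementation */
extern u32 intel_crc4_2_hash(const void *data, u32 len, u32 seed);

/* default to jhash(2), per the description above */
static u32 (*arch_fast_hash_fn)(const void *data, u32 len, u32 seed) = jhash;

static int __init fast_hash_setup(void)
{
	/* flip to the CRC32 variant once SSE4.2 is detected at boot */
	if (boot_cpu_has(X86_FEATURE_XMM4_2))
		arch_fast_hash_fn = intel_crc4_2_hash;
	return 0;
}
core_initcall(fast_hash_setup);

static inline u32 arch_fast_hash(const void *data, u32 len, u32 seed)
{
	/* every hashing operation pays a pointer load plus an
	 * indirect call here; this is what the patch eliminates */
	return arch_fast_hash_fn(data, len, seed);
}
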
> I have tested this on the same system that panicked with the
> original (now reverted) implementation (commit e5a2c8999576 "fast_hash:
> avoid indirect function calls"), and it functions correctly and does not
> panic.
>
> I looked at the disassembly, and, as a data point, on a
> non-SSE4.2 system, the code generated is not as efficient as Hannes'
> original test patch, found here:
>
> http://comments.gmane.org/gmane.linux.network/338430
>
> which produced code as follows:
>
> 0xffffffffa00b6bb9 <ovs_flow_tbl_insert+0xb9>: mov %r15,0x348(%r14)
> 0xffffffffa00b6bc0 <ovs_flow_tbl_insert+0xc0>: movzwl 0x28(%r15),%ecx
> 0xffffffffa00b6bc5 <ovs_flow_tbl_insert+0xc5>: movzwl 0x2a(%r15),%esi
> 0xffffffffa00b6bca <ovs_flow_tbl_insert+0xca>: movzwl %cx,%eax
> 0xffffffffa00b6bcd <ovs_flow_tbl_insert+0xcd>: sub %ecx,%esi
> 0xffffffffa00b6bcf <ovs_flow_tbl_insert+0xcf>: lea 0x38(%r14,%rax,1),%rdi
> 0xffffffffa00b6bd4 <ovs_flow_tbl_insert+0xd4>: sar $0x2,%esi
> 0xffffffffa00b6bd7 <ovs_flow_tbl_insert+0xd7>: callq 0xffffffff813a7810 <__jhash2>
> 0xffffffffa00b6bdc <ovs_flow_tbl_insert+0xdc>: mov %eax,0x30(%r14)
> 0xffffffffa00b6be0 <ovs_flow_tbl_insert+0xe0>: mov (%rbx),%r13
> 0xffffffffa00b6be3 <ovs_flow_tbl_insert+0xe3>: mov %r14,%rsi
> 0xffffffffa00b6be6 <ovs_flow_tbl_insert+0xe6>: mov %r13,%rdi
> 0xffffffffa00b6be9 <ovs_flow_tbl_insert+0xe9>: callq 0xffffffffa00b61a0 <table_instance_insert>
>
> This patch's code ends up as follows:
>
> 0xffffffffa01b5a57 <ovs_flow_tbl_insert+0xb7>: mov %r15,0x348(%rcx)
> 0xffffffffa01b5a5e <ovs_flow_tbl_insert+0xbe>: movzwl 0x28(%r15),%eax
> 0xffffffffa01b5a63 <ovs_flow_tbl_insert+0xc3>: movzwl 0x2a(%r15),%esi
> 0xffffffffa01b5a68 <ovs_flow_tbl_insert+0xc8>: movzwl %ax,%edx
> 0xffffffffa01b5a6b <ovs_flow_tbl_insert+0xcb>: sub %eax,%esi
> 0xffffffffa01b5a6d <ovs_flow_tbl_insert+0xcd>: lea 0x38(%rcx,%rdx,1),%rdi
> 0xffffffffa01b5a72 <ovs_flow_tbl_insert+0xd2>: xor %edx,%edx
> 0xffffffffa01b5a74 <ovs_flow_tbl_insert+0xd4>: sar $0x2,%esi
> 0xffffffffa01b5a77 <ovs_flow_tbl_insert+0xd7>: callq 0xffffffff813ae9f0 <__jhash_trampoline>
> 0xffffffffa01b5a7c <ovs_flow_tbl_insert+0xdc>: mov %eax,0x30(%rcx)
> 0xffffffffa01b5a7f <ovs_flow_tbl_insert+0xdf>: mov (%rbx),%r13
> 0xffffffffa01b5a82 <ovs_flow_tbl_insert+0xe2>: mov %rcx,%rsi
> 0xffffffffa01b5a85 <ovs_flow_tbl_insert+0xe5>: mov %r13,%rdi
> 0xffffffffa01b5a88 <ovs_flow_tbl_insert+0xe8>: callq 0xffffffffa01b5030 <table_instance_insert>
>
> 0xffffffff813ae9f0 <__jhash_trampoline>: push %rcx
> 0xffffffff813ae9f1 <__jhash_trampoline+0x1>: push %r8
> 0xffffffff813ae9f3 <__jhash_trampoline+0x3>: push %r9
> 0xffffffff813ae9f5 <__jhash_trampoline+0x5>: push %r10
> 0xffffffff813ae9f7 <__jhash_trampoline+0x7>: push %r11
> 0xffffffff813ae9f9 <__jhash_trampoline+0x9>: callq 0xffffffff813ae8a0 <__jhash>
> 0xffffffff813ae9fe <__jhash_trampoline+0xe>: pop %r11
> 0xffffffff813aea00 <__jhash_trampoline+0x10>: pop %r10
> 0xffffffff813aea02 <__jhash_trampoline+0x12>: pop %r9
> 0xffffffff813aea04 <__jhash_trampoline+0x14>: pop %r8
> 0xffffffff813aea06 <__jhash_trampoline+0x16>: pop %rcx
> 0xffffffff813aea07 <__jhash_trampoline+0x17>: retq
>
> In any event, this new patch does work correctly in my test that
> originally failed, and it's debatable how much optimizing for old
> systems is worthwhile.
Yes, that is expected. I also don't have a good idea of how to improve
the hashing on non-SSE4.2 systems in a reasonable amount of time.
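
For reference, with alternative_call the dispatch can be written
roughly as below. This is a sketch modeled on existing users such as
copy_user_generic(); __intel_crc4_2_hash is an assumed name for the asm
CRC32 primitive, and the exact register constraints in the patch may
differ.

/* asm-implemented primitives with a reduced-clobber convention */
extern u32 __jhash(const void *data, u32 len, u32 seed);
extern u32 __intel_crc4_2_hash(const void *data, u32 len, u32 seed);

static inline u32 arch_fast_hash(const void *data, u32 len, u32 seed)
{
	u32 hash;

	/* both call targets are direct; the alternatives framework
	 * rewrites the call in place at boot on SSE4.2 CPUs, so no
	 * function pointer is loaded and no indirect call is made */
	alternative_call(__jhash, __intel_crc4_2_hash, X86_FEATURE_XMM4_2,
			 ASM_OUTPUT2("=a" (hash), "=D" (data), "=S" (len),
				     "=d" (seed)),
			 "1" (data), "2" (len), "3" (seed)
			 : "cc", "memory");
	return hash;
}
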
> I only tested the non-SSE4.2 (i.e., old system) portion on
> x86_64.
I tried every possible setup this time, especially with openvswitch. I
covered ia32 with and without SSE4.2 as well as x86_64 and it always
behaved correctly. Last time the problem was that the static inline
never became a real function in OVS, but during my testing with
rhashtable the indirect reference caused it to be synthesized into a
normal out-of-line C function.
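
The difference is easy to reproduce: a plain call to a static inline
can be folded into the caller, while taking its address forces the
compiler to emit a real out-of-line function. A minimal illustration
(stub body, not the kernel code):

#include <stdint.h>

/* stub standing in for the real hash; the body is irrelevant here */
static inline uint32_t arch_fast_hash(const void *data, uint32_t len,
				      uint32_t seed)
{
	(void)data;
	return len ^ seed;
}

uint32_t direct_caller(const void *p, uint32_t n)
{
	/* direct call site: may be inlined away completely, so no
	 * standalone arch_fast_hash symbol has to exist (the OVS case) */
	return arch_fast_hash(p, n, 0);
}

uint32_t (*indirect_ref(void))(const void *, uint32_t, uint32_t)
{
	/* taking the address needs a real function, so the compiler
	 * synthesizes an out-of-line copy (the rhashtable case) */
	return arch_fast_hash;
}
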
Thanks a lot,
Hannes