lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:	Thu, 30 Sep 2010 08:31:39 +0200
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Changli Gao <xiaosuo@...il.com>
Cc:	"David S. Miller" <davem@...emloft.net>,
	Alexey Kuznetsov <kuznet@....inr.ac.ru>,
	"Pekka Savola (ipv6)" <pekkas@...core.fi>,
	James Morris <jmorris@...ei.org>,
	Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
	Patrick McHardy <kaber@...sh.net>, netdev@...r.kernel.org
Subject: Re: [PATCH] net: code cleanups

Le jeudi 30 septembre 2010 à 14:09 +0800, Changli Gao a écrit :
> On Thu, Sep 30, 2010 at 1:16 PM, Eric Dumazet <eric.dumazet@...il.com> wrote:
> > Le jeudi 30 septembre 2010 à 10:24 +0800, Changli Gao a écrit :
> >> Compare operations are more readable, and compilers generate the same code
> >> for the both.
> >>
> >
> > You have a buggy compiler then.
> 
> gcc version 4.4.3 (Gentoo 4.4.3-r2 p1.2)
> 
>              rth = rcu_dereference(rth->dst.rt_next)) {
>                 if ((((__force u32)rth->fl.fl4_dst ^ (__force u32)daddr) |
>                      ((__force u32)rth->fl.fl4_src ^ (__force u32)saddr) |
>                      (rth->fl.iif ^ iif) |
>     2f12:       44 3b 80 dc 00 00 00    cmp    0xdc(%rax),%r8d
>     2f19:       0f 85 a2 00 00 00       jne    2fc1 <ip_route_input_common+0x145
> >
>                      rth->fl.oif |
>     2f1f:       83 b8 d8 00 00 00 00    cmpl   $0x0,0xd8(%rax)
>     2f26:       0f 85 95 00 00 00       jne    2fc1 <ip_route_input_common+0x145
> >
>         tos &= IPTOS_RT_MASK;
>         hash = rt_hash(daddr, saddr, iif, rt_genid(net));
> 
>         for (rth = rcu_dereference(rt_hash_table[hash].chain); rth;
>              rth = rcu_dereference(rth->dst.rt_next)) {
>                 if ((((__force u32)rth->fl.fl4_dst ^ (__force u32)daddr) |
>     2f2c:       44 3b b8 e4 00 00 00    cmp    0xe4(%rax),%r15d
>     2f33:       0f 85 88 00 00 00       jne    2fc1
> <ip_route_input_common+0x145>
>                      ((__force u32)rth->fl.fl4_src ^ (__force u32)saddr) |
>     2f39:       44 3b b0 e8 00 00 00    cmp    0xe8(%rax),%r14d
>     2f40:       75 7f                   jne    2fc1
> <ip_route_input_common+0x145>
> 
> 
> >
> > I know this code is ugly, but please keep it as is, dont add conditional
> > branches on hot paths.
> >
> 
> If the compiler doesn't generate conditional branches, we have to
> touch every necessary field of all the cache entries in one hash
> bucket. Is it better than condition branch? I think the compiler
> developers know it better.

Last famous words.

Are you aware of cache lines (64 bytes at least on typical cpus), and
that all fields are already in CPU L1 cache ? I (and others) worked hard
in the past.

> 
> And the compiler reorders the conditional branches, is it expected?
> 

Your compiler added conditional branches on a code not wanting them,
only because on _your_ cpu, these conditional branches might be cheap.

Now, try to compile for an i686 target and see the difference.

If there was no difference, your compiler would be _buggy_, because not
generating optimal assembly.

Here I get :

c141dda9:       8b 55 e8                mov    -0x18(%ebp),%edx
c141ddac:       8b 81 9c 00 00 00       mov    0x9c(%ecx),%eax
c141ddb2:       33 91 a0 00 00 00       xor    0xa0(%ecx),%edx
c141ddb8:       31 f0                   xor    %esi,%eax
c141ddba:       09 d0                   or     %edx,%eax
c141ddbc:       8b 55 e0                mov    -0x20(%ebp),%edx
c141ddbf:       33 91 94 00 00 00       xor    0x94(%ecx),%edx
c141ddc5:       09 d0                   or     %edx,%eax
c141ddc7:       0f b6 55 e7             movzbl -0x19(%ebp),%edx
c141ddcb:       0b 81 90 00 00 00       or     0x90(%ecx),%eax
c141ddd1:       32 91 a4 00 00 00       xor    0xa4(%ecx),%dl
c141ddd7:       0f b6 d2                movzbl %dl,%edx
c141ddda:       09 d0                   or     %edx,%eax
c141dddc:       0f 85 9d 00 00 00       jne    c141de7f <ip_route_input_common+0x1b4>

As you can see, only one conditional branch.

Your patch is not welcomed, thanks.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists