netdev - Re: [PATCH net-next] codel: use Newton method instead of sqrt() and divides

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <1336859324.31653.1385.camel@edumazet-glaptop>
Date:	Sat, 12 May 2012 23:48:44 +0200
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	David Miller <davem@...emloft.net>
Cc:	dave.taht@...ferbloat.net, netdev@...r.kernel.org,
	nichols@...lere.com, van@...lere.net, codel@...ts.bufferbloat.net,
	ycheng@...gle.com, mattmathis@...gle.com, therbert@...gle.com,
	shemminger@...tta.com, nanditad@...gle.com
Subject: Re: [PATCH net-next] codel: use Newton method instead of sqrt()
 and divides

On Sat, 2012-05-12 at 16:45 -0400, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@...il.com>
> Date: Sat, 12 May 2012 22:40:56 +0200
> 
> > 24 bit of precision for the reciprocal value is more than enough (Van
> > suggested 16 bits in fact), so we have actually room for 7 bits if
> > needed.
> 
> Using a u16 would also work for me.

I tried it but it gives noticeable errors for count > 16000, and no
speed gain.


count=16525 scale=0 sqrt(scaled_count)=2056 reciprocal(sq)=1fe0200
Newton=235
  interval/sqrt(16525) = 
	777909 (float compute)
	778210 (integer approx)
	777926 (int_sqrt_div)
	862121 (Newton approx)

And if a flow is really agressive, count can grow above 10^6

> 
> > By the way, gcc on x86 generates nice "and 0xfffffffe,%eax" instruction
> > for (vars->rec_inv_sqrt << 1).
> 
> Yeah but what do stores of ->rec_inv_sqrt look like?

The load is "shr %edi" as in :
and the store an "or %ecx,%esi"

 5f2:	8b 72 08             	mov    0x8(%rdx),%esi
 5f5:	44 8b 02             	mov    (%rdx),%r8d
 5f8:	89 f7                	mov    %esi,%edi
 5fa:	41 83 c0 01          	add    $0x1,%r8d    vars->count + 1
 5fe:	83 e6 01             	and    $0x1,%esi    vars->dropping in esi

 601:	d1 ef                	shr    %edi
 603:	44 89 02             	mov    %r8d,(%rdx)  vars->count++;
 606:	45 89 c0             	mov    %r8d,%r8d
 609:	89 ff                	mov    %edi,%edi
 60b:	48 89 f9             	mov    %rdi,%rcx
 60e:	48 0f af cf          	imul   %rdi,%rcx

 612:	48 c1 e9 1f          	shr    $0x1f,%rcx
 616:	49 0f af c8          	imul   %r8,%rcx
 61a:	49 b8 00 00 00 80 01 	mov    $0x180000000,%r8
 621:	00 00 00 
 624:	49 29 c8             	sub    %rcx,%r8
 627:	4c 89 c1             	mov    %r8,%rcx
 62a:	48 0f af cf          	imul   %rdi,%rcx
 62e:	48 c1 e9 20          	shr    $0x20,%rcx
 632:	01 c9                	add    %ecx,%ecx
 634:	09 ce                	or     %ecx,%esi        combine the two fields
 636:	89 72 08             	mov    %esi,0x8(%rdx)   final store

 
Using 24bits generates roughly same code. (constants are different)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html