[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1336859324.31653.1385.camel@edumazet-glaptop>
Date: Sat, 12 May 2012 23:48:44 +0200
From: Eric Dumazet <eric.dumazet@...il.com>
To: David Miller <davem@...emloft.net>
Cc: dave.taht@...ferbloat.net, netdev@...r.kernel.org,
nichols@...lere.com, van@...lere.net, codel@...ts.bufferbloat.net,
ycheng@...gle.com, mattmathis@...gle.com, therbert@...gle.com,
shemminger@...tta.com, nanditad@...gle.com
Subject: Re: [PATCH net-next] codel: use Newton method instead of sqrt()
and divides
On Sat, 2012-05-12 at 16:45 -0400, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@...il.com>
> Date: Sat, 12 May 2012 22:40:56 +0200
>
> > 24 bit of precision for the reciprocal value is more than enough (Van
> > suggested 16 bits in fact), so we have actually room for 7 bits if
> > needed.
>
> Using a u16 would also work for me.
I tried it but it gives noticeable errors for count > 16000, and no
speed gain.
count=16525 scale=0 sqrt(scaled_count)=2056 reciprocal(sq)=1fe0200
Newton=235
interval/sqrt(16525) =
777909 (float compute)
778210 (integer approx)
777926 (int_sqrt_div)
862121 (Newton approx)
And if a flow is really agressive, count can grow above 10^6
>
> > By the way, gcc on x86 generates nice "and 0xfffffffe,%eax" instruction
> > for (vars->rec_inv_sqrt << 1).
>
> Yeah but what do stores of ->rec_inv_sqrt look like?
The load is "shr %edi" as in :
and the store an "or %ecx,%esi"
5f2: 8b 72 08 mov 0x8(%rdx),%esi
5f5: 44 8b 02 mov (%rdx),%r8d
5f8: 89 f7 mov %esi,%edi
5fa: 41 83 c0 01 add $0x1,%r8d vars->count + 1
5fe: 83 e6 01 and $0x1,%esi vars->dropping in esi
601: d1 ef shr %edi
603: 44 89 02 mov %r8d,(%rdx) vars->count++;
606: 45 89 c0 mov %r8d,%r8d
609: 89 ff mov %edi,%edi
60b: 48 89 f9 mov %rdi,%rcx
60e: 48 0f af cf imul %rdi,%rcx
612: 48 c1 e9 1f shr $0x1f,%rcx
616: 49 0f af c8 imul %r8,%rcx
61a: 49 b8 00 00 00 80 01 mov $0x180000000,%r8
621: 00 00 00
624: 49 29 c8 sub %rcx,%r8
627: 4c 89 c1 mov %r8,%rcx
62a: 48 0f af cf imul %rdi,%rcx
62e: 48 c1 e9 20 shr $0x20,%rcx
632: 01 c9 add %ecx,%ecx
634: 09 ce or %ecx,%esi combine the two fields
636: 89 72 08 mov %esi,0x8(%rdx) final store
Using 24bits generates roughly same code. (constants are different)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists