[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4A11F67B.3050805@cosmosbay.com>
Date: Tue, 19 May 2009 01:59:55 +0200
From: Eric Dumazet <dada1@...mosbay.com>
To: David Miller <davem@...emloft.net>
CC: jarkao2@...il.com, vexwek@...il.com, netdev@...r.kernel.org,
kaber@...sh.net, devik@....cz
Subject: [PATCH] pkt_sched: gen_estimator: use 64 bits intermediate counters
for bps
David Miller a écrit :
> From: Jarek Poplawski <jarkao2@...il.com>
> Date: Mon, 18 May 2009 19:23:49 +0200
>
>> On Mon, May 18, 2009 at 06:40:56PM +0200, Eric Dumazet wrote:
>>> With a typical estimator "1sec 8sec", ewma_log value is 3
>>>
>>> At gigabit speeds, we are very close to overflow yes, since
>>> we only have 27 bits available, so 134217728 bytes per second
>>> or 1073741824 bits per second.
>>>
>>> So formula :
>>> e->avbps += ((long)rate - (long)e->avbps) >> e->ewma_log;
>>> is going to overflow.
>>>
>>> One way to avoid the overflow would be to use a smaller estimator, like "500ms 4sec"
>>>
>>> Or use a 64bits rate & avbps, this is needed fo 10Gb speeds I suppose...
>> Yes, I considered this too, but because of an overhead I decided to
>> fix as designed (according to the comment) for now. But probably you
>> are right, and we should go further, so I'm OK with your patch.
>
> I like this patch too, Eric can you submit this formally with
> proper signoffs etc.?
>
Sure, here it is. We might need a similar patch to get a correct pps value
too, since we currently are limited to ~ 2^21 packets per second.
[PATCH] pkt_sched: gen_estimator: use 64 bit intermediate counters for bps
gen_estimator can overflow bps (bytes per second) with Gb links, while
it was designed with a u32 API, with a theorical limit of 34360Mbit (2^32 bytes)
Using 64 bit intermediate avbps/brate counters can allow us to reach this
theorical limit.
Signed-off-by: Eric Dumazet <dada1@...mosbay.com>
Signed-off-by: Jarek Poplawski <jarkao2@...il.com>
---
diff --git a/net/core/gen_estimator.c b/net/core/gen_estimator.c
index 9cc9f95..ea28659 100644
--- a/net/core/gen_estimator.c
+++ b/net/core/gen_estimator.c
@@ -66,9 +66,9 @@
NOTES.
- * The stored value for avbps is scaled by 2^5, so that maximal
- rate is ~1Gbit, avpps is scaled by 2^10.
-
+ * avbps is scaled by 2^5, avpps is scaled by 2^10.
+ * both values are reported as 32 bit unsigned values. bps can
+ overflow for fast links : max speed being 34360Mbit/sec
* Minimal interval is HZ/4=250msec (it is the greatest common divisor
for HZ=100 and HZ=1024 8)), maximal interval
is (HZ*2^EST_MAX_INTERVAL)/4 = 8sec. Shorter intervals
@@ -86,9 +86,9 @@ struct gen_estimator
spinlock_t *stats_lock;
int ewma_log;
u64 last_bytes;
+ u64 avbps;
u32 last_packets;
u32 avpps;
- u32 avbps;
struct rcu_head e_rcu;
struct rb_node node;
};
@@ -115,6 +115,7 @@ static void est_timer(unsigned long arg)
rcu_read_lock();
list_for_each_entry_rcu(e, &elist[idx].list, list) {
u64 nbytes;
+ u64 brate;
u32 npackets;
u32 rate;
@@ -125,9 +126,9 @@ static void est_timer(unsigned long arg)
nbytes = e->bstats->bytes;
npackets = e->bstats->packets;
- rate = (nbytes - e->last_bytes)<<(7 - idx);
+ brate = (nbytes - e->last_bytes)<<(7 - idx);
e->last_bytes = nbytes;
- e->avbps += ((long)rate - (long)e->avbps) >> e->ewma_log;
+ e->avbps += ((s64)(brate - e->avbps)) >> e->ewma_log;
e->rate_est->bps = (e->avbps+0xF)>>5;
rate = (npackets - e->last_packets)<<(12 - idx);
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists