Message-ID: <478B9E32.4020902@cosmosbay.com>
Date: Mon, 14 Jan 2008 18:38:58 +0100
From: Eric Dumazet <dada1@...mosbay.com>
To: Mark Seger <Mark.Seger@...com>
Cc: netdev@...r.kernel.org
Subject: Re: occasionally corrupted network stats in /proc/net/dev
Mark Seger wrote:
> I posted the following on linux-net and haven't seen any responses,
> possibly because nobody had an answer or because that list is
> obsolete. I have been told this is the current list for everything
> networking on Linux, so I thought I'd try again...
>
> I suspect the answer will be that it is what it is, but here's the
> deal. I have a tool I use for monitoring network traffic among other
> things - see http://collectl.sourceforge.net/ - and one of its
> benefits is that you can run it continuously as a daemon (similar to
> sar) and generate data in a format suitable for plotting. This means
> that you can automate your entire network monitoring infrastructure at
> fairly fine granularity, down to one second if you like. Actually,
> 1-second monitoring will report incorrect data on earlier
> kernels because the stats aren't updated on 1-second boundaries and
> you need to monitor at an interval of 0.9765 seconds, but that's a
> different story, which is explained at
> http://collectl.sourceforge.net/NetworkStats.html
>
> But more importantly, I've found that occasionally (not that often)
> bogus data is reported from /proc/net/dev. While I don't have a
> lot of details on this, it seems to show up only on 64-bit kernels.
> Look at the following samples taken at 1-second intervals:
>
> eth0:135115809 1024897 0 0 0 0 0 9
> 135458926 910340 0 0 0 0 0 0
> eth0:135118023 1024923 0 0 0 0 0 9
> 135460952 910363 0 0 0 0 0 0
> eth0: 0 884620 0 0 0 0 0 909397
> 9687563 1049736 0 0 0 0 0 0
> eth0:135121189 1024957 0 0 0 0 0 9
> 135464222 910400 0 0 0 0 0 0
> eth0:135129565 1024995 0 0 0 0 0 9
> 135473687 910435 0 0 0 0 0 0
>
> See the middle sample? When I look at the change between samples, it
> generates a really big number, since the difference is assumed to be
> caused by a counter wrapping. The problem is that bad data is not
> always this easy to spot. For example, if the original and bogus
> values are close enough, it's not even clear there is a problem.
>
> So the obvious question is: is there any way to prevent the bogus data
> from getting reported? If not, is there any way to set the values to
> something that indicates the correct values can't be determined?
> Clearly this problem would be visible to any tool that looks at /proc,
> but since many tools are not automated or don't take it to the level I
> do, probably nobody notices. As for the counter update frequency,
> even though the counters now appear to be updated closer to a 1-second
> boundary, it also means tools that can monitor at sub-second intervals
> will report incorrect data, since the counters only change once a second.
Which NIC is used for eth0 (and what is the driver name)?
Which version of the Linux kernel are you running?
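
Until the cause is known, a monitoring tool can at least avoid turning a
bad sample into a giant "wrap" delta. What follows is only a rough Python
sketch of that idea, not how collectl or the kernel handles it. The
function names, the 64-bit counter width, and the 10 Gbit/s ceiling are
all assumptions made for illustration. It treats a decrease as a wrap,
rejects any delta above the ceiling, and keeps comparing against the last
accepted value. It cannot catch the case described above where the bogus
value is close to the real one.

```python
from typing import List, Optional, Tuple

COUNTER_BITS = 64                # assumption: counters are 64 bits wide
MAX_BYTES_PER_SEC = 1_250_000_000  # assumption: nothing faster than 10 Gbit/s


def parse_line(line: str) -> Tuple[str, List[int]]:
    """Split one /proc/net/dev interface line into (name, counters)."""
    name, _, rest = line.partition(":")
    return name.strip(), [int(field) for field in rest.split()]


def counter_delta(prev: int, curr: int, interval_s: float = 1.0,
                  max_rate: float = MAX_BYTES_PER_SEC,
                  bits: int = COUNTER_BITS) -> Optional[int]:
    """Return curr - prev, allowing for one wrap, or None if implausible."""
    if curr >= prev:
        delta = curr - prev
    else:
        # A decrease is first interpreted as a counter wrap.
        delta = curr + (1 << bits) - prev
    # A delta larger than the link could carry is treated as a bad sample.
    if delta > max_rate * interval_s:
        return None
    return delta


def series_deltas(samples: List[int],
                  interval_s: float = 1.0) -> List[Optional[int]]:
    """Delta per sample; rejected samples yield None and are skipped over."""
    deltas: List[Optional[int]] = []
    last_good = samples[0]
    for curr in samples[1:]:
        delta = counter_delta(last_good, curr, interval_s)
        deltas.append(delta)
        if delta is not None:
            # Only accepted samples become the new reference value.
            last_good = curr
    return deltas


if __name__ == "__main__":
    # Receive-byte counters taken from the five samples quoted above.
    rx_bytes = [135115809, 135118023, 0, 135121189, 135129565]
    print(series_deltas(rx_bytes))
```

Note that the delta reported right after a rejected sample spans two
intervals, and that a bogus first sample would poison every later
comparison, so a real tool would need something more careful.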