Message-Id: <20080602121007.1e81cf05.billfink@mindspring.com>
Date: Mon, 2 Jun 2008 12:10:07 -0400
From: Bill Fink <billfink@...dspring.com>
To: Glen Turner <gdt@....id.au>
Cc: Alan Cox <alan@...rguk.ukuu.org.uk>,
James Cammarata <jimi@...x.net>,
Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org,
Linux Netdev List <netdev@...r.kernel.org>
Subject: Re: [PATCH] net: add ability to clear stats via ethtool - e1000/pcnet32
On Mon, 02 Jun 2008, Glen Turner wrote:
>
> > Yes, every individual Linux network administrator can re-create the
> > wheel by devising their own scripts, but it makes much more sense
> > to me to implement a simple general kernel mechanism once that could
> > be used generically, than to have hundreds (or thousands) of Linux
> > network administrators each having to do it themselves (perhaps
> > multiple times if they have a variety of types of systems and types
> > of NICs).
>
> Hi Bill,
>
> If you pull the stats using an SNMP polling tool (torrus, cacti, mrtg)
> then those packages' graphs give nice "did this get better or worse"
> output for debugging network issues.
I do use mrtg for network monitoring to determine when things go
bad, but once they do, I typically need much more detailed info
to troubleshoot the problem.
> I'd suggest you use one of those tools rather than writing your
> own scripts. Even if 99% of the time the graphs record zero errors,
> knowing when those errors started is very valuable and well worth
> the additional effort of configuring the tools over a command-line
> or a kernel hack.
First of all, when I'm assisting a user, they typically aren't even
running an SNMP daemon (and there might be firewall issues in
accessing it if they are). And I don't think the "ethtool -S" driver
stats are even accessible via SNMP (although they may contribute
to more generic interface stats which are), and it is the specific
driver stats which are often key to diagnosing the problem.
> The more sophisticated tools can do alerting to Nagios should
> a variable suddenly change its behaviour.
Definitely useful for certain arenas.
> The Cisco/Juniper/everyone-else feature to run console stats
> separately from SNMP stats is nice, but it's rather tuned to
> the needs of router-heads and tends to fall apart when multiple
> staff are debugging a fault.
I use it all the time for joint troubleshooting with network
peers. They clear the interface stats, and they and I can then
view the interface stats as a test is run (they give me RO access
to view the stats), or vice versa depending on whose network is
being examined.
> If we do proceed with better command line stats then the number
> of errored seconds and the worst errored second and its value
> would be useful. These useful numbers can't be calculated by
> the SNMP polling tools and it's hard to see how they could be
> done in user-space.
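To make the two quantities above concrete, here is a minimal sketch of how they could be defined. It assumes a cumulative error counter has already been sampled exactly once per second, which is precisely the part that is hard to guarantee from user-space. The function name and the sample values are hypothetical, not taken from any existing tool.

```python
def errored_seconds(samples):
    """Summarise a cumulative error counter sampled once per second.

    Returns (number of errored seconds, index of the worst second,
    error count in that worst second). The index is None when no
    errors occurred. Counter wrap-around is not handled.
    """
    # Per-second error counts are the differences between samples.
    deltas = [later - earlier for earlier, later in zip(samples, samples[1:])]
    errored = sum(1 for d in deltas if d > 0)
    if errored == 0:
        return 0, None, 0
    worst_index = max(range(len(deltas)), key=lambda i: deltas[i])
    return errored, worst_index, deltas[worst_index]


# Hypothetical readings of one error counter over five seconds.
readings = [0, 0, 3, 3, 10, 11]
count, worst, value = errored_seconds(readings)
```

A poller that reads the counter every few minutes only sees the total, so it cannot recover either number; that is the limitation described above.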
I'm all for any improved debugging/diagnostic capabilities, including
the extremely useful ability to clear/snapshot driver stats (there
could also be an option to un-snapshot if you wanted to get back to
seeing the absolute counter values).
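As an illustration of the clear/snapshot semantics described above (not of the proposed kernel patch itself), the following sketch treats a "clear" as recording a baseline and then reports counters relative to it; dropping the baseline is the "un-snapshot" case. It assumes the usual "name: value" layout of "ethtool -S" output; the function names are hypothetical.

```python
def parse_ethtool_stats(text):
    """Parse 'name: value' lines (as printed by 'ethtool -S') into a dict.

    Lines without a numeric value, such as the 'NIC statistics:'
    header, are skipped.
    """
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.rpartition(":")
        value = value.strip()
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def since_snapshot(current, snapshot=None):
    """Return counters relative to a snapshot.

    With snapshot=None the absolute values are returned unchanged,
    which corresponds to the 'un-snapshot' view. Counter wrap-around
    is not handled.
    """
    if snapshot is None:
        return dict(current)
    return {name: value - snapshot.get(name, 0)
            for name, value in current.items()}
```

In use, one would parse the output once to take the snapshot, run the test, parse the output again, and pass both dicts to since_snapshot(). Each administrator keeping such a baseline separately is the duplicated effort that a general kernel mechanism would avoid.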
-Bill