lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20091124111946.GA7883@ff.dom.local>
Date:	Tue, 24 Nov 2009 11:19:46 +0000
From:	Jarek Poplawski <jarkao2@...il.com>
To:	Caleb Cushing <xenoterracide@...il.com>
Cc:	Frans Pop <elendil@...net.nl>, Andi Kleen <andi@...stfloor.org>,
	linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
	Jeff Kirsher <jeffrey.t.kirsher@...el.com>,
	Jesse Brandeburg <jesse.brandeburg@...el.com>,
	e1000-devel@...ts.sourceforge.net
Subject: Re: large packet loss take2 2.6.31.x

On Tue, Nov 24, 2009 at 01:17:09AM -0500, Caleb Cushing wrote:
> > Btw, currently I don't consider this dropping means there has to be
> > a bug. It could be otherwise - a feature... e.g. when a new kernel
> > can transmit faster (then dropping in some other, slower place can
> > happen).
> 
> um... where would it be dropping that we wouldn't have a bug? I mean
> sure faster is great... but if it makes my network not work right...

E.g. if it were dropped because of a queue overflow (but it doesn't
seem to be the case, at least at your box) or because of memory
problems while handling a lot of traffic.

> 
> I've added all (I think) information you've asked for to the bug
> http://bugzilla.kernel.org/show_bug.cgi?id=13835 except for ethtool
> and netstat on the router side. ethtool complains about not having
> driver or capability (maybe because it's a 2.4 kernel?) and the
> version of netstat doesn't support -s. I disabled everything that I
> can think of that would send/receive packets before doing the test
> client side, except dhcp/dns windows box's were probably sending some
> broadcasts too. but the traffic should be pretty low. I did remember
> to set the txqueuelen didn't seem to make a difference

Alas it's not all information I asked. E.g. "netstat -s before faulty
kernel" and "netstat -s after faulty kernel" seem to be the same file:
netstat_after.slave4.log.gz. Anyway, since there are problems with
getting stats from the router we still can't compare them, or check
for the dropped stats. (Btw, could you check for /proc/net/softnet_stat
yet?)

So, it might be the kernel problem you reported, but there is not
enough data to prove it. Then my proposal is to try to repeat this
problem in more "testing friendly" conditions - preferably against
some other, more up-to-date linux box, if possible?

> only error in dmesg I see is
> 
> e1000e 0000:00:19.0: pci_enable_pcie_error_reporting failed 0xfffffffb

I added e1000e maintainers to CC to have a look at this warning.

Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ