lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20140806125550.GO30282@n2100.arm.linux.org.uk>
Date:	Wed, 6 Aug 2014 13:55:50 +0100
From:	Russell King - ARM Linux <linux@....linux.org.uk>
To:	Mattis Lorentzon <Mattis.Lorentzon@...oliv.com>
Cc:	Fredrik Noring <fredrik.noring@...oliv.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-arm-kernel@...ts.infradead.org" 
	<linux-arm-kernel@...ts.infradead.org>
Subject: Re: Oops: 17 SMP ARM (v3.16-rc2)

On Wed, Aug 06, 2014 at 11:10:06AM +0000, Mattis Lorentzon wrote:
> Russell,
> 
> > What is on the other end of the link?
> 
> 16 ARM cards connected to a 3Com Switch 4400 connected to a Linux FC 20
> machine (Intel Corporation 82541PI Gigabit Ethernet Controller rev 05).
> 
> There may be multiple problems. The backtrace has only been seen a few
> times, on two different cards. Most of the time, the network for a random
> card just stalls without any visible backtrace or error messages. The other
> cards seem to be unaffected when this happens.

Can you ascertain whether these stalls are a result of some failure of the
receive side or the transmit side - you should be able to tell that if you
watch the packet counts via ifconfig on the stalled card.  Also, it would
be useful to know whether the FEC interrupt was firing.

I hope you have some kind of serial console on these cards?

> > What I would like to do is to stamp each packet in some way with an
> > identifier marking its ring position, and then monitor the network to find out
> > whether the packet at slot 85 was actually transmitted - that's made slightly
> > harder because packets may be dropped at the receiver when operating in
> > promisc mode.  This would then allow us to work out some likely causes.
> 
> We would be glad to run this test on our setup, do you have more detailed
> information on how to set it up?

One of the problems is to find some way to stamp each packet with a 10-bit
number without having any side effects.  I guess one possibility would be
to overwrite the source MAC address on transmit, which hopefully should not
cause any side effects.

> After a network stall, we usually have to powercycle the ARM hardware to
> get it back to a usable state. These stalls last at least several minutes,
> perhaps indefinitely. It does not seem to recover properly, and is no longer
> reachable via the network.

Hmm.  Okay, I think the first thing we need to do is to work out why
the silent stalls are happening.

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ