lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 7 Aug 2014 11:11:06 +0000
From:	Mattis Lorentzon <Mattis.Lorentzon@...oliv.com>
To:	Russell King - ARM Linux <linux@....linux.org.uk>
CC:	Fredrik Noring <fredrik.noring@...oliv.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-arm-kernel@...ts.infradead.org" 
	<linux-arm-kernel@...ts.infradead.org>
Subject: RE: Oops: 17 SMP ARM (v3.16-rc2)

Russell,

> Can you ascertain whether these stalls are a result of some failure of the
> receive side or the transmit side - you should be able to tell that if you watch
> the packet counts via ifconfig on the stalled card.  Also, it would be useful to
> know whether the FEC interrupt was firing.

grep eth /proc/interrupts
151:          0          0          0          0       GIC 151  2188000.ethernet
166:    1205661          0          0          0  gpio-mxc   6  2188000.ethernet

The interrupt counter 166 increases regularly during the stalls.
Ifconfig indicates that the RX and TX  counters do not increase.

> I hope you have some kind of serial console on these cards?

Yes, indeed. Local stimuli seems to be able to unstall the network in a
somewhat random fashion. Running e.g. ifconfig or ping locally may
immediately or after up to about half a minute make the network responsive.
However, it usually degenerates again to a complete stall within seconds.
Without local stimuli the network does not appear to recover at all. The card
does not even respond to pings (again, most often without any apparent
error messages).

Running both of the following commands in parallel from the FC server seems
to trigger the problem within minutes (please note that the arm card stops
responding to both ping and ssh):

# while :; do ssh arm-card echo Ok; done
# ping arm-card

We have noticed the same problem on both the i.MX6 and the Zynq cards
(using KSZ9021 and Cadence GEM drivers). However, the number of
iterations required to trigger the problem vary. Sometimes it might stall after
less than 100, but in other cases the stalls begin after nearly 10000 iterations.
Once stalled (and unstalled after stimuli), the network on that particular card
degenerates a lot more often. Apart from the kernel, IP numbers and MAC
addresses, the software configurations are identical between the Zynq and
the i.MX6. Perhaps the fault is unrelated to the Freescale driver?

> Hmm.  Okay, I think the first thing we need to do is to work out why the
> silent stalls are happening.

Would you have any ideas on what to check next?

Best regards,
Mattis Lorentzon
***************************************************************
Consider the environment before printing this message.

To read Autoliv's Information and Confidentiality Notice, follow this link:
http://www.autoliv.com/disclaimer.html
***************************************************************

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists