Message-ID: <20140806095012.GN30282@n2100.arm.linux.org.uk>
Date:	Wed, 6 Aug 2014 10:50:12 +0100
From:	Russell King - ARM Linux <linux@....linux.org.uk>
To:	Mattis Lorentzon <Mattis.Lorentzon@...oliv.com>
Cc:	Fredrik Noring <fredrik.noring@...oliv.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-arm-kernel@...ts.infradead.org" 
	<linux-arm-kernel@...ts.infradead.org>
Subject: Re: Oops: 17 SMP ARM (v3.16-rc2)

On Tue, Aug 05, 2014 at 01:31:29PM +0000, Mattis Lorentzon wrote:
> We have applied your V2 patch set of 30 patches on top of v3.16-rc2 and are
> currently running some stability tests.
> 
> During our first test round we triggered a timeout which caused the fec driver
> to become unresponsive for several minutes. The attached backtrace was
> shown when the hardware was rebooted.

What is on the other end of the link?

> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:264 dev_watchdog+0x270/0x27c()
> NETDEV WATCHDOG: eth0 (fec): transmit queue 0 timed out
...
> fec 2188000.ethernet eth0: TX ring dump
> Nr     SC     addr       len  SKB
>   0    0x1c00 0x00000000   66   (null)
...
>  83    0x1c00 0x00000000   66   (null)
>  84  H 0x1c00 0x00000000   66   (null)
>  85    0x9c00 0x2e205000   66 9e384f00
>  86    0x1c00 0x2e204800   66 9e384d80
>  87    0x1c00 0x2e204000   66 9e384180
...
> 376    0x1c00 0x2e252800   66 81cf6180
> 377    0x1c00 0x2e253000   66 81cf6240
> 378 S  0x1c00 0x00000000   66   (null)

So, the software would insert the next packet into slot 378.  However,
the slots from 85 to 377 have not been reaped, even though those from 86
to 377 have apparently been sent.  This is because the entry in slot 85
(status 0x9c00 rather than 0x1c00) shows that it has yet to be sent, and
reaping stops at the first entry that has not been sent.
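
To make that reading of the dump concrete, here is a minimal sketch that
parses the lines quoted above and reports the entry that blocks reaping.
It is not driver code.  It assumes that bit 0x8000 in the SC column is the
"ready" flag (which is what the difference between 0x9c00 and 0x1c00
suggests), that an entry with an SKB still attached has not been reaped,
and that the entries are listed in ring order without wrapping.  The names
TX_READY, parse_dump and find_stall are made up for this illustration.

```python
import re

TX_READY = 0x8000  # assumption: "ready" (not yet sent) bit in the SC column

# One line of the "TX ring dump" table: Nr, optional H/S marker, SC, addr, len, SKB.
DUMP_LINE = re.compile(
    r"^\s*(?P<nr>\d+)\s+(?P<mark>[HS]?)\s*0x(?P<sc>[0-9a-fA-F]{4})\s+"
    r"0x(?P<addr>[0-9a-fA-F]{8})\s+(?P<len>\d+)\s+(?P<skb>\(null\)|[0-9a-fA-F]+)"
)

# Only the entries quoted in this mail; the elided ones are not invented.
SAMPLE = """\
>  84  H 0x1c00 0x00000000   66   (null)
>  85    0x9c00 0x2e205000   66 9e384f00
>  86    0x1c00 0x2e204800   66 9e384d80
>  87    0x1c00 0x2e204000   66 9e384180
> 376    0x1c00 0x2e252800   66 81cf6180
> 377    0x1c00 0x2e253000   66 81cf6240
> 378 S  0x1c00 0x00000000   66   (null)
"""


def parse_dump(text):
    """Turn dump lines (with or without the '> ' quote prefix) into dicts."""
    entries = []
    for line in text.splitlines():
        m = DUMP_LINE.match(line.lstrip("> "))
        if m:
            skb = m.group("skb")
            entries.append({
                "nr": int(m.group("nr")),
                "mark": m.group("mark"),
                "sc": int(m.group("sc"), 16),
                "skb": None if skb == "(null)" else skb,
            })
    return entries


def find_stall(entries):
    """Return (blocking slot, slots sent but stranded behind it)."""
    pending = [e for e in entries if e["skb"] is not None]  # not yet reaped
    for i, e in enumerate(pending):
        if e["sc"] & TX_READY:
            stranded = [p["nr"] for p in pending[i + 1:]
                        if not p["sc"] & TX_READY]
            return e["nr"], stranded
    return None, []


blocking, stranded = find_stall(parse_dump(SAMPLE))
print("blocking slot:", blocking)
print("stranded slots (of those listed):", stranded)
```

Run against the quoted lines, this picks out slot 85 as the blocking
entry, with the listed slots after it stranded behind it.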

I've no idea what causes this; it looks like there's something wrong
with the hardware which causes the transmitter to skip an entry in the
ring under certain circumstances.  As I've never been able to reproduce
it here, I've not been able to investigate it.

What I would like to do is to stamp each packet in some way with an
identifier marking its ring position, and then monitor the network to
find out whether the packet at slot 85 was actually transmitted.  That's
made slightly harder because packets may be dropped at the receiver
when operating in promiscuous mode, so a packet missing from the capture
is not proof that it was never sent.  This would then allow us to work
out some likely causes.
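
As a rough illustration of the stamping idea, the sketch below builds a
test payload carrying the ring slot and a sequence number, and compares
the list of stamps that were queued against the payloads seen in a
capture.  It is only a userspace model: the payload layout, the "RING"
magic and the names stamp_payload, read_stamp and missing_stamps are all
invented here, and it says nothing about how the driver would learn the
slot number before queuing the packet.

```python
import struct

MAGIC = b"RING"        # hypothetical marker so stamped payloads are recognisable
STAMP = "!HI"          # slot (16 bit) and sequence number (32 bit), network order
HEADER_LEN = len(MAGIC) + struct.calcsize(STAMP)


def stamp_payload(slot, seq, size=24):
    """Build a payload of `size` bytes identifying the ring slot it used."""
    if size < HEADER_LEN:
        raise ValueError("payload too small for the stamp")
    header = MAGIC + struct.pack(STAMP, slot, seq)
    return header + bytes(size - len(header))  # zero padding


def read_stamp(payload):
    """Return (slot, seq) from a stamped payload, or None if not stamped."""
    if len(payload) < HEADER_LEN or payload[:len(MAGIC)] != MAGIC:
        return None
    return struct.unpack(STAMP, payload[len(MAGIC):HEADER_LEN])


def missing_stamps(sent, captured_payloads):
    """List the (slot, seq) pairs that were queued but never captured."""
    seen = {read_stamp(p) for p in captured_payloads}
    return [s for s in sent if s not in seen]


# Example: slots 84..87 queued, but the packet from slot 85 never shows up.
sent = [(slot, slot) for slot in range(84, 88)]
capture = [stamp_payload(slot, seq) for slot, seq in sent if slot != 85]
print(missing_stamps(sent, capture))
```

Because the monitoring machine may itself drop packets, a stamp reported
as missing would need to be confirmed over repeated runs before blaming
the transmitter.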

Note that after the transmit watchdog, the interface should recover and
start operating normally again - and that should not take "several
minutes."

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.