linux-kernel - Re: Possibly SATA related freeze killed networking and RAID

Message-ID: <53905.88.217.68.129.1197321951.squirrel@www.e18.physik.tu-muenchen.de>
Date:	Mon, 10 Dec 2007 22:25:51 +0100 (CET)
From:	"Thiemo Nagel" <thiemo.nagel@...tum.de>
To:	linux-kernel@...r.kernel.org
Cc:	noah123@...il.com
Subject: Re: Possibly SATA related freeze killed networking and RAID

Hello,

I think I'm experiencing the same problem:

09:16:34 : NETDEV WATCHDOG: eth0: transmit timed out
09:16:34 : eth0: Got tx_timeout. irq: 00000000
09:16:34 : eth0: Ring at 37e50000
09:16:34 : eth0: Dumping tx registers
09:16:34 :   0: 00000000 000000ff 00000003 025003ca 00000000 00000000 00000000 00000000
09:16:34 :  20: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

[...]

09:16:54 : ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
09:16:54 : ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
09:16:54 : ata6.00: cmd 25/00:08:1e:97:48/00:00:19:00:00/e0 tag 0 cdb 0x0 data 4096 in
09:16:54 :          res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
09:16:54 : ata5.00: cmd 25/00:70:1e:97:48/00:00:19:00:00/e0 tag 0 cdb 0x0 data 57344 in
09:16:54 :          res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
09:16:54 : ata6: soft resetting port
09:16:54 : ata5: soft resetting port
09:16:54 : ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
09:16:54 : ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
09:16:54 : NETDEV WATCHDOG: eth0: transmit timed out
09:16:54 : eth0: Got tx_timeout. irq: 00000032
09:16:54 : eth0: Ring at 37e50000
09:16:54 : eth0: Dumping tx registers
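
For readers who want to see what the "cmd" lines and the SStatus value
above encode, here is a minimal Python sketch. It assumes the usual
libata field order for the printed taskfile (command/feature:nsect:
lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device),
512-byte logical sectors, and the SATA SStatus layout (DET in bits 0-3,
SPD in bits 4-7, IPM in bits 8-11). The function names decode_taskfile
and decode_sstatus are made up for this illustration and are not part
of any kernel tool.

```python
import re

# Assumed order of the bytes libata prints in a "cmd"/"res" taskfile dump.
FIELDS = ("command", "feature", "nsect", "lbal", "lbam", "lbah",
          "hob_feature", "hob_nsect", "hob_lbal", "hob_lbam", "hob_lbah",
          "device")

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def decode_taskfile(text):
    """Decode 'cc/ff:nn:ll:mm:hh/ff:nn:ll:mm:hh/dd' as an LBA48 command."""
    values = [int(byte, 16) for byte in re.split(r"[/:]", text.strip())]
    if len(values) != len(FIELDS):
        raise ValueError("expected %d bytes, got %d" % (len(FIELDS), len(values)))
    tf = dict(zip(FIELDS, values))
    # The 48-bit LBA is split across the current and "hob" (previous) registers.
    lba = (tf["hob_lbah"] << 40 | tf["hob_lbam"] << 32 | tf["hob_lbal"] << 24
           | tf["lbah"] << 16 | tf["lbam"] << 8 | tf["lbal"])
    # For LBA48 commands a sector count of 0 means 65536 sectors.
    sectors = (tf["hob_nsect"] << 8 | tf["nsect"]) or 65536
    return {"command": tf["command"], "lba": lba,
            "sectors": sectors, "bytes": sectors * SECTOR_SIZE}


def decode_sstatus(value):
    """Split a SATA SStatus register value into its DET, SPD and IPM fields."""
    return {"det": value & 0xF, "spd": (value >> 4) & 0xF, "ipm": (value >> 8) & 0xF}


if __name__ == "__main__":
    for port, tf in (("ata6.00", "25/00:08:1e:97:48/00:00:19:00:00/e0"),
                     ("ata5.00", "25/00:70:1e:97:48/00:00:19:00:00/e0")):
        d = decode_taskfile(tf)
        print("%s: cmd 0x%02x lba 0x%x, %d sectors, %d bytes"
              % (port, d["command"], d["lba"], d["sectors"], d["bytes"]))
    print(decode_sstatus(0x123))  # libata prints SStatus in hex
```

Under these assumptions, both timed-out commands are 0x25 (READ DMA
EXT) starting at LBA 0x1948971e, with 8 and 112 sectors respectively,
which matches the "data 4096 in" and "data 57344 in" sizes in the log.
SStatus 123 decodes to DET=3, SPD=2, IPM=1, i.e. device present with
communication established, Gen2 speed (3.0 Gbps), interface active.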

A more complete log can be found at:
http://www.e18.physik.tu-muenchen.de/~tnagel/misc/kernel-crash.log

The setup is strikingly similar to noah's (I'm quoting all of this
from memory; if anybody is interested in more detail, just ask):

Kernel: 2.6.22 (amd64, Debian patches, tainted)
Mainboard: Asus M2N-SLI Deluxe (nForce 570 SLI MCP --> MCP55, same as noah)
CPU: Athlon64 Dual-Core (same as noah)
RAM: 1GB
HD: 22 x Samsung HD501LJ 500GB (same as noah), 1-6 connected to chipset,
7-22 connected to RocketRaid 2340.

Like noah, I'm using software RAID (levels 1, 5 and 6), and as with noah
the problem occurred during a RAID check, in my case under heavy NFS load
that had been ongoing for ~4 days.  This is the third time it has
happened, but only this time was I able to capture the logs via
netconsole.  The two affected drives are connected to the chipset and
show no SMART errors.

Unfortunately, the kernel is tainted since I'm using HighPoint's drivers
for the RR2340.  I don't know whether I can change this easily.

Kind regards,

Thiemo Nagel
