linux-kernel - 2.6.18.2: sporadic SATA port resets (Broadcom BCM5785 (HT1000))

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Date:	Wed, 7 Feb 2007 18:17:12 +0100
From:	Emmeran Seehuber <rototor@...otor.de>
To:	linux-kernel@...r.kernel.org
Subject: 2.6.18.2: sporadic SATA port resets (Broadcom BCM5785 (HT1000))

Hi there,

we`ve got a database server machine running a 2.6.18.2 vanilla kernel on 
Debian Etch. The database is MySQL 5. Everything works fine, but sometimes 
the server "lags", i.e. it doesn`t respond for 30 seconds. We`ve now 
investigated the problem and found this messages in syslog (and dmesg):

15:55:44 omega11 kernel: ata1: port is slow to respond, please be patient
15:55:44 omega11 kernel: ata1: soft resetting port
15:55:44 omega11 kernel: ata1: port is slow to respond, please be patient
15:55:44 omega11 kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 
300)
15:55:44 omega11 kernel: ATA: abnormal status 0xD0 on port 0xFFFFC2000000401C
15:55:44 omega11 last message repeated 5 times
15:55:44 omega11 kernel: ata1.00: qc timeout (cmd 0xec)
15:55:44 omega11 kernel: ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
15:55:44 omega11 kernel: ata1: failed to recover some devices, retrying in 5 
secs
15:55:44 omega11 kernel: ata1: hard resetting port
15:55:44 omega11 kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 
300)
15:55:44 omega11 kernel: ata1.00: configured for UDMA/133
15:55:44 omega11 kernel: ata1: EH complete
15:55:44 omega11 kernel: SCSI device sda: 293046768 512-byte hdwr sectors 
(150040 MB)
15:55:44 omega11 kernel: sda: Write Protect is off
15:55:44 omega11 kernel: SCSI device sda: drive cache: write back

We`ve got this messages up to 5 times a day since as far as our syslogs reach. 

It seems no kind of queuing is used:
# cat /sys/block/sda/device/queue_type
none
# cat /sys/block/sda/device/queue_depth
1

The server is up for 91 days now and has low to medium load (depending on 
daytime). Since it`s a production server located in a datacenter, we can`t 
just test some random kernel on it :(

Does somebody have a glue whats going on here? Could it be a hardware failure? 
We have an identical machine using the same kernel. It`s used as a webserver. 
There also this messages shows up, but not that often (10 times in 91 days 
uptime). If it is a hardware failure, then both machines would been affected 
by the same hardware problem.

What can we do to fix this problem? Is it known? 

I`ve found many posts related to SATA problems, but none seemed to be about 
this problem.

Do you need additional information?

Thanks

cu,
  Emmy

P.S.: Please CC me, since i`m not subscribed.

View attachment "lspci.txt" of type "text/plain" (1565 bytes)

Download attachment "config.gz" of type "application/x-gzip" (11411 bytes)

View attachment "cpuinfo.txt" of type "text/plain" (3053 bytes)