linux-kernel - Re: 2.6.18.2: sporadic SATA port resets (Broadcom BCM5785 (HT1000))

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <45CC3C9C.3050101@gmail.com>
Date:	Fri, 09 Feb 2007 04:19:24 -0500
From:	Tejun Heo <htejun@...il.com>
To:	Emmeran Seehuber <rototor@...otor.de>
CC:	linux-kernel@...r.kernel.org
Subject: Re: 2.6.18.2: sporadic SATA port resets (Broadcom BCM5785 (HT1000))

Hi,

Emmeran Seehuber wrote:
> we`ve got a database server machine running a 2.6.18.2 vanilla kernel on 
> Debian Etch. The database is MySQL 5. Everything works fine, but sometimes 
> the server "lags", i.e. it doesn`t respond for 30 seconds. We`ve now 
> investigated the problem and found this messages in syslog (and dmesg):
> 
> 15:55:44 omega11 kernel: ata1: port is slow to respond, please be patient
> 15:55:44 omega11 kernel: ata1: soft resetting port
> 15:55:44 omega11 kernel: ata1: port is slow to respond, please be patient
> 15:55:44 omega11 kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 
> 300)
> 15:55:44 omega11 kernel: ATA: abnormal status 0xD0 on port 0xFFFFC2000000401C
> 15:55:44 omega11 last message repeated 5 times
> 15:55:44 omega11 kernel: ata1.00: qc timeout (cmd 0xec)
> 15:55:44 omega11 kernel: ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> 15:55:44 omega11 kernel: ata1: failed to recover some devices, retrying in 5 
> secs
> 15:55:44 omega11 kernel: ata1: hard resetting port
> 15:55:44 omega11 kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 
> 300)
> 15:55:44 omega11 kernel: ata1.00: configured for UDMA/133
> 15:55:44 omega11 kernel: ata1: EH complete
> 15:55:44 omega11 kernel: SCSI device sda: 293046768 512-byte hdwr sectors 
> (150040 MB)
> 15:55:44 omega11 kernel: sda: Write Protect is off
> 15:55:44 omega11 kernel: SCSI device sda: drive cache: write back

This is just the recovery part.  Need more log.  If possible, please 
give a shot at 2.6.20.  It might have fixed your problem or at least 
allow better diagnosis.

> We`ve got this messages up to 5 times a day since as far as our syslogs reach. 
> 
> It seems no kind of queuing is used:
> # cat /sys/block/sda/device/queue_type
> none
> # cat /sys/block/sda/device/queue_depth
> 1
> 
> The server is up for 91 days now and has low to medium load (depending on 
> daytime). Since it`s a production server located in a datacenter, we can`t 
> just test some random kernel on it :(

I see.

> Does somebody have a glue whats going on here? Could it be a hardware failure? 

It might be.  Quite some SATA bug reports turn out to be hardware 
problem, most commonly PSU issues.

> We have an identical machine using the same kernel. It`s used as a webserver. 
> There also this messages shows up, but not that often (10 times in 91 days 
> uptime). If it is a hardware failure, then both machines would been affected 
> by the same hardware problem.

Hmmm...

> What can we do to fix this problem? Is it known? 
> 
> I`ve found many posts related to SATA problems, but none seemed to be about 
> this problem.
> 
> Do you need additional information?

Yeah, please post the content of /var/log/boot.msg if available and the 
result of dmesg and lspci -nn.

-- 
tejun
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/