Message-ID: <20070207094105.7ab1c601@localhost>
Date:	Wed, 7 Feb 2007 09:41:05 +0100
From:	Paolo Ornati <ornati@...twebnet.it>
To:	"Trevor Offner Caira" <toc3@...nell.edu>
Cc:	linux-kernel@...r.kernel.org, Tejun Heo <htejun@...il.com>
Subject: Re: PROBLEM: sata timeouts with intel 82801HB on amd64

On Mon, 5 Feb 2007 21:08:33 -0500 (EST)
"Trevor Offner Caira" <toc3@...nell.edu> wrote:

> (1) One-line summary: I'm getting SATA timeouts with Intel 82801HB on amd64.
> 
> (2) Full description: Unless CONFIG_RCU_TORTURE_TEST is set, I get sata
> timeouts of this form periodically:
> 
> ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
> ata1.00: cmd 60/18:00:b3:22:0a/00:00:00:00:00/40 tag 0 cdb 0x0 data 12288 in
>          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> ata1: soft resetting port
> ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> ata1.00: configured for UDMA/133
> ata1: EH complete
> SCSI device sda: 625142448 512-byte hdwr sectors (320073 MB)
> sda: Write Protect is off
> SCSI device sda: write cache: enabled, read cache: enabled, doesn't
> support DPO or FUA
> 
> This entails complete blocking of all disk i/o (I only have one disk) for
> about 45 seconds. The kernel then negotiates the next lowest transfer
> speed (UDMA/166 all the way down to PIO0, when it errors saying it cannot
> go slower). I get this issue on amd64 kernels only. The issue is only
> present in 2.6.18+, since earlier kernels do not support my chipset at all
> (intel 82801HB).
> 
> Knoppix 5.1.1 does not show this issue (i.e., no disk i/o issues even
> without rcutorture running). However, a native amd64 build of exactly the
> same kernel config shows the issue.
> 
> (3) Keywords: SATA, AHCI, modules, kernel, Intel.
> 

[CUT]

> (8.7) Other information: There's nothing in the system except for the
> DG965WH motherboard, E6600 processor, 1GB of kingston RAM, the ST3320620AS
> hard drive and 430 W PSU.
> 
> Thanks for reading this far! :)


You are using XFS, right?

Can you check whether the problem goes away with either of the following:

1) disabling NCQ ("echo 1 > /sys/block/sda/device/queue_depth" in a
boot script)

	OR

2) mounting XFS filesystem(s) with "nobarrier" option

	?
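
To confirm that either workaround is actually in effect before re-testing,
something like the following might help. It is only a rough sketch: the
function names and the SYSFS_ROOT / MOUNTS_FILE overrides are made up for
illustration, and it assumes the usual sysfs and /proc/mounts layouts.

```shell
#!/bin/sh
# Sketch only. SYSFS_ROOT and MOUNTS_FILE are hypothetical overrides so the
# helpers can be pointed at copies of the files; the defaults are the usual
# Linux locations.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"
MOUNTS_FILE="${MOUNTS_FILE:-/proc/mounts}"

# Print "off" if the device queue depth is 1 (NCQ effectively disabled),
# "on" otherwise. Argument: block device name, e.g. sda.
ncq_state() {
    depth=$(cat "$SYSFS_ROOT/block/$1/device/queue_depth") || return 1
    if [ "$depth" -le 1 ]; then
        echo off
    else
        echo on
    fi
}

# For every mounted XFS filesystem, print its mount point followed by
# "nobarrier" if that option is set, or "barrier" if it is not.
xfs_barrier_state() {
    awk '$3 == "xfs" {
        state = ($4 ~ /(^|,)nobarrier(,|$)/) ? "nobarrier" : "barrier"
        print $2, state
    }' "$MOUNTS_FILE"
}
```

Running "ncq_state sda" after the boot script and "xfs_barrier_state" after
remounting should show which of the two cases you are really testing.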


I've seen this problem with very similar hardware (and so I've added
Tejun to CC :).


If mounting XFS with "nobarrier" fixes the problem, it would suggest that
more than one Seagate disk cannot handle the Cache Flush command while
other commands are in flight...

-- 
	Paolo Ornati
	Linux 2.6.20 on x86_64