linux-kernel - Re: ata error EH in SWNCQ mode, with nVidia MCP55 sata controller and SAMSUNG HD103UJ

Date:	Tue, 05 Jan 2010 18:50:36 -0600
From:	Robert Hancock <hancockrwd@...il.com>
To:	Marco Bisetto <marco.bisetto@...il.com>
CC:	linux-kernel@...r.kernel.org, ide <linux-ide@...r.kernel.org>
Subject: Re: ata error EH in SWNCQ mode, with nVidia MCP55 sata controller and SAMSUNG HD103UJ

On 01/05/2010 11:56 AM, Marco Bisetto wrote:
> Hi,
>
> I have a problem with an "IDE interface: nVidia Corporation MCP55 SATA
> Controller (rev a3)" and two SAMSUNG HD103UJ SATA hard disk drives.
> Disabling the write cache on a disk produces this error:
>
> kernel: [   45.584445] ata1: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
> kernel: [   45.584445] ata1: SWNCQ:qc_active 0x1 defer_bits 0x0 last_issue_tag 0x0
> kernel: [   45.584445]   dhfis 0x1 dmafis 0x1 sdbfis 0x0
> kernel: [   45.584445] ata1: ATA_REG 0x40 ERR_REG 0x0
> kernel: [   45.584445] ata1: tag : dhfis dmafis sdbfis sacitve
> kernel: [   45.584445] ata1: tag 0x0: 1 1 0 1
> kernel: [   45.584445] ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
> kernel: [   45.584445] ata1.00: cmd 61/08:00:3f:f4:e8/00:00:02:00:00/40 tag 0 ncq 4096 out
> kernel: [   45.584445]          res 40/00:00:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
> kernel: [   45.584445] ata1.00: status: { DRDY }
> kernel: [   45.584445] ata1: hard resetting link
>
> The error appears four times for each disk at startup, and only when a
> disk has its write cache disabled. For example, with the write cache
> disabled on both disks:
>
> [   45.595788] ata1: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
> [   45.595800] ata4: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
> [   76.491877] ata4: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
> [   76.511331] ata1: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
> [  107.391075] ata4: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
> [  107.423701] ata1: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
> [  138.287465] ata4: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
> [  138.332093] ata1: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
>
> Disabling write cache on disk attached to ata4 and enabling it on disk
> attached to ata1:
>
> [   45.583489] ata4: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
> [   76.479940] ata4: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
> [  107.375643] ata4: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
> [  138.272023] ata4: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
>
> Enabling write cache on both disks = no errors.
>
> I don't think the problem can be associated with bad cables or power
> supply, as it happens in each channel, it is the same for each disk and
> happens at the same time.
>
> Does anybody have an idea what it could be, and whether there is a
> solution?

From what I can see, that debug output from sata_nv means that the
drive has not reported completing the command (no SDB FIS) by the time
the timeout (usually 30 seconds) expired. That's an awfully long time.
It could be that those drives have issues with NCQ when the write cache
is disabled, where some of the commands in the queue can be starved for
overly long periods.
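
To make that reading of the log concrete, here is a minimal Python
sketch (not part of sata_nv or libata) that parses the quoted dmesg
lines. It assumes the dhfis/dmafis/sdbfis values are per-tag bitmasks
recording whether a D2H Register FIS, a DMA Setup FIS and a Set Device
Bits FIS have been seen, which is how I read the output above; the
function names pending_tags and eh_intervals are made up for
illustration.

```python
import re

# Matches the "dhfis 0x1 dmafis 0x1 sdbfis 0x0" state line from the log.
STATE_RE = re.compile(
    r"dhfis (0x[0-9a-f]+) dmafis (0x[0-9a-f]+) sdbfis (0x[0-9a-f]+)")
# Matches "[   45.584445] ata1: EH in SWNCQ mode"; the opening bracket is
# optional because some quoted lines have lost it.
EH_RE = re.compile(r"\[?\s*(\d+\.\d+)\]\s+(ata\d+): EH in SWNCQ mode")


def pending_tags(state_line, qc_active):
    """Return active tags for which no SDB FIS (completion) was recorded.

    Assumption: each value is a bitmask indexed by NCQ tag (0..31).
    """
    m = STATE_RE.search(state_line)
    if m is None:
        return []
    _dhfis, _dmafis, sdbfis = (int(v, 16) for v in m.groups())
    return [tag for tag in range(32)
            if (qc_active >> tag) & 1 and not (sdbfis >> tag) & 1]


def eh_intervals(lines):
    """Return, per port, the seconds between consecutive EH events."""
    times = {}
    for line in lines:
        m = EH_RE.search(line)
        if m:
            times.setdefault(m.group(2), []).append(float(m.group(1)))
    return {port: [round(b - a, 3) for a, b in zip(ts, ts[1:])]
            for port, ts in times.items()}


if __name__ == "__main__":
    print(pending_tags("  dhfis 0x1 dmafis 0x1 sdbfis 0x0", 0x1))
    print(eh_intervals([
        "[   45.583489] ata4: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1",
        "[   76.479940] ata4: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1",
        "[  107.375643] ata4: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1",
        "[  138.272023] ata4: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1",
    ]))
```

On the quoted output, tag 0 is the only tag still waiting for its SDB
FIS, and the ata4 events come roughly 30.9 seconds apart, which fits a
command being retried and hitting the ~30 second timeout each time.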

CCing linux-ide.