Message-ID: <4AB7D867.4080508@rtr.ca>
Date:	Mon, 21 Sep 2009 15:47:51 -0400
From:	Mark Lord <liml@....ca>
To:	Chris Webb <chris@...chsys.com>
Cc:	Tejun Heo <teheo@...e.de>, linux-scsi@...r.kernel.org,
	Ric Wheeler <rwheeler@...hat.com>,
	Andrei Tanas <andrei@...as.ca>, NeilBrown <neilb@...e.de>,
	linux-kernel@...r.kernel.org,
	IDE/ATA development list <linux-ide@...r.kernel.org>,
	Jeff Garzik <jgarzik@...hat.com>, Mark Lord <mlord@...ox.com>
Subject: Re: MD/RAID time out writing superblock

Chris Webb wrote:
> Chris Webb <chris@...chsys.com> writes:
> 
>> Mark Lord <liml@....ca> writes:
>>
>>> Speaking of which..
>>>
>>> Chris:  I wonder if the errors will also vanish in your situation
>>> by disabling the onboard write-caches in the drives ?
>>>
>>> Eg.  hdparm -W0 /dev/sd?
>> Hi Mark. I've got a test machine on its way at the moment, so I'll make sure
>> I check this one out on it too.
> 
> Our test machine is still being built, but we had an opportunity to try this on
> a couple of the live machines when their RAID arrays failed over the weekend.
> We still got timeouts, but (predictably!) they're not on flushes any more:
> 
>   ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
>   ata2.00: cmd 35/00:08:98:c6:00/00:00:4e:00:00/e0 tag 0 dm
...
> all the way through the night.
> 
> I also have these in the log, but they are immediately after turning off the
> write caching in all drives, so may be a red herring with data still being
> written out.
> 
>   ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
>   ata2.00: cmd c8/00:08:00:20:80/00:00:00:00:00/e0 tag 0 dm
...
> On another machine, I saw this with write caching turned off:
> 
>   ata2.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
>   ata2.00: cmd 61/08:00:28:1f:80/00:00:00:00:00/40 tag 0 ncq 4096 out
...

0x35 is a 48-bit DMA WRITE, 0xc8 is a 28-bit DMA READ,
and 0x61 is an NCQ WRITE.
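
For anyone reading these logs later: the opcode is the first hex byte
after "cmd " on the libata exception line. A minimal sketch of pulling
it out and naming it follows. The helper name decode_opcode and the
small lookup table are illustrative assumptions, not part of any kernel
tool, and the table covers only a handful of common ATA opcodes.

```python
import re

# Small, deliberately incomplete table of common ATA command opcodes.
ATA_OPCODES = {
    0x25: "READ DMA EXT (48-bit)",
    0x35: "WRITE DMA EXT (48-bit)",
    0xC8: "READ DMA (28-bit)",
    0xCA: "WRITE DMA (28-bit)",
    0x60: "READ FPDMA QUEUED (NCQ)",
    0x61: "WRITE FPDMA QUEUED (NCQ)",
    0xE7: "FLUSH CACHE",
    0xEA: "FLUSH CACHE EXT",
}

# The opcode is the first two hex digits after "cmd " and before the first "/".
CMD_RE = re.compile(r"cmd ([0-9a-fA-F]{2})/")


def decode_opcode(log_line):
    """Return (opcode, name) for a libata 'cmd' log line, or None if absent."""
    match = CMD_RE.search(log_line)
    if match is None:
        return None
    opcode = int(match.group(1), 16)
    return opcode, ATA_OPCODES.get(opcode, "unknown")


if __name__ == "__main__":
    lines = [
        "ata2.00: cmd 35/00:08:98:c6:00/00:00:4e:00:00/e0 tag 0 dm",
        "ata2.00: cmd c8/00:08:00:20:80/00:00:00:00:00/e0 tag 0 dm",
        "ata2.00: cmd 61/08:00:28:1f:80/00:00:00:00:00/40 tag 0 ncq 4096 out",
    ]
    for line in lines:
        print(decode_opcode(line))
```

Run against the three lines quoted above, this names a 48-bit DMA write,
a 28-bit DMA read, and an NCQ write respectively.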

With timeouts hitting plain reads and writes, queued and non-queued
alike, this looks like some kind of hardware trouble to me.
And as Tejun suggested, it's difficult to guess at
a cause other than the PSU.

Cheers, and good luck.
