Message-ID: <4AAE524C.2030401@rtr.ca>
Date: Mon, 14 Sep 2009 10:25:16 -0400
From: Mark Lord <liml@....ca>
To: Tejun Heo <teheo@...e.de>
Cc: Chris Webb <chris@...chsys.com>, linux-scsi@...r.kernel.org,
Ric Wheeler <rwheeler@...hat.com>,
Andrei Tanas <andrei@...as.ca>, NeilBrown <neilb@...e.de>,
linux-kernel@...r.kernel.org,
IDE/ATA development list <linux-ide@...r.kernel.org>,
Jeff Garzik <jgarzik@...hat.com>, Mark Lord <mlord@...ox.com>
Subject: Re: MD/RAID time out writing superblock
Tejun Heo wrote:
> Mark Lord wrote:
>> Tejun Heo wrote:
>> ..
>>> Oooh, another possibility is the above continuous IDENTIFY tries.
>>> Doing things like that generally isn't a good idea because vendors
>>> don't expect IDENTIFY to be mixed regularly with normal IOs and
>>> firmwares aren't tested against that. Even smart commands sometimes
>>> cause problems. So, finding out the thing which is obsessed with the
>>> identity of the drive and stopping it might help.
>> ..
>>
>> Bullpucky. That sort of thing, specifically with IDENTIFY,
>> has never been an issue.
>
> With SMART it has. I wouldn't be too surprised if some new firmware
> chokes on repeated IDENTIFY mixed with stream of NCQ commands. It's
> just not something people (including vendors) do regularly.
..
Yeah, some drives really don't like SMART commands (hddtemp & smartctl).
That's a strange one, too, because the whole idea of SMART
is that it gets used to periodically monitor drive health.
IDENTIFY is much safer -- usually no media access after initial spin-up,
and lots of things exercise it quite regularly.
Pretty much any hdparm command triggers an IDENTIFY beforehand now,
and hddtemp and smartctl both use it too.
I suspect we're missing some info from this specific failure.
Looking back at Chris's earlier posting, the whole thing started
with a FLUSH_CACHE_EXT failure. Once that happens, all bets are
off on anything that follows.
> Everything will be running fine when suddenly:
>
> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
> ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
> res 40/00:00:80:17:91/00:00:37:00:00/40 Emask 0x4 (timeout)
> ata1.00: status: { DRDY }
> ata1: hard resetting link
> ata1: softreset failed (device not ready)
> ata1: hard resetting link
> ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> ata1.00: configured for UDMA/133
> ata1: EH complete
> end_request: I/O error, dev sda, sector 1465147272
> md: super_written gets error=-5, uptodate=0
> raid10: Disk failure on sda3, disabling device.
> raid10: Operation continuing on 5 devices.
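
For anyone reading along, here is a rough sketch of how the "cmd" and "res" lines in the log above can be decoded. It assumes the first hex byte after "cmd" is the ATA command opcode and the first hex byte after "res" is the status register, which matches the log as quoted (0xea is FLUSH CACHE EXT, 0x40 is DRDY). The helper names (first_byte, describe) and the opcode table are made up for illustration, and the table is only a small hand-picked subset, not a complete list.

```python
import re

# Small, hand-picked subset of ATA command opcodes; not a complete table.
ATA_OPCODES = {
    0xB0: "SMART",
    0xE7: "FLUSH CACHE",
    0xEA: "FLUSH CACHE EXT",
    0xEC: "IDENTIFY DEVICE",
}

# ATA status register bits, highest bit first.
ATA_STATUS_BITS = [
    (0x80, "BSY"),
    (0x40, "DRDY"),
    (0x20, "DF"),
    (0x08, "DRQ"),
    (0x01, "ERR"),
]

# Matches "cmd ea/..." or "res 40/..." and captures the first hex byte.
_LINE = re.compile(r"\b(cmd|res)\s+([0-9a-fA-F]{2})/")


def first_byte(line):
    """Return ("cmd" | "res", value) for a log line, or None if it does not match."""
    m = _LINE.search(line)
    if m is None:
        return None
    return m.group(1), int(m.group(2), 16)


def describe(line):
    """Hypothetical helper: name the opcode of a cmd line or the status bits of a res line."""
    parsed = first_byte(line)
    if parsed is None:
        return None
    kind, value = parsed
    if kind == "cmd":
        return ATA_OPCODES.get(value, "unknown opcode 0x%02x" % value)
    flags = [name for bit, name in ATA_STATUS_BITS if value & bit]
    return "status: " + (" ".join(flags) if flags else "none")


if __name__ == "__main__":
    print(describe("ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0"))
    print(describe("res 40/00:00:80:17:91/00:00:37:00:00/40 Emask 0x4 (timeout)"))
```

Run against the two lines from Chris's log, this names the timed-out command as FLUSH CACHE EXT and shows the drive reporting only DRDY, which is consistent with the "status: { DRDY }" line in the log.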