Message-ID: <4AADF471.2020801@suse.de>
Date:	Mon, 14 Sep 2009 16:44:49 +0900
From:	Tejun Heo <teheo@...e.de>
To:	Chris Webb <chris@...chsys.com>
Cc:	linux-scsi@...r.kernel.org, Ric Wheeler <rwheeler@...hat.com>,
	Andrei Tanas <andrei@...as.ca>, NeilBrown <neilb@...e.de>,
	linux-kernel@...r.kernel.org,
	IDE/ATA development list <linux-ide@...r.kernel.org>,
	Jeff Garzik <jgarzik@...hat.com>, Mark Lord <mlord@...ox.com>
Subject: Re: MD/RAID time out writing superblock

Tejun Heo wrote:
>> I wonder what's different about these two timeouts such that one causes an I/O
>> error and the other just causes a retry after reset? Presumably if the latter
>> was also just a retry, everything would be (closer to being) fine.
> 
> Because this error is actually seen by the md layer, and FLUSH in
> general can't be retried cleanly.  On retry, the drive moves on and
> retries only the sectors after the point of failure.  I'm not sure
> whether FLUSH is actually failing here or it's a communication glitch.
> At any rate, if FLUSH is failing or timing out, the only right thing
> to do is to kick the drive out of the array, as keeping it after
> retrying may lead to silent data corruption.  Seriously, it's most
> likely a hardware malfunction, although I can't tell where the problem
> is with the given data.  Get the hardware fixed.
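
A toy model may help show why a retried FLUSH can hide data loss. It assumes the behavior described in the quoted paragraph: when write-back of a cached sector fails, FLUSH stops and reports an error, and a reissued FLUSH continues with the sectors after the failed one. `ToyDriveCache` and its methods are hypothetical and do not correspond to any real drive firmware or kernel interface.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyDriveCache:
    """Hypothetical model of a drive's volatile write cache (not real firmware)."""
    dirty: list[int] = field(default_factory=list)     # LBAs awaiting write-back, in order
    bad: set[int] = field(default_factory=set)         # LBAs whose media write will fail
    on_media: set[int] = field(default_factory=set)    # LBAs that reached the platter

    def write(self, lba: int) -> None:
        # A cached write completes immediately; the data only lives in the cache.
        self.dirty.append(lba)

    def flush(self) -> int | None:
        """Write back dirty LBAs in order; return the failing LBA, or None on success.

        Assumption: the failed LBA is removed from the cache when the error is
        reported, so a later flush() resumes with the sectors after it.
        """
        while self.dirty:
            lba = self.dirty.pop(0)
            if lba in self.bad:
                return lba          # error reported; this sector's data is now gone
            self.on_media.add(lba)
        return None


cache = ToyDriveCache(bad={2})
for lba in range(5):
    cache.write(lba)

first = cache.flush()   # fails at LBA 2
second = cache.flush()  # the retry "succeeds" by continuing after LBA 2
print(first, second, sorted(cache.on_media))  # 2 None [0, 1, 3, 4]
```

In this model the second FLUSH reports success even though LBA 2 never reached the media, which is the silent corruption a blind retry would paper over. Failing the member instead lets md handle the loss explicitly.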

Oooh, another possibility is the continuous IDENTIFY attempts above.
Doing things like that generally isn't a good idea because vendors
don't expect IDENTIFY to be mixed regularly with normal IOs, and
firmware isn't tested against that.  Even SMART commands sometimes
cause problems.  So finding whatever keeps querying the drive's
identity and stopping it might help.

Thanks.

-- 
tejun