Date:	Sun, 20 Sep 2009 12:46:27 -0600
From:	Robert Hancock <hancockrwd@...il.com>
To:	Tejun Heo <tj@...nel.org>
CC:	Chris Webb <chris@...chsys.com>, Neil Brown <neilb@...e.de>,
	Ric Wheeler <rwheeler@...hat.com>,
	Andrei Tanas <andrei@...as.ca>, linux-kernel@...r.kernel.org,
	IDE/ATA development list <linux-ide@...r.kernel.org>,
	linux-scsi@...r.kernel.org, Jeff Garzik <jgarzik@...hat.com>,
	Mark Lord <mlord@...ox.com>
Subject: Re: MD/RAID time out writing superblock

On 09/17/2009 09:44 AM, Tejun Heo wrote:
>> Thanks Neil. This implies that when we see these fifteen second
>> hangs reading /proc/mdstat without write errors, there are genuinely
>> successful superblock writes which are taking fifteen seconds to
>> complete, presumably corresponding to flushes which complete but
>> take a full 15s to do so.
>>
>> Would such very slow (but ultimately successful) flushes be
>> consistent with the theory of power supply issues affecting the
>> drives? It feels like the 30s timeouts on flush could be just a more
>> severe version of the 15s very slow flushes.
>
> Probably not.  Power problems usually don't resolve themselves with
> longer timeout.  If the drive genuinely takes longer than 30s to
> flush, it would be very interesting tho.  That's something people have
> been worrying about but hasn't materialized yet.  The timeout is
> controlled by SD_TIMEOUT in drivers/scsi/sd.h.  You might want to bump
> it up to, say, 60s and see whether anything changes.

It's possible that if the power dip only slightly disrupted the drive,
it might just take longer to complete the write. I've also seen reports
of vibration causing problems in RAID arrays (there's a video on
YouTube of a guy yelling at a Sun disk array during heavy I/O, with the
resulting vibrations causing an immediate spike in I/O service times).
Something like that could also be causing trouble when all drives in
the array access their media simultaneously.
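
One way to tell genuinely slow flushes apart from outright timeouts
would be to time fsync() calls on the affected array directly. The
following is only a rough sketch, not something from this thread: the
helper names time_fsyncs and slow_flushes, the 4 KiB block size, and
the thresholds are all illustrative assumptions, and whether fsync()
actually results in a cache flush on the device depends on the
filesystem and its barrier settings.

```python
import os
import tempfile
import time


def time_fsyncs(path, count=5, block=b"x" * 4096):
    """Write a block and fsync it `count` times.

    Returns a list with the duration of each fsync() call in seconds.
    `path` is a hypothetical probe file; it should live on the
    filesystem backed by the array under test.
    """
    durations = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        for _ in range(count):
            os.write(fd, block)
            start = time.monotonic()
            # May cause a device cache flush, depending on the
            # filesystem and barrier configuration (assumption).
            os.fsync(fd)
            durations.append(time.monotonic() - start)
    finally:
        os.close(fd)
    return durations


def slow_flushes(durations, threshold):
    """Return (index, seconds) pairs for fsyncs taking >= threshold seconds."""
    return [(i, d) for i, d in enumerate(durations) if d >= threshold]
```

Running time_fsyncs() in a loop while the array is under load and
passing the result to slow_flushes() with a threshold of a few seconds
would show whether delays cluster near 15 s or approach the 30 s
timeout, which might help separate the two cases discussed above.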