Date:	Wed, 26 Aug 2009 10:46:06 -0400
From:	Andrei Tanas <andrei@...as.ca>
To:	Ric Wheeler <rwheeler@...hat.com>
Cc:	NeilBrown <neilb@...e.de>, <linux-kernel@...r.kernel.org>
Subject: Re: MD/RAID: what's wrong with sector 1953519935?

On Wed, 26 Aug 2009 06:34:14 -0400, Ric Wheeler <rwheeler@...hat.com>
wrote:
> On 08/25/2009 11:45 PM, Andrei Tanas wrote:
>>>>> I would suggest that Andrei might try to write and clear the IO error
>>>>> at that offset. You can use Mark Lord's hdparm to clear a specific
>>>>> sector or just do the math (carefully!) and dd over it. If the write
>>>>> succeeds (without bumping your remapped sectors count) this is a
>>>>> likely match to this problem.
>>>> I've tried dd multiple times, it always succeeds, and the relocated
>>>> sector count is currently 1 on this drive, even though this particular
>>>> fault happened at least 3 times so far.
>>> I would bump that count way up (say to 2) and see if you have an
>>> issue...
>> Not sure what you mean by this: how can I artificially bump the
>> relocated sector count?
> Sorry - you need to set the tunable:
> 
> /sys/block/mdX/md/safe_mode_delay
> 
> to something like "2" to prevent that sector from being a hotspot...
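
(For reference, the kind of commands meant above, assuming the disk is
/dev/sdX with 512-byte logical sectors and using the sector number from
the subject line, would be either of:

  # hdparm --yes-i-know-what-i-am-doing --write-sector 1953519935 /dev/sdX
  # dd if=/dev/zero of=/dev/sdX bs=512 seek=1953519935 count=1 oflag=direct conv=fsync

Both overwrite just that one sector with zeroes; comparing SMART's
Reallocated_Sector_Ct before and after shows whether the drive had to
remap it.)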

I set that tunable as soon as you suggested it (sketch below). The array
is still being rebuilt (it's a fairly busy machine, so rebuilding is
slow). I'll keep monitoring it, but I don't expect to see results soon:
even with the default value of 0.2 the fault used to happen only once
every several weeks.
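
(Concretely, something along these lines, assuming the array is md0; the
attribute takes a value in seconds:

  # echo 2 > /sys/block/md0/md/safe_mode_delay
  # cat /sys/block/md0/md/safe_mode_delay

The second command just reads the value back to confirm it took.)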

On another note: is it possible that the drive was actually working
properly but was not given enough time to complete the write request?
These newer drives have 32MB of cache but the same rotational speed and
seek times as the older ones, so wouldn't they need more time to flush
that cache?
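
(If the write cache is the suspect, one way to rule it in or out,
assuming the member disk is /dev/sdX, might be:

  # hdparm -W /dev/sdX     (show whether write caching is enabled)
  # hdparm -W0 /dev/sdX    (turn it off temporarily)

and then watch whether the error still recurs, at the cost of slower
writes.)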

Andrei.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
