Message-ID: <467B03C1.50809@tmr.com>
Date: Thu, 21 Jun 2007 19:03:29 -0400
From: Bill Davidsen <davidsen@....com>
To: Bill Davidsen <davidsen@....com>
CC: Neil Brown <neilb@...e.de>, david@...g.hm,
linux-kernel@...r.kernel.org, linux-raid@...r.kernel.org
Subject: Re: limits on raid
I didn't get a comment on my suggestion for a quick and dirty fix for
the --assume-clean issues...
Bill Davidsen wrote:
> Neil Brown wrote:
>> On Thursday June 14, david@...g.hm wrote:
>>
>>> it's now churning away 'rebuilding' the brand new array.
>>>
>>> a few questions/thoughts.
>>>
>>> why does it need to do a rebuild when making a new array? couldn't
>>> it just zero all the drives instead? (or better still just record
>>> most of the space as 'unused' and initialize it as it starts using
>>> it?)
>>>
>>
>> Yes, it could zero all the drives first. But that would take the same
>> length of time (unless p/q generation was very very slow), and you
>> wouldn't be able to start writing data until it had finished.
>> You can "dd" /dev/zero onto all drives and then create the array with
>> --assume-clean if you want to. You could even write a shell script to
>> do it for you.
>>
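For anyone who wants to do exactly that today, a rough and untested
sketch (the member names are examples only, and zeroing them obviously
destroys whatever is on them):

  #!/bin/sh
  # Zero every member, then create the array telling md that parity is
  # already consistent -- all-zero data has all-zero RAID-5 parity.
  for dev in /dev/sdb1 /dev/sdc1 /dev/sdd1; do
      # dd exits with "No space left on device" once the member is
      # full; that is expected here.
      dd if=/dev/zero of="$dev" bs=1M
  done
  mdadm --create /dev/md0 --level=5 --raid-devices=3 \
        --assume-clean /dev/sdb1 /dev/sdc1 /dev/sdd1
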
>> Yes, you could record which space is used vs unused, but I really
>> don't think the complexity is worth it.
>>
>>
> How about a simple solution which would get an array online and still
> be safe? All it would take is a flag which forced reconstruct writes
> for RAID-5. You could set it with an option, or automatically if
> someone puts --assume-clean with --create, and leave it in the
> superblock until the first "repair" runs to completion. And for repair
> you could assume that bad parity was not caused by an error but was
> simply never written.
>
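To make that first thought concrete: the flag itself doesn't exist, but
the pieces it would hang off are already there. Assuming the usual md
sysfs interface (md0 and the members are examples only, untested):

  # Under the proposal this create would also set a "force
  # reconstruct-write" flag in the superblock, since parity was never
  # initialised:
  mdadm --create /dev/md0 --level=5 --raid-devices=3 \
        --assume-clean /dev/sdb1 /dev/sdc1 /dev/sdd1

  # ...and the flag would stay set until a repair pass completes:
  echo repair > /sys/block/md0/md/sync_action
  cat /sys/block/md0/md/mismatch_cnt    # mismatches found by the pass
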
> Thought 2: I think the unwritten bit is easier than you think: you
> only need it on parity blocks for RAID-5, not on data blocks. When a
> write is done, if the bit is set do a reconstruct, write the parity
> block, and clear the bit. Keeping a bit per data block is madness, and
> appears to be unnecessary as well.
>>> while I consider zfs to be ~80% hype, one advantage it could have
>>> (but I don't know if it has) is that since the filesystem and raid
>>> are integrated into one layer they can optimize the case where files
>>> are being written onto unallocated space and instead of reading
>>> blocks from disk to calculate the parity they could just put zeros
>>> in the unallocated space, potentially speeding up the system by
>>> reducing the amount of disk I/O.
>>>
>>
>> Certainly. But the raid doesn't need to be tightly integrated
>> into the filesystem to achieve this. The filesystem need only know
>> the geometry of the RAID; when it comes to write, it tries to write
>> full stripes at a time. If that means writing some extra blocks full
>> of zeros, it can try to do that. This would require a little bit
>> better communication between filesystem and raid, but not much. If
>> anyone has a filesystem that they want to be able to talk to raid
>> better, they need only ask...
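That geometry is already exported, so finding the full-stripe size is
cheap; for example (assuming the usual md sysfs attributes, untested):

  # chunk size in bytes and number of member disks for md0
  chunk=$(cat /sys/block/md0/md/chunk_size)
  disks=$(cat /sys/block/md0/md/raid_disks)
  # data in one full RAID-5 stripe is chunk * (disks - 1); writes sized
  # and aligned to this avoid the read-modify-write cycle entirely
  echo $(( chunk * (disks - 1) ))
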
>>
>>
>>> is there any way that linux would be able to do this sort of thing?
>>> or is it impossible due to the layering preventing the necessary
>>> knowledge from being in the right place?
>>>
>>
>> Linux can do anything we want it to. Interfaces can be changed. All
>> it takes is a fairly well defined requirement, and the will to make it
>> happen (and some technical expertise, and lots of time .... and
>> coffee?).
>>
> Well, I gave you two thoughts: one which would be slow until a repair
> runs but sounds easy to do, and one which is slightly harder but works
> better and minimizes the performance impact.
>
--
bill davidsen <davidsen@....com>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979