Message-id: <4E78B15E.9060702@shiftmail.org>
Date: Tue, 20 Sep 2011 17:29:34 +0200
From: torn5 <torn5@...ftmail.org>
To: Theodore Tso <tytso@....EDU>
Cc: linux-ext4@...r.kernel.org
Subject: Re: What to put for unknown stripe-width?
On 09/20/11 14:47, Theodore Tso wrote:
> But that's OK, because I don't know of any RAID array that supports
> this kind of radical surgery in parameters in the first case. :-)
Ted, thanks for your reply,
Linux MD RAID supports this; it's called reshape. Most parameter
changes are supported; in particular, adding a new disk and
restriping a raid5 is supported *live*. It's not very stable though...
But apart from the MD live reshape/restripe, what I am more likely to do
is move such a filesystem *live* across the various RAIDs I have,
using LVM's "pvmove". Almost all of these RAIDs have a 1MB stride, but
with varying numbers of elements, hence they have different stripe-widths.
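To make the arithmetic concrete, here is a minimal sketch of how stride and
stripe-width would differ between such arrays. It assumes 4k filesystem
blocks, raid5 (one parity disk), and that stripe-width counts data disks
only; the function name and the disk counts are made up for illustration.

```python
def ext4_raid_params(chunk_bytes, total_disks, parity_disks, block_bytes=4096):
    """Return (stride, stripe_width), both in filesystem blocks.

    Assumption: stripe_width = stride * number of data-bearing disks.
    """
    if chunk_bytes % block_bytes != 0:
        raise ValueError("chunk size must be a multiple of the block size")
    data_disks = total_disks - parity_disks
    if data_disks < 1:
        raise ValueError("array needs at least one data disk")
    stride = chunk_bytes // block_bytes
    return stride, stride * data_disks


# Same 1MB chunk, different number of elements (hypothetical arrays).
for disks in (4, 6, 8):
    stride, width = ext4_raid_params(1024 * 1024, disks, parity_disks=1)
    print(f"raid5 with {disks} disks: stride={stride} stripe_width={width}")
```

The stride stays the same on every array, while the stripe-width changes
with the number of data disks, which is why no single value fits a
filesystem that gets moved between them.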
> The other thing to consider is small writes. If you are doing small writes, a large stripe size is a disaster, because a 32k random write by a program like MySQL will turn into a 3MB read + 3MB write request.
No, this is not correct, for MD at least.
MD uses strips to compute parity, which are always 4k wide for each
device. The reads in your example would be a 32k read from two devices,
followed by a 32k write to two devices. I am testing this now with iostat
to confirm what I'm saying, using a 4k dd write: I see various spurious
reads and writes (probably due to MD and LVM accounting, dirty flags etc.)
which sum up to about 108k read and 18k written (that's the aggregate
across all drives) for a single 4k write to the MD device. That's
definitely nowhere near as large as even a single chunk, which is 1MB.
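As a back-of-the-envelope sketch of that read-modify-write pattern: the
model below is idealized and is not MD's actual code. It assumes a raid5
write that is 4k-aligned and stays inside one chunk, so only the data
device and the parity device are touched, and it ignores the extra
accounting I/O seen in the iostat numbers. The function name is
hypothetical.

```python
PAGE = 4096  # assumed granularity at which parity is computed per device


def raid5_small_write_io(write_bytes, page=PAGE):
    """Estimate bytes read and written for a small in-chunk raid5 write.

    Read-modify-write: read old data and old parity, then write new data
    and new parity. Sizes are rounded up to whole pages.
    """
    if write_bytes <= 0:
        raise ValueError("write size must be positive")
    pages = -(-write_bytes // page)  # ceiling division
    per_device = pages * page
    return {
        "devices_touched": 2,  # one data device + one parity device
        "read_per_device": per_device,
        "write_per_device": per_device,
        "total_read": 2 * per_device,
        "total_write": 2 * per_device,
    }


io = raid5_small_write_io(32 * 1024)
print(io["read_per_device"] // 1024, "k read from each of",
      io["devices_touched"], "devices")
```

Under these assumptions the cost scales with the size of the write, not
with the chunk size.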
What the chunk size does is determine how much data is laid out before
the placement of parity changes (i.e. your ascii-art picture was correct).
A large chunk size like the one I use means that reads smaller than 1MB
hopefully come from 1 spindle only. This is useful for us.
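A small sketch of why "hopefully": whether a read stays on one spindle
depends on its alignment as well as its size. The helper below is
hypothetical and only counts how many chunks a byte range touches; it
assumes each chunk lives on a single device and does not model the parity
layout.

```python
def chunks_touched(offset, length, chunk_bytes=1024 * 1024):
    """Return how many chunks the byte range [offset, offset+length) spans."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    first = offset // chunk_bytes
    last = (offset + length - 1) // chunk_bytes
    return last - first + 1


MB = 1024 * 1024
print(chunks_touched(0, 64 * 1024))     # aligned 64k read
print(chunks_touched(MB - 4096, 8192))  # 8k read straddling a chunk boundary
```

A read smaller than the chunk touches at most two chunks, and only one
unless it crosses a chunk boundary.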
So, regarding my original problem: is the way you use stride-size in ext4
that you begin every new file at the start of a stripe?
When growing an existing file, what do you do? Do you continue to write it
from where it was, without holes, or do you leave a hole, select a new
location at the start of a new stripe and start from there?
And what happens to multiple very small files written together by
pdflush? Are they packed together on the same stripe without holes, or
does each one go to a different stripe?
Is changing the stripe-width with tune2fs supported on a live, mounted
fs? (I mean maybe with a mount -o remount, but no umount)
Thanks for your help,
T.