linux-ext4 - Re: What to put for unknown stripe-width?

Message-id: <4E78B15E.9060702@shiftmail.org>
Date:	Tue, 20 Sep 2011 17:29:34 +0200
From:	torn5 <torn5@...ftmail.org>
To:	Theodore Tso <tytso@....EDU>
Cc:	linux-ext4@...r.kernel.org
Subject: Re: What to put for unknown stripe-width?

On 09/20/11 14:47, Theodore Tso wrote:
> But that's OK, because I don't know of any RAID array that supports 
> this kind of radical surgery in parameters in the first case. :-)

Ted, thanks for your reply,

Linux MD RAID supports this; it's called reshape. Most parameter
changes are supported; in particular, adding a new disk and
restriping a RAID5 is supported *live*. It's not very stable, though...

But apart from the MD live reshape/restripe, what I am more likely to do
is move such a filesystem *live* across the various RAIDs I have, using
LVM's "pvmove". Almost all of these RAIDs have a 1MB stride (chunk), but
they have varying numbers of elements, hence different stripe-widths.
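
As a rough illustration of why moving between these arrays changes the ideal stripe-width: in ext4's extended options, stride is the chunk size expressed in filesystem blocks, and stripe_width is stride times the number of data disks. The sketch assumes 4 KiB filesystem blocks and RAID5 (one parity disk per stripe); `ext4_raid_params` is a hypothetical helper, and its results are the values one would pass as `mke2fs -E stride=N,stripe_width=M`.

```python
"""Sketch: derive ext4 stride / stripe_width (in filesystem blocks) for RAID5.

Assumptions (not from the original message): 4 KiB ext4 blocks and RAID5,
i.e. one disk's worth of parity per stripe.
"""


def ext4_raid_params(chunk_bytes: int, n_disks: int, fs_block: int = 4096,
                     parity_disks: int = 1) -> tuple[int, int]:
    """Return (stride, stripe_width), both in filesystem blocks."""
    if chunk_bytes % fs_block:
        raise ValueError("chunk size must be a multiple of the fs block size")
    data_disks = n_disks - parity_disks
    if data_disks < 1:
        raise ValueError("need at least one data disk")
    stride = chunk_bytes // fs_block      # blocks per chunk on one disk
    stripe_width = stride * data_disks    # blocks per full data stripe
    return stride, stripe_width


if __name__ == "__main__":
    MIB = 1024 * 1024
    # Same 1 MiB chunk, different array sizes: stride stays constant,
    # stripe_width grows with the number of data disks.
    for disks in (4, 5, 6, 8):
        stride, width = ext4_raid_params(1 * MIB, disks)
        print(f"{disks} disks: stride={stride} stripe_width={width}")
```

With a 1 MiB chunk this prints stride=256 for every array, while stripe_width goes from 768 (4 disks) to 1792 (8 disks), which is exactly the parameter that would go stale after a pvmove.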

> The other thing to consider is small writes.   If you are doing small writes, a large stripe size is a disaster, because a 32k random write by a program like MySQL will turn into a 3MB read + 3MB write request.

No, this is not correct, at least for MD.
MD computes parity in strips that are always 4k wide on each device, so
the I/O in your example would be a 32k read from two devices (old data
and old parity), followed by a 32k write to two devices. I am testing
this now with iostat to confirm what I'm saying, using a 4k dd write: I
see various spurious reads and writes (probably due to MD and LVM
accounting, dirty flags, etc.) which sum to about 108k read and 18k
written (aggregated across all drives) for a single 4k write to the MD
device. That is far less than even a single chunk, which is 1MB.
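
The small-write path described above is a RAID5 read-modify-write: the new parity can be computed from the old parity, the old data and the new data alone, so only the touched data device and the parity device are involved. The following is a simplified sketch of that idea, assuming plain XOR parity and ignoring MD's stripe cache, bitmap and superblock updates; `xor_bytes`, `rmw_update` and `rmw_io_bytes` are hypothetical helpers.

```python
"""Sketch of a RAID5 read-modify-write for a write smaller than one chunk.

Assumption: plain XOR parity where only the touched data device and the
parity device participate (simplified: no stripe cache, no bitmap or
superblock updates).
"""
import os


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length buffers."""
    if len(a) != len(b):
        raise ValueError("buffers must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def rmw_update(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """Return the new parity: P' = P xor D_old xor D_new."""
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)


def rmw_io_bytes(write_len: int) -> dict[str, int]:
    """Total bytes moved for a small write confined to one chunk:
    read old data + old parity, then write new data + new parity."""
    return {"read": 2 * write_len, "write": 2 * write_len}


if __name__ == "__main__":
    # Three 4 KiB data strips and their parity.
    strips = [os.urandom(4096) for _ in range(3)]
    parity = xor_bytes(xor_bytes(strips[0], strips[1]), strips[2])

    # Overwrite strip 1 using only old data, old parity and new data.
    new = os.urandom(4096)
    new_parity = rmw_update(strips[1], parity, new)
    strips[1] = new

    # Recomputing parity from scratch gives the same result.
    assert new_parity == xor_bytes(xor_bytes(strips[0], strips[1]), strips[2])
    print(rmw_io_bytes(32 * 1024))  # the 32k example
```

For a 32k write this prints `{'read': 65536, 'write': 65536}`: 32k from each of two devices in each direction, independent of the 1MB chunk size.
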

What the chunk size does is regulate how much data goes onto one device
before the placement of parity changes (i.e. your ASCII-art picture was
correct). A large chunk size like mine means that reads smaller than 1MB
will, with luck, be served by a single spindle only. This is useful for us.
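
To illustrate the single-spindle point, the sketch below maps a byte range on a RAID5 array to the member disks it touches. It assumes MD's left-symmetric RAID5 layout (assumed here to be the mdadm default) and offsets relative to the start of the array's data area; `chunk_location` and `devices_for_read` are hypothetical helpers, not MD code.

```python
"""Sketch: which RAID5 member disks does a read touch?

Assumptions: left-symmetric RAID5 layout and offsets measured from the
start of the array's data area.
"""


def chunk_location(chunk_index: int, n_disks: int) -> tuple[int, int]:
    """Return (stripe, data_disk) for a logical data chunk (left-symmetric)."""
    data_disks = n_disks - 1
    stripe = chunk_index // data_disks
    # Parity starts on the last disk and moves one disk left per stripe.
    parity_disk = data_disks - stripe % n_disks
    # Data chunks follow the parity disk, wrapping around.
    data_disk = (parity_disk + 1 + chunk_index % data_disks) % n_disks
    return stripe, data_disk


def devices_for_read(offset: int, length: int, chunk: int,
                     n_disks: int) -> set[int]:
    """Set of member disks touched by a read of [offset, offset + length)."""
    first = offset // chunk
    last = (offset + length - 1) // chunk
    return {chunk_location(c, n_disks)[1] for c in range(first, last + 1)}


if __name__ == "__main__":
    MIB = 1024 * 1024
    # A 512 KiB read inside the first 1 MiB chunk hits one disk.
    print(devices_for_read(100 * 1024, 512 * 1024, MIB, 4))
    # The same read crossing a chunk boundary hits two disks.
    print(devices_for_read(900 * 1024, 512 * 1024, MIB, 4))
```

On a 4-disk array with a 1 MiB chunk, the first read prints `{0}` and the second `{0, 1}`: a sub-chunk read stays on one spindle unless it happens to straddle a chunk boundary.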

So, regarding my original problem: is the way ext4 uses stride-size
that it begins every new file at the start of a stripe?

When growing an existing file, what do you do? Do you continue writing it
from where it left off, without holes, or do you leave a hole, select a new
location at the start of a new stripe, and continue from there?

What about multiple very small files written out together by pdflush?
Are they packed together in the same stripe without holes, or does
each one go to a different stripe?

Is changing the stripe-width with tune2fs supported on a live, mounted
fs? (I mean perhaps with a mount -o remount, but no umount.)

Thanks for your help,

T.