linux-ext4 - Re: What to put for unknown stripe-width?

Message-Id: <FED48113-16A4-4EBB-9A63-CABD3A759976@dilger.ca>
Date:	Tue, 20 Sep 2011 17:29:53 -0600
From:	Andreas Dilger <adilger@...ger.ca>
To:	torn5 <torn5@...ftmail.org>
Cc:	Theodore Tso <tytso@....EDU>, linux-ext4@...r.kernel.org
Subject: Re: What to put for unknown stripe-width?

On 2011-09-20, at 9:29 AM, torn5 wrote:
> On 09/20/11 14:47, Theodore Tso wrote:
>> But that's OK, because I don't know of any RAID array that supports this kind of radical surgery in parameters in the first place. :-)
> 
> Ted, thanks for your reply,
> 
> Linux MD raid supports this, it's called reshape. Most parameter changes are supported; in particular, adding a new disk and restriping a raid5 is supported *live*. It's not very stable though...
> 
> But apart from the MD live reshape/restripe, what I would more likely do is move such a filesystem *live* across the various RAIDs I have, leveraging LVM's "pvmove". Almost all of these RAIDs have a 1MB stride, but with varying numbers of elements, hence they have different stripe-widths.

Just FYI, we use 1MB stripe width for Lustre by default, and this is large
enough for very good IO performance, and is very well tested.
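
For reference, here is a rough sketch of how the two ext4 geometry values
follow from the RAID layout, and why arrays with the same chunk size but a
different number of disks end up with different stripe-widths. It assumes the
usual mke2fs convention (stride = per-disk chunk in filesystem blocks,
stripe_width = stride * number of data disks) and a 4KB block size; the
function name and the disk counts are made up for illustration.

```python
def raid_geometry(chunk_bytes, data_disks, fs_block_bytes=4096):
    """Return (stride, stripe_width) in filesystem blocks.

    Assumes: stride = chunk size / fs block size, and
    stripe_width = stride * number of data-bearing disks
    (parity disks are not counted).
    """
    if chunk_bytes % fs_block_bytes != 0:
        raise ValueError("chunk size must be a multiple of the block size")
    if data_disks < 1:
        raise ValueError("need at least one data disk")
    stride = chunk_bytes // fs_block_bytes
    return stride, stride * data_disks


MiB = 1024 * 1024

# Hypothetical arrays: same 1MB chunk, different numbers of data disks.
for disks in (3, 4, 7):
    stride, stripe_width = raid_geometry(1 * MiB, disks)
    print(f"{disks} data disks: stride={stride} stripe_width={stripe_width}")
```

The stride stays at 256 blocks in every case; only stripe_width changes, which
is the value that would differ after a pvmove between such arrays.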

>> The other thing to consider is small writes.   If you are doing small writes, a large stripe size is a disaster, because a 32k random write by a program like MySQL will turn into a 3MB read + 3MB write request.
> 
> So, regarding my original problem, the way you use stride-size in ext4 is that you begin every new file at the start of a stripe?

With ext4 and flex_bg it is nearly irrelevant.  I try to use a flex_bg size
(mke2fs -G 256) that matches the stripe_width, so that the bitmap load is
spread evenly across all the disks.  It would be good to have a patch to do
this by default, but I haven't gotten around to that.

> When growing an existing file, what do you do: do you continue writing from where it ended, without holes, or do you leave a hole, select a new location at the start of a new stripe and start from there?

Large files will be allocated starting on a stripe_width (1MB) aligned
boundary, and (IIRC) small files will be packed together into a 1MB chunk,
to minimize read-modify-write cycles on the RAID.
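
To illustrate why the alignment matters, here is a minimal sketch of the
read-modify-write cost under a deliberately simplified model: any stripe that
a write only partially covers is read in full and rewritten in full, while a
fully covered stripe is written without a read. That is the model behind the
"32k write turns into 3MB read + 3MB write" figure quoted above; a real RAID
implementation may well do less IO than this. The function names are made up
for the example.

```python
MiB = 1024 * 1024


def align_up(offset, stripe_bytes):
    """Round offset up to the next stripe boundary."""
    return -(-offset // stripe_bytes) * stripe_bytes


def rmw_cost(offset, length, stripe_bytes):
    """Return (bytes_read, bytes_written) for one write.

    Simplified model (an assumption, not how any particular RAID works):
    a partially covered stripe costs a full-stripe read plus a full-stripe
    write; a fully covered stripe costs only a full-stripe write.
    """
    if length <= 0:
        return 0, 0
    end = offset + length
    first = offset // stripe_bytes
    last = (end - 1) // stripe_bytes
    read = written = 0
    for stripe in range(first, last + 1):
        start = stripe * stripe_bytes
        fully_covered = offset <= start and end >= start + stripe_bytes
        if not fully_covered:
            read += stripe_bytes
        written += stripe_bytes
    return read, written


# A 32KB write into a 3MB stripe: full-stripe read and write.
print(rmw_cost(0, 32 * 1024, 3 * MiB))
# A 1MB write aligned to a 1MB stripe: no read needed.
print(rmw_cost(0, 1 * MiB, 1 * MiB))
# The same 1MB write starting mid-stripe touches two partial stripes.
print(rmw_cost(512 * 1024, 1 * MiB, 1 * MiB))
```

In this model, starting a large file at align_up(offset, stripe_bytes) means
that only the final stripe of the file can ever be a partial one.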

> Regarding multiple very small files written together by pdflush, what happens to them? Are they packed together on the same stripe without holes, or does each one go to a different stripe?

I'm not 100% sure that the small files are still being handled correctly,
since it has been a long time since I looked at that code and it has
undergone a lot of changes.

> Is the change of stripe-width with tune2fs supported on a live, mounted fs? (I mean maybe with a mount -o remount but no umount)

I think the value is cached in the in-memory superblock at mount time, after
any mount-time parameters and sanity checks have been applied, so I don't
think a change made with tune2fs will take effect on a mounted filesystem.

Cheers, Andreas

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html