Date:	Mon, 13 Aug 2012 17:20:34 -0600
From:	Andreas Dilger <adilger@...ger.ca>
To:	Theodore Ts'o <tytso@....edu>
Cc:	Eric Sandeen <sandeen@...hat.com>,
	Ext4 Developers List <linux-ext4@...r.kernel.org>
Subject: Re: [PATCH] ext4: don't load the block bitmap for block groups which have no space

On 2012-08-13, at 12:49 PM, Theodore Ts'o wrote:
> On Mon, Aug 13, 2012 at 11:02:08AM -0500, Eric Sandeen wrote:
>> 
>> Looks ok to me; I think this just further optimizes what was done
>> in
>> 
>> 8a57d9d61a6e361c7bb159dda797672c1df1a691
>> ext4: check for a good block group before loading buddy pages
>> 
>> correct?
> 
> Yes, that's right; it's a further optimization.
> 
> I can think of an additional optimization where if we are reading the
> block bitmap for block group N, and the block bitmap for block group
> N+1 hasn't been read before (so we don't have buddy bitmap stats), and
> the block bitmap for bg N+1 is adjacent to bg N, we should read both
> at the same time.  (And this could be generalized for N+2, N+3, etc.)

I was thinking the same thing.  Seems a shame that we have contiguous
bitmaps with flex_bg and don't load them all at once.  However, I ended
up deciding not to pursue the issue, because I suspect the block device
will already be doing some physical block/track readahead.  I guess it
couldn't hurt to submit explicit readahead requests, so long as we don't
wait for anything but the first bitmap to actually be loaded.
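
(For illustration, a minimal sketch of that kind of fire-and-forget
bitmap readahead, using generic buffer-cache calls rather than the real
ext4 bitmap helpers; the function name, the nr_extra count, and the
assumption that the neighbouring bitmaps are physically contiguous under
flex_bg are all made up for the sketch.)

#include <linux/fs.h>
#include <linux/buffer_head.h>

/*
 * Sketch only, not actual ext4 code: read the bitmap block we need, and
 * opportunistically kick off readahead for the next few bitmap blocks,
 * waiting only on the first one.
 */
static struct buffer_head *read_bitmap_with_readahead(struct super_block *sb,
						      sector_t bitmap_blk,
						      unsigned int nr_extra)
{
	struct buffer_head *bh, *ra;
	unsigned int i;

	bh = sb_getblk(sb, bitmap_blk);
	if (!bh)
		return NULL;
	if (!buffer_uptodate(bh))
		ll_rw_block(READ, 1, &bh);	/* the only read we wait for */

	/* fire-and-forget readahead for the bitmaps of bg N+1 .. N+nr_extra */
	for (i = 1; i <= nr_extra; i++) {
		ra = sb_getblk(sb, bitmap_blk + i);
		if (!ra)
			break;
		if (!buffer_uptodate(ra))
			ll_rw_block(READA, 1, &ra);
		brelse(ra);
	}

	wait_on_buffer(bh);
	if (!buffer_uptodate(bh)) {
		brelse(bh);
		return NULL;
	}
	return bh;
}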

> I'm not entirely sure whether it's worth the effort, but I suspect for
> very full file systems, it might very well be.  This is a more
> general case of the problem where most people only benchmark mostly
> empty file systems, and my experience has been that above 70-80%
> utilization, our performance starts to fall off.  And while disk space
> is cheap, it's not _that_ cheap, and there are always customers who
> insist on using file systems up to a utilization of 99%, and expect
> the same performance as when the file system was freshly formatted.  :-(

In my experience, there are so many factors affecting the performance
of a full filesystem that not much can be done about it.

We've discussed changing statfs() reporting for Lustre to exclude the
"reserved" amount from the device size, so that people don't complain
"why can't I use the last 5% of the device", or run "tune2fs -m 0" to
remove the reserved space and then complain when performance permanently
dives after hitting 100% full, due to bad fragmentation of the last 5%
of files written, which will not be deleted for many months.  Even with
SSDs the fragmentation is going to be visible, due to erase-block
fragmentation and the extra IO submission overhead for small chunks.
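
(To make that concrete, the statfs() change being discussed would amount
to roughly the following.  This is only a sketch, not the actual Lustre
or ext4 code, and the helper name is made up; today, roughly speaking,
ext4 only subtracts the reserved blocks from f_bavail.)

#include <linux/statfs.h>
#include <linux/types.h>

/*
 * Sketch only: hide the reserved blocks from the reported device size
 * and free count as well, so "df" never shows space that ordinary users
 * are not meant to fill.
 */
static void statfs_hide_reserved(struct kstatfs *buf, u64 resv_blocks)
{
	if (buf->f_blocks > resv_blocks)
		buf->f_blocks -= resv_blocks;	/* shrink advertised size */
	if (buf->f_bfree > resv_blocks)
		buf->f_bfree -= resv_blocks;	/* keep free <= total */
	/* f_bavail already excludes the reserved blocks for non-root */
}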

The other significant factor is that inner/outer track performance can
vary by a factor of two on some drives.  The ext4 allocator biases
allocations toward the outer tracks, which is good, but performance is
still lower on the inner tracks regardless of whether there is any
fragmentation.

Cheers, Andreas




