Message-Id: <A0F40696-AABF-4D2D-AA37-3E4D4BC8EBBE@dilger.ca>
Date: Fri, 7 Mar 2014 14:09:10 -0700
From: Andreas Dilger <adilger@...ger.ca>
To: Theodore Ts'o <tytso@....edu>,
Alexey Zhuravlev <alexey.zhuravlev@...el.com>
Cc: Lukáš Czerner <lczerner@...hat.com>,
Maurizio Lombardi <mlombard@...hat.com>,
ext4 development <linux-ext4@...r.kernel.org>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: [PATCH 2/2] ext4: fix bug in ext4_mb_normalize_request()
On Mar 6, 2014, at 11:32 AM, Theodore Ts'o <tytso@....edu> wrote:
> On Thu, Mar 06, 2014 at 06:54:05PM +0100, Lukáš Czerner wrote:
>>
>> All that said, I was getting to rewrite this mess a long time ago,
>> it's just a reminder that it's something that needs to be done.
>> Especially since the bigger requests are getting split unnecessarily
>> which hurts especially in fallocate case.
>
> We should try to get input from Andreas about what some of the more
> interesting heuristics in mballoc were trying to accomplish, since
> there's a lot going on that's not obvious, and one of the reasons why
> I've always been worried about trying to do cleanups was because
> something that looks ugly might be papering over some other dark
> corner of mballoc.c ---- and so I was fairly certain that once we
> started opening up mballoc.c, we'd have to do a lot of work on it, and
> a lot of performance measurements to make sure we didn't accidentally
> introduce some performance regression.

There is actually quite a lengthy description of mballoc at the start
of the file.  I guess it would make sense to turn anything in this
thread into comments for ext4_mb_normalize_request() once verified.

So, below is hopefully a summary of what ext4_mb_normalize_request()
is actually doing.  I've CC'd Alex to correct my mistakes.  I think
the first few cases are commented accurately and are self-explanatory:

* don't prealloc blocks for non-regular files (!EXT4_MB_HINT_DATA)
  - should we reconsider this for larger directories?
* don't use prealloc if caller wants exact (EXT4_MB_HINT_GOAL_ONLY)
  - currently unused, but would be useful for defrag
* don't reserve blocks if caller doesn't want it (EXT4_MB_HINT_NOPREALLOC)
  - used for small files or if requested data fits exactly into extent
* if write is a small file, use group prealloc (EXT4_MB_HINT_GROUP_ALLOC)
  - this combines multiple small writes into a single prealloc region
    and avoids read-modify-write of RAID stripes
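
As a rough illustration of the early-exit checks listed above, here is a
minimal sketch in plain userspace C.  It is not the kernel code: the
SKETCH_* names, their numeric values, and the sketch_classify() helper
are all made up for this example and only mirror the EXT4_MB_HINT_*
flags named in the list.

```c
#include <assert.h>

/* Hypothetical stand-ins for the EXT4_MB_HINT_* flags named above.
 * The numeric values are arbitrary and are NOT the kernel's. */
enum {
    SKETCH_HINT_DATA        = 1u << 0,
    SKETCH_HINT_GOAL_ONLY   = 1u << 1,
    SKETCH_HINT_NOPREALLOC  = 1u << 2,
    SKETCH_HINT_GROUP_ALLOC = 1u << 3
};

#define SKETCH_HINT_ALL (SKETCH_HINT_DATA | SKETCH_HINT_GOAL_ONLY | \
                         SKETCH_HINT_NOPREALLOC | SKETCH_HINT_GROUP_ALLOC)

/* What the sketch decides to do with a request. */
enum sketch_action {
    SKETCH_NO_PREALLOC,     /* leave the request untouched */
    SKETCH_GROUP_PREALLOC,  /* small file: share a group prealloc region */
    SKETCH_INODE_PREALLOC   /* large file: fall through to size/alignment logic */
};

/* Apply the four checks in the order they are listed in the text. */
enum sketch_action sketch_classify(unsigned int flags)
{
    /* Precondition: only the four flags modelled here may be set. */
    assert((flags & ~(unsigned int)SKETCH_HINT_ALL) == 0);

    if (!(flags & SKETCH_HINT_DATA))       /* non-regular file */
        return SKETCH_NO_PREALLOC;
    if (flags & SKETCH_HINT_GOAL_ONLY)     /* caller wants the exact goal */
        return SKETCH_NO_PREALLOC;
    if (flags & SKETCH_HINT_NOPREALLOC)    /* caller opted out */
        return SKETCH_NO_PREALLOC;
    if (flags & SKETCH_HINT_GROUP_ALLOC)   /* small-file write */
        return SKETCH_GROUP_PREALLOC;
    return SKETCH_INODE_PREALLOC;
}
```

Only requests that reach SKETCH_INODE_PREALLOC in this sketch would go on
to the large-file handling.
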
The rest of the function is about handling large file writes efficiently.
* round up small writes to a power-of-two value for better alignment
  - we have a patch that makes the preallocation region sizes tunable,
    if that is something of interest.  That said, we don't really use it.
* if the request is large, align it to a power-of-two boundary
  - the allocation goal is based on the logical file offset, so that if
    a file is written sparsely by multiple threads, it can coalesce into
    a densely packed file in the end.  This is common for HPC jobs, or
    applications like bittorrent.
* the list_for_each() loops align the prealloc region with other regions
  - this helps ensure that, once the file is fully allocated, the
    regions are contiguous on disk
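
The rounding and alignment steps above can be sketched in a few lines of
userspace C.  This is only an illustration under simplifying assumptions:
the real function works from a table of sizes rather than a plain
power-of-two loop, and the sketch_* names and the choice to grow the
region until it covers the whole request are mine, not mballoc's.

```c
#include <assert.h>
#include <stdint.h>

/* Round n up to the next power of two (returns n if it already is one). */
uint64_t sketch_roundup_pow2(uint64_t n)
{
    assert(n >= 1 && n <= (UINT64_C(1) << 63));
    uint64_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* A preallocation window expressed in logical file blocks. */
struct sketch_region {
    uint64_t start;  /* first logical block of the window */
    uint64_t len;    /* window length in blocks, always a power of two */
};

/*
 * Given a write of 'len' blocks at logical block 'lblk', pick a
 * power-of-two sized window that is aligned to its own size and covers
 * the whole write.  The window depends only on the logical offset, so
 * writers working on different parts of a sparse file get
 * non-overlapping windows that line up once the file fills in.
 */
struct sketch_region sketch_normalize(uint64_t lblk, uint64_t len)
{
    /* Keep the arithmetic far away from 64-bit overflow. */
    assert(len >= 1 && len <= (UINT64_C(1) << 32));
    assert(lblk <= (UINT64_C(1) << 32));

    struct sketch_region r;
    r.len = sketch_roundup_pow2(len);
    r.start = lblk & ~(r.len - 1);          /* align down to window size */

    /* An unaligned write can spill past the window; widen until it fits. */
    while (r.start + r.len < lblk + len) {
        r.len <<= 1;
        r.start = lblk & ~(r.len - 1);
    }
    return r;
}
```

For example, a 3-block write at logical block 5 maps to the window
starting at block 4 with length 4, while a 4-block write at block 6
straddles an alignment boundary and is widened to blocks 0 through 15.
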

I'm pretty sure some of this is not 100% accurate; hopefully Alex can
comment and correct any inconsistencies.

Cheers, Andreas