Date:	Fri, 7 Mar 2014 14:09:10 -0700
From:	Andreas Dilger <adilger@...ger.ca>
To:	Theodore Ts'o <tytso@....edu>,
	Alexey Zhuravlev <alexey.zhuravlev@...el.com>
Cc:	Lukáš Czerner <lczerner@...hat.com>,
	Maurizio Lombardi <mlombard@...hat.com>,
	ext4 development <linux-ext4@...r.kernel.org>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: [PATCH 2/2] ext4: fix bug in ext4_mb_normalize_request()

On Mar 6, 2014, at 11:32 AM, Theodore Ts'o <tytso@....edu> wrote:
> On Thu, Mar 06, 2014 at 06:54:05PM +0100, Lukáš Czerner wrote:
>> 
>> All that said, I was getting to rewrite this mess a long time ago,
>> it's just a reminder that it's something that needs to be done.
>> Especially since the bigger requests are getting split unnecessarily
>> which hurts especially in fallocate case.
> 
> We should try to get input from Andreas about what some of the more
> interesting heuristics in mballoc were trying to accomplish, since
> there's a lot going on that's not obvious, and one of the reasons why
> I've always been worried about trying to do cleanups was because
> something that looks ugly might be papering over some other dark
> corner of mballoc.c ---- and so I was fairly certain that once we
> started opening up mballoc.c, we'd have to do a lot of work on it, and
> a lot of performance measurements to make sure we didn't accidentally
> introduce some performance regression.

There is actually quite a lengthy description of mballoc at the start
of the file.  I guess it would make sense to turn anything in this
thread into comments for ext4_mb_normalize_request() once verified.

So, below is what I hope is a summary of what ext4_mb_normalize_request()
is actually doing.  I've CC'd Alex to correct my mistakes.  I think
the first few cases are commented accurately and are self-explanatory:

* don't prealloc blocks for non-regular files (!EXT4_MB_HINT_DATA)
  - should we reconsider this for larger directories?
* don't use prealloc if caller wants exact (EXT4_MB_HINT_GOAL_ONLY)
  - currently unused, but would be useful for defrag
* don't reserve blocks if caller doesn't want it (EXT4_MB_HINT_NOPREALLOC)
  - used for small files or if requested data fits exactly into extent
* if the write is to a small file, use group prealloc (EXT4_MB_HINT_GROUP_ALLOC)
  - this combines multiple small writes into a single prealloc region
    and avoids read-modify-write of RAID stripes
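
To make the order of those early checks concrete, here is a rough
userspace sketch.  It is not the kernel code: the HINT_* bit values are
placeholders (the real EXT4_MB_HINT_* definitions are in fs/ext4/ext4.h),
and classify_request() and enum norm_action are names made up for this
illustration.

```c
/* Placeholder bit values for illustration only; the real
 * EXT4_MB_HINT_* flags are defined in fs/ext4/ext4.h. */
#define HINT_DATA        0x01u
#define HINT_GOAL_ONLY   0x02u
#define HINT_NOPREALLOC  0x04u
#define HINT_GROUP_ALLOC 0x08u

/* Hypothetical result type: what the normalization step decides to do. */
enum norm_action {
	NORM_SKIP,            /* leave the request as it is */
	NORM_GROUP_PREALLOC,  /* small file: use group preallocation */
	NORM_INODE_PREALLOC   /* large file: continue to size/alignment logic */
};

/* Apply the early checks in the order they are listed above. */
static enum norm_action classify_request(unsigned int flags)
{
	if (!(flags & HINT_DATA))        /* not regular file data */
		return NORM_SKIP;
	if (flags & HINT_GOAL_ONLY)      /* caller wants the exact goal */
		return NORM_SKIP;
	if (flags & HINT_NOPREALLOC)     /* caller does not want reservation */
		return NORM_SKIP;
	if (flags & HINT_GROUP_ALLOC)    /* small write */
		return NORM_GROUP_PREALLOC;
	return NORM_INODE_PREALLOC;
}
```

Only requests that fall through all four checks reach the large-file
handling described next.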

The rest of the function is about handling large file writes efficiently.
* round up small writes to a power-of-two value for better alignment
  - we have a patch that makes the preallocation region sizes tunable,
    if that is something of interest.  That said, we don't really use it.
* if the request is large, align it to a power-of-two boundary
  - the allocation goal is based on the logical file offset, so that if
    a file is written sparsely by multiple threads, it can coalesce into
    a densely packed file in the end.  This is common for HPC jobs, or
    applications like bittorrent.
* the list_for_each() loops align the prealloc region with other regions
  - this ensures that once the file is fully allocated, the regions
    are contiguous on disk
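
The power-of-two rounding and offset-based alignment can be sketched
roughly as below.  This is a simplified userspace illustration under my
own assumptions, not what mballoc.c does line for line: it ignores the
kernel's actual size thresholds and the adjustment against neighbouring
prealloc regions, and struct window, round_up_pow2() and
normalize_window() are hypothetical names.  Offsets and lengths are in
blocks.

```c
#include <stdint.h>

/* Hypothetical result: a preallocation window in logical file blocks. */
struct window {
	uint64_t start;  /* first logical block of the window */
	uint64_t len;    /* window length, always a power of two */
};

/* Round n up to the next power of two.
 * Assumes n <= 2^63 so the shift cannot overflow; n == 0 yields 1. */
static uint64_t round_up_pow2(uint64_t n)
{
	uint64_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Pick a power-of-two sized window, aligned to its own size in the
 * logical file offset space, that fully covers [off, off + len).
 * Assumes len >= 1 and that off + len does not overflow. */
static struct window normalize_window(uint64_t off, uint64_t len)
{
	uint64_t chunk = round_up_pow2(len);
	uint64_t start = off & ~(chunk - 1);
	struct window w;

	/* If the request straddles an alignment boundary, grow the
	 * window until it covers the whole request. */
	while (start + chunk < off + len) {
		chunk <<= 1;
		start = off & ~(chunk - 1);
	}

	w.start = start;
	w.len = chunk;
	return w;
}
```

Because the window start depends only on the logical offset, two
threads writing different parts of the same file compute windows on the
same grid, which is what lets sparse writes end up densely packed.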

I'm pretty sure some of this is not 100% accurate; hopefully Alex can
comment and correct any inconsistencies.

Cheers, Andreas





