lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Date:	Fri, 22 Mar 2013 15:19:39 -0700
From:	Greg Harmon <gharm@...gle.com>
To:	Lukáš Czerner <lczerner@...hat.com>
Cc:	Dmitry Monakhov <dmonakhov@...nvz.org>,
	"linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>,
	"Theodore Ts'o" <tytso@....edu>, Curt Wohlgemuth <curtw@...gle.com>
Subject: Re: [PATCH] ext4: Do not normalize request from fallocate

On Fri, Mar 22, 2013 at 1:03 PM, Lukáš Czerner <lczerner@...hat.com> wrote:
> On Fri, 22 Mar 2013, Greg Harmon wrote:
>
>> Date: Fri, 22 Mar 2013 10:10:46 -0700
>> From: Greg Harmon <gharm@...gle.com>
>> To: Dmitry Monakhov <dmonakhov@...nvz.org>
>> Cc: Lukas Czerner <lczerner@...hat.com>,
>>     "linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>,
>>     Theodore Ts'o <tytso@....edu>, Curt Wohlgemuth <curtw@...gle.com>
>> Subject: Re: [PATCH] ext4: Do not normalize request from fallocate
>>
>> [Resending without html.]
>> I haven't looked at ext4 in awhile now. CC'ing Ted and Curt in case
>> they want to chime in. See comment below as well.
>>
>> On Thu, Mar 21, 2013 at 9:03 AM, Dmitry Monakhov <dmonakhov@...nvz.org> wrote:
>> > On Thu, 21 Mar 2013 16:50:45 +0100, Lukas Czerner <lczerner@...hat.com> wrote:
>> >> Block requests from fallocate has been normalized originally. Then it was
>> >> changed by 556b27abf73833923d5cd4be80006292e1b31662 not to normalize it.
>> >> And then it was changed by 3c6fe77017bc6ce489f231c35fed3220b6691836
>> >> again to normalize the request.
>> >>
>> >> The fact is that we _never_ want to normalize the request from
>> >> fallocate. We know exactly how much space we're going to use and we do
>> >> not want anyone to mess with the request and there is no point in doing
>> >> so.
>> > Looks reasonable.
>> > Reviewed-by:Dmitry Monakhov <dmonakhov@...nvz.org>
>> >>
>> >> Commit 3c6fe77017bc6ce489f231c35fed3220b6691836 mentioned that
>>
>> Is this a publicly available commit? I did not find it with a quick
>> google search.
>> Anyway, I don't see a problem with this commit off-hand, but I'll
>> defer to Ted or Curt.
>
> Yes it is publicly available (git show
> 3c6fe77017bc6ce489f231c35fed3220b6691836). I cc'ed you because you
> are the author of that commit and this one if effectively
> reverting it, so I was curious whether you have anything to say
> about it in case I've missed something.
>
> Thanks!

Not sure why I was having trouble finding those commits earlier;
(ec22ba8edb507395c95fbc617eea26a6b2d98797 is in git but not indexed on
google). Thanks for the notice.

-Greg

> -Lukas
>
>>
>> -Greg Harmon
>>
>> >> large fallocate requests were not physically contiguous. However it is
>> >> important to see why that is the case. Because the request is so big the
>> >> allocator will try to find free group to allocate from skipping block
>> >> groups which are used, which is fine. However it will only allocate
>> >> extents of 2^15-1 block (limitation of uninitialized extent size)
>> >> which will leave one block in each block group free which will make the
>> >> extent tree physically non-contiguous, however _only_ by one block which
>> >> is perfectly fine.
>> >>
>> >> This will never happen when we normalize the request because for some
>> >> reason (maybe bug) it will be normalized to much smaller request (2048
>> >> blocks) and those extents will then be merged together not leaving any
>> >> free block in between - hence physically contiguous. However the fact
>> >> that we're splitting huge requests into ton of smaller ones and then
>> >> merging extents together is very _very_ bad for fallocate performance.
>> >>
>> >> The situation is even worst since with commit
>> >> ec22ba8edb507395c95fbc617eea26a6b2d98797 we no longer merge
>> >> uninitialized extents so we end up with absolutely _huge_ extent tree
>> >> for bigger fallocate requests which is also bad for performance but not
>> >> only when fallocate itself, but even when working with the file
>> >> later on.
>> >>
>> >> Fix this by disabling normalization for fallocate. From my simple testing
>> >> with this commit fallocate is much faster on non fragmented file
>> >> system. On my system fallocate 15T is almost 3x faster with this patch
>> >> and removing this file is almost 2x faster - tested on real hardware.
>> >>
>> >> Signed-off-by: Lukas Czerner <lczerner@...hat.com>
>> >> ---
>> >>  fs/ext4/extents.c |   18 ++++++++++--------
>> >>  1 files changed, 10 insertions(+), 8 deletions(-)
>> >>
>> >> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
>> >> index e2bb929..a40a602 100644
>> >> --- a/fs/ext4/extents.c
>> >> +++ b/fs/ext4/extents.c
>> >> @@ -4422,16 +4422,18 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
>> >>               trace_ext4_fallocate_exit(inode, offset, max_blocks, ret);
>> >>               return ret;
>> >>       }
>> >> -     flags = EXT4_GET_BLOCKS_CREATE_UNINIT_EXT;
>> >> -     if (mode & FALLOC_FL_KEEP_SIZE)
>> >> -             flags |= EXT4_GET_BLOCKS_KEEP_SIZE;
>> >> +
>> >>       /*
>> >> -      * Don't normalize the request if it can fit in one extent so
>> >> -      * that it doesn't get unnecessarily split into multiple
>> >> -      * extents.
>> >> +      * We do NOT want the requests from fallocate to be normalized
>> >> +      * ever!. We know exactly how much we want to allocate and
>> >> +      * we do not need to do any mumbo-jumbo with it. Requests bigger
>> >> +      * than uninit extent size, will be divided automatically into
>> >> +      * biggest possible extents.
>> >>        */
>> >> -     if (len <= EXT_UNINIT_MAX_LEN << blkbits)
>> >> -             flags |= EXT4_GET_BLOCKS_NO_NORMALIZE;
>> >> +     flags = EXT4_GET_BLOCKS_CREATE_UNINIT_EXT |
>> >> +             EXT4_GET_BLOCKS_NO_NORMALIZE;
>> >> +     if (mode & FALLOC_FL_KEEP_SIZE)
>> >> +             flags |= EXT4_GET_BLOCKS_KEEP_SIZE;
>> >>
>> >>  retry:
>> >>       while (ret >= 0 && ret < max_blocks) {
>> >> --
>> >> 1.7.7.6
>> >>
>> >> --
>> >> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> >> the body of a message to majordomo@...r.kernel.org
>> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ