Message-ID: <alpine.LFD.2.00.1303251420480.23176@localhost>
Date: Mon, 25 Mar 2013 14:26:54 +0100 (CET)
From: Lukáš Czerner <lczerner@...hat.com>
To: "Theodore Ts'o" <tytso@....edu>
cc: Lukáš Czerner <lczerner@...hat.com>,
linux-ext4@...r.kernel.org, gharm@...gle.com
Subject: Re: [PATCH] ext4: Do not normalize request from fallocate
On Mon, 25 Mar 2013, Theodore Ts'o wrote:
> Date: Mon, 25 Mar 2013 08:53:09 -0400
> From: Theodore Ts'o <tytso@....edu>
> To: Lukáš Czerner <lczerner@...hat.com>
> Cc: linux-ext4@...r.kernel.org, gharm@...gle.com
> Subject: Re: [PATCH] ext4: Do not normalize request from fallocate
>
> On Mon, Mar 25, 2013 at 11:09:35AM +0100, Lukáš Czerner wrote:
> >
> > Sorry for being dense, but I am trying to understand why this is so
> > bad and what the "expected" column there means.
> >
> > The physical offset of each extent below starts at the beginning of the
> > block group, and it seems to me that it's perfectly aligned for every
> > power of two up to the block group size.
>
> Yes, but the logical offset isn't aligned. Consider the simplest
> workload, which is where we are writing the 1GB file sequentially.
> Let's assume that the RAID stripe size is 8M. So ideally, we would
> want each write to be a multiple of 8M, starting at logical block 0.
>
> But look what happens here:
>
> > > File size of 1 is 1073741824 (262144 blocks of 4096 bytes)
> > >  ext:     logical_offset:        physical_offset: length:   expected: flags:
> > >    0:        0..   32766:     458752..    491518:  32767:             unwritten
> > >    1:    32767..   65533:     491520..    524286:  32767:     491519: unwritten
> > >    2:    65534..   98300:     589824..    622590:  32767:     524287: unwritten
>
> If we do 8M writes, then we would want to write in chunks of 2048
> blocks (8M / 4K). So consider what happens when we write the 2048-block
> chunk starting with logical block 30720 (logical blocks 30720 through
> 32767). The fact that there is a discontinuity between logical blocks
> 32766 and 32767 means that we will have to do a read-modify-write cycle
> for that particular RAID stripe.
>
> Does that make more sense?
Oh, now I get it :) Thanks a lot for the explanation. I kept thinking
about the physical layout and forgot that the logical layout is actually
misaligned.
>
> Another reason to keep the file as physically contiguous as
> possible is that we can now do extent caching using the extent status
> tree. So if we can allocate the file using 2 physically contiguous
> extents instead of 9 or 10, the extent status tree uses less memory,
> too. For a 1GB file, that might not make much difference, but if we
> are caching 2048 of these 1G files (on a 2TB disk, for example),
> keeping the files as physically contiguous as possible means we can
> cache the logical to physical block mappings of all of these files
> much more easily.
Yes, that makes sense too.
>
> Regards,
>
> - Ted
>
Thanks!
-Lukas