Message-ID: <6601abe90904291321u3f13d8b0p88b9a9eba5bc03a1@mail.gmail.com>
Date: Wed, 29 Apr 2009 13:21:09 -0700
From: Curt Wohlgemuth <curtw@...gle.com>
To: Theodore Tso <tytso@....edu>
Cc: Andreas Dilger <adilger@....com>,
ext4 development <linux-ext4@...r.kernel.org>
Subject: Re: Question on block group allocation
Hi Ted:
On Wed, Apr 29, 2009 at 12:37 PM, Theodore Tso <tytso@....edu> wrote:
> On Wed, Apr 29, 2009 at 03:16:47PM -0400, Theodore Tso wrote:
>>
>> When you have a chance, can you send out the details from your test run?
>>
>
> Oops, sorry, our two e-mails overlapped; I didn't see your new
> e-mail when I sent my ping-o-gram.
>
> On Wed, Apr 29, 2009 at 11:38:49AM -0700, Curt Wohlgemuth wrote:
>>
>> Okay, my phrasing was not as precise as it could have been. What I
>> meant by "total fragmentation" was simply that the range of physical
>> blocks for the 10GB file was much lower with Andreas' patch:
>>
>> Before patch: 8282112 - 103266303
>> After patch: 271360 - 5074943
>>
>> The number of extents is much larger. See the attached debugfs output.
>
> Ah, OK. You didn't attach the "e2fsck -E fragcheck" output, but I'm
> going to guess that the blocks for 10g, 4g, and 4g-2 ended up getting
> interleaved, possibly because they were written in parallel, and not
> one after another? Each of the extents in the "after" debugfs output
> was approximately 2k blocks (8 megabytes) in length, and they are
> separated by a largish number of blocks.
Hmm, I thought I attached the output from "e2fsck -E fragcheck"; yes,
I did: one simple line:
/dev/hdm3: clean, 14/45760512 files, 7608255/183010471 blocks
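In case it helps to reproduce this, the per-file layout came from
debugfs; something like the following should show the same information
(a sketch only -- the device and path are the ones from my test, and
filefrag assumes a reasonably recent e2fsprogs):

debugfs -R "stat /10g" /dev/hdm3     # dump the inode's block/extent list
e2fsck -fn -E fragcheck /dev/hdm3    # per-file fragmentation report for the whole FS
filefrag -v $MNT_PT/10g              # extent map of the file on the mounted FS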
And actually, I created the files sequentially:
dd if=/dev/zero of=$MNT_PT/4g bs=1G count=4
dd if=/dev/zero of=$MNT_PT/4g-2 bs=1G count=4
dd if=/dev/zero of=$MNT_PT/10g bs=1G count=10
> Now, if my theory that the files were written in an interleaved
> fashion is correct, and if it is also true that they will be read in
> an interleaved pattern, then the layout on disk might actually be the
> best one. If, however, they are going to be read sequentially, and
> you really want them to be allocated contiguously, then if you know
> what the final size of these files will be, probably the best thing
> to do is to use the fallocate system call.
>
> Does that make sense?
Sure, in that sense it does.
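For the record, here's roughly what I understand the preallocation
approach to look like -- just a sketch, assuming a util-linux new
enough to ship fallocate(1) (otherwise the application would call
fallocate()/posix_fallocate() itself before writing):

# Reserve the full 10GB up front so the allocator can try to keep it
# contiguous, then fill it; conv=notrunc keeps dd from truncating away
# the preallocated blocks.
fallocate -l 10G $MNT_PT/10g
dd if=/dev/zero of=$MNT_PT/10g bs=1G count=10 conv=notrunc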
The test in question does something like this (roughly sketched below):
1. Create 20 or so large files, sequentially.
2. Randomly choose a file.
3. Randomly choose an offset in this file.
4. Read a fixed-size buffer (say 256k) from that file/offset; the file
was opened with O_DIRECT
5. Go back to #2
6. Stop after some time period
This might not be the most realistic workload (the test can actually
be run by doing #1 above with multiple threads), but it's certainly
interesting.
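Roughly, the read side (steps 2 through 5) could be sketched with dd
like this -- an approximation only, since the real test opens the
files itself with O_DIRECT; the file list here reuses the three files
from the "dd" commands above rather than the 20 or so from the test,
and iflag=direct assumes GNU dd:

# Pick a random file and a random 256k-aligned offset, and read one
# buffer through O_DIRECT; stop after ~10 minutes (step 6).
FILES=($MNT_PT/4g $MNT_PT/4g-2 $MNT_PT/10g)
BS=$((256 * 1024))
while (( SECONDS < 600 )); do
    f=${FILES[$((RANDOM % ${#FILES[@]}))]}
    nbufs=$(( $(stat -c %s "$f") / BS ))    # number of 256k buffers in the file
    skip=$(( (RANDOM * 32768 + RANDOM) % nbufs ))
    dd if="$f" of=/dev/null bs=$BS count=1 skip=$skip iflag=direct 2>/dev/null
done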
The point that I'm interested in is why the physical block spread is
so different for the 10GB file between (a) the above 'dd' command
sequence; and (b) simply creating the "10g" file alone, without
creating the 4GB files first.
I just did (b) above on a kernel without Andreas' patch, on a freshly
formatted ext4 FS, and here's (most of) the debugfs output for it:
BLOCKS:
(IND):164865, (0-63487):34816-98303, (63488-126975):100352-163839,
(126976-190463):165888-229375, (190464-253951):231424-294911,
(253952-481279):296960-524287, (481280-544767):821248-884735,
(544768-706559):886784-1048575, (706560-1196031):1607680-2097151,
(1196032-1453067):2656256-2913291
TOTAL: 1453069
The total spread of the blocks here (34816 through 2913291, roughly
2.9M blocks) is tiny compared to the spread for the 10GB file created
after the 3 "dd" commands above (8282112 through 103266303 on the
unpatched kernel, roughly 95M blocks).
I haven't yet really looked at the block allocation results using
Andreas' patch, except for the "10g" file after the three "dd"
commands above. So I'm not sure what the effects are with, say,
larger numbers of files. I'll be doing some more experimentation
soon.
Thanks,
Curt