Message-ID: <4C24FC71.6020001@redhat.com>
Date: Fri, 25 Jun 2010 14:58:57 -0400
From: Ric Wheeler <rwheeler@...hat.com>
To: Daniel Taylor <Daniel.Taylor@....com>
CC: Mike Fedyk <mfedyk@...efedyk.com>,
Daniel J Blueman <daniel.blueman@...il.com>,
Mat <jackdachef@...il.com>, LKML <linux-kernel@...r.kernel.org>,
linux-fsdevel@...r.kernel.org,
Chris Mason <chris.mason@...cle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
The development of BTRFS <linux-btrfs@...r.kernel.org>
Subject: Re: Btrfs: broken file system design (was Unbound(?) internal fragmentation
in Btrfs)
On 06/24/2010 06:06 PM, Daniel Taylor wrote:
>
>
>
>> -----Original Message-----
>> From: mikefedyk@...il.com [mailto:mikefedyk@...il.com] On
>> Behalf Of Mike Fedyk
>> Sent: Wednesday, June 23, 2010 9:51 PM
>> To: Daniel Taylor
>> Cc: Daniel J Blueman; Mat; LKML;
>> linux-fsdevel@...r.kernel.org; Chris Mason; Ric Wheeler;
>> Andrew Morton; Linus Torvalds; The development of BTRFS
>> Subject: Re: Btrfs: broken file system design (was Unbound(?)
>> internal fragmentation in Btrfs)
>>
>> On Wed, Jun 23, 2010 at 8:43 PM, Daniel Taylor
>> <Daniel.Taylor@....com> wrote:
>>
>>> Just an FYI reminder. The original test (2K files) is utterly
>>> pathological for disk drives with 4K physical sectors, such as
>>> those now shipping from WD, Seagate, and others. Some of the
>>> SSDs have larger (16K) or smaller (2K) blocks. There is also
>>> the issue of btrfs over RAID (which I know is not entirely
>>> sensible, but which will happen).
>>>
>>> The absolute minimum allocation size for data should be the same
>>> as, and aligned with, the underlying disk block size. If that
>>> results in underutilization, I think that's a good trade-off for
>>> performance, compared to read-modify-write cycles to update
>>> partial disk blocks.
>>>
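A minimal sketch of the allocation rule proposed above, assuming a 4096-byte physical sector: every data allocation is rounded up to whole physical blocks, so each extent starts and ends on a sector boundary. The function names are hypothetical, and the sketch deliberately ignores any packing of small objects a real file system might do.

```python
PHYSICAL_BLOCK = 4096  # assumed physical sector size (4K-sector drive)


def allocation_size(file_size: int, block: int = PHYSICAL_BLOCK) -> int:
    """Bytes allocated when every file occupies whole, aligned physical blocks."""
    if file_size <= 0:
        return 0
    # Ceiling division, then scale back up to a multiple of the block size.
    return -(-file_size // block) * block


def wasted_bytes(file_size: int, block: int = PHYSICAL_BLOCK) -> int:
    """Internal fragmentation: allocated bytes that hold no file data."""
    return allocation_size(file_size, block) - file_size


# The 2K-file test case: each file takes one 4K block, half of it empty.
print(allocation_size(2048), wasted_bytes(2048))  # 4096 2048
```

With 2K files, half of every allocated block is empty; that is the underutilization being accepted in exchange for never issuing a partial-sector data write.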
>> Block size = 4k
>>
>> Btrfs packs smaller objects into the blocks in certain cases.
>>
>>
> As long as no object smaller than the disk block size is ever
> flushed to media, and all flushed objects are aligned to the disk
> blocks, there should be no real performance hit from that.
>
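One way to meet that condition, sketched as a hypothetical write buffer rather than Btrfs's actual allocator: small objects are packed into 4K blocks in memory, and only whole, block-aligned buffers are handed to the device. The class name, method names and the 4096-byte block size are assumptions made for illustration.

```python
BLOCK = 4096  # assumed physical sector size


class BlockPacker:
    """Packs small objects into fixed-size blocks; emits only full, aligned blocks."""

    def __init__(self, block: int = BLOCK):
        self.block = block
        self.current = bytearray()  # block currently being filled
        self.flushed = []           # (device offset, payload) pairs sent to "media"
        self.next_offset = 0        # always a multiple of self.block

    def add(self, obj: bytes) -> None:
        if len(obj) > self.block:
            raise ValueError("this sketch only packs objects no larger than a block")
        # Objects never straddle blocks: start a new block if this one won't fit.
        if len(self.current) + len(obj) > self.block:
            self._flush_current()
        self.current += obj

    def _flush_current(self) -> None:
        # Zero-pad the tail so the device always receives a full block.
        payload = bytes(self.current) + bytes(self.block - len(self.current))
        self.flushed.append((self.next_offset, payload))
        self.next_offset += self.block
        self.current = bytearray()

    def close(self) -> None:
        if self.current:
            self._flush_current()


packer = BlockPacker()
for _ in range(5):
    packer.add(b"x" * 1024)  # five 1K objects
packer.close()
print([(off, len(buf)) for off, buf in packer.flushed])  # [(0, 4096), (4096, 4096)]
```

The price is the zero padding in the last block; the benefit is that the device never sees a write that is smaller than, or misaligned with, its physical sector.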
> Otherwise we end up with the same damage as in the ext[234] family,
> where the file blocks can be aligned, but the 1K inode updates cause
> read-modify-write (RMW) cycles and cost a >10% performance
> hit for creation/update of large numbers of files.
>
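A small model of why the 1K inode updates hurt, assuming 4096-byte physical sectors: any write that does not fully cover a physical sector forces a read of that sector, a merge of the new bytes, and a write back. The hypothetical helper below counts how many physical sectors a single write touches only partially; it does not model caching in the drive or in the page cache.

```python
def partial_sectors(offset: int, length: int, sector: int = 4096) -> int:
    """Number of physical sectors a write touches but does not fully cover.

    Each such sector needs a read-modify-write (RMW) cycle.
    """
    if length <= 0:
        return 0
    end = offset + length
    first = offset // sector    # index of first sector touched
    last = (end - 1) // sector  # index of last sector touched
    count = 0
    # Head sector is partial if the write starts mid-sector, or if the
    # whole write fits in one sector and stops short of its end.
    if offset % sector != 0 or (first == last and end % sector != 0):
        count += 1
    # Tail sector (when distinct from the head) is partial if the write ends mid-sector.
    if last != first and end % sector != 0:
        count += 1
    return count


# A 1K update inside a 4K sector is always partial; a full aligned 4K write is not.
print(partial_sectors(1024, 1024))  # 1
print(partial_sectors(0, 4096))     # 0
```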
> An RMW cycle costs at least a full rotation (11 msec on a 5400 RPM
> drive), which is painful.
>
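The 11 msec figure is the time for one revolution, 60 seconds divided by the spindle speed. A quick check of that arithmetic, extended to a few other common spindle speeds for comparison:

```python
def rotation_ms(rpm: int) -> float:
    """Time for one full platter revolution, in milliseconds."""
    return 60_000.0 / rpm


print(round(rotation_ms(5400), 2))  # 11.11

for rpm in (5400, 7200, 10_000, 15_000):
    print(f"{rpm:>6} RPM: {rotation_ms(rpm):5.2f} ms per revolution")
```

Since an RMW cycle has to wait at least one extra revolution before rewriting the sector it just read, this per-revolution time is a lower bound on the added latency of each partial-sector update.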
It is also interesting to note that you can get significant overhead
even with zero-length files. Path names, metadata overhead, etc. can
consume quite a bit of space per file, depending on the pathname length.
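A rough model of that point, in which every size is an assumption for illustration only: each file pays a fixed metadata cost (inode record plus item headers) and a name-dependent cost, because the name may be stored in more than one index. The defaults of 300 fixed bytes and 3 stored copies of the name are hypothetical placeholders, not Btrfs figures; check real values against the on-disk format documentation before relying on them.

```python
def metadata_bytes_per_file(name_len: int,
                            fixed_bytes: int = 300,   # HYPOTHETICAL: inode record + item headers
                            name_copies: int = 3) -> int:  # HYPOTHETICAL: indexes storing the name
    """Estimated metadata consumed by one zero-length file."""
    return fixed_bytes + name_copies * name_len


def total_metadata(num_files: int, name_len: int, **kw) -> int:
    """Estimated metadata for num_files empty files sharing one name length."""
    return num_files * metadata_bytes_per_file(name_len, **kw)


# One million empty files: short names vs. long names.
for n in (20, 200):
    mib = total_metadata(1_000_000, n) / 2**20
    print(f"name length {n:3}: ~{mib:.0f} MiB of metadata, 0 bytes of data")
```

Whatever the exact constants, the cost is linear in both the number of files and the name length, so it is non-trivial even when no file holds any data.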
Ric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/