Message-ID: <469D2D911E4BF043BFC8AD32E8E30F5B24AEBB@wdscexbe07.sc.wdc.com>
Date:	Thu, 24 Jun 2010 15:06:03 -0700
From:	"Daniel Taylor" <Daniel.Taylor@....com>
To:	"Mike Fedyk" <mfedyk@...efedyk.com>
Cc:	"Daniel J Blueman" <daniel.blueman@...il.com>,
	"Mat" <jackdachef@...il.com>,
	"LKML" <linux-kernel@...r.kernel.org>,
	<linux-fsdevel@...r.kernel.org>,
	"Chris Mason" <chris.mason@...cle.com>,
	"Ric Wheeler" <rwheeler@...hat.com>,
	"Andrew Morton" <akpm@...ux-foundation.org>,
	"Linus Torvalds" <torvalds@...ux-foundation.org>,
	"The development of BTRFS" <linux-btrfs@...r.kernel.org>
Subject: RE: Btrfs: broken file system design (was Unbound(?) internal fragmentation in Btrfs)

 

> -----Original Message-----
> From: mikefedyk@...il.com [mailto:mikefedyk@...il.com] On 
> Behalf Of Mike Fedyk
> Sent: Wednesday, June 23, 2010 9:51 PM
> To: Daniel Taylor
> Cc: Daniel J Blueman; Mat; LKML; 
> linux-fsdevel@...r.kernel.org; Chris Mason; Ric Wheeler; 
> Andrew Morton; Linus Torvalds; The development of BTRFS
> Subject: Re: Btrfs: broken file system design (was Unbound(?) 
> internal fragmentation in Btrfs)
> 
> On Wed, Jun 23, 2010 at 8:43 PM, Daniel Taylor 
> <Daniel.Taylor@....com> wrote:
> > Just an FYI reminder.  The original test (2K files) is utterly
> > pathological for disk drives with 4K physical sectors, such as
> > those now shipping from WD, Seagate, and others.  Some of the
> > SSDs have larger (16K) or smaller (2K) blocks.  There is also
> > the issue of btrfs over RAID (which I know is not entirely
> > sensible, but which will happen).
> >
> > The absolute minimum allocation size for data should be the same
> > as, and aligned with, the underlying disk block size.  If that
> > results in underutilization, I think that's a good thing for
> > performance, compared to read-modify-write cycles to update
> > partial disk blocks.
> 
> Block size = 4k
> 
> Btrfs packs smaller objects into the blocks in certain cases.
> 

As long as no object smaller than the disk block size is ever
flushed to media, and all flushed objects are aligned to the disk
blocks, there should be no real performance hit from that.
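To make that condition concrete, here is a minimal sketch that counts how
many disk blocks a write only partially covers. The function name and the
4096-byte default are assumptions for illustration, not anything taken from
btrfs itself. Any nonzero result means the drive would have to do a
read-modify-write for that block.

```python
def partial_blocks(offset, length, block_size=4096):
    """Count disk blocks that a write covers only partially.

    offset and length are in bytes. block_size is the physical
    block size of the drive (4096 is an assumed default).
    """
    if length <= 0:
        return 0
    end = offset + length
    partial = set()
    # A misaligned start leaves the first block partially written.
    if offset % block_size != 0:
        partial.add(offset // block_size)
    # A misaligned end leaves the last block partially written.
    if end % block_size != 0:
        partial.add((end - 1) // block_size)
    return len(partial)
```

A 4K write at offset 0 touches no partial blocks, a 1K update inside a 4K
block touches one, and a 4K write starting at offset 2K touches two.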

Otherwise we end up with the damage seen in the ext[234] family, where
the file blocks can be aligned, but the 1K inode updates cause
read-modify-write (RMW) cycles and cost a >10% performance hit
for creation/update of large numbers of files.

An RMW cycle costs at least a full rotation (11 msec on a 5400 RPM
drive), which is painful.
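
For reference, the rotation figure is just 60 seconds divided by the spindle
speed. The sketch below works it out, along with a rough worst-case total
for a batch of updates. The helper names are hypothetical, and the batch
estimate assumes every update pays exactly one full rotation with no caching
or coalescing, which is a simplification.

```python
def rotation_ms(rpm):
    """Time for one full platter rotation, in milliseconds."""
    return 60_000.0 / rpm


def rmw_penalty_s(n_updates, rpm):
    """Rough added latency in seconds if each update costs one rotation.

    Assumes no caching or coalescing of updates (a simplification).
    """
    return n_updates * rotation_ms(rpm) / 1000.0
```

At 5400 RPM one rotation is about 11.1 msec, so 1000 such updates would add
roughly 11 seconds under that assumption.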
