linux-kernel - Re: Btrfs: broken file system design (was Unbound(?) internal fragmentation in Btrfs)

Message-ID: <4C260507.4020409@redhat.com>
Date:	Sat, 26 Jun 2010 09:47:51 -0400
From:	Ric Wheeler <rwheeler@...hat.com>
To:	Daniel Shiels <btrfs721@...ubar.com>
CC:	Michael Tokarev <mjt@....msk.ru>,
	Daniel Taylor <daniel.taylor@....com>,
	Mike Fedyk <mfedyk@...efedyk.com>,
	Daniel J Blueman <daniel.blueman@...il.com>,
	Mat <jackdachef@...il.com>, LKML <linux-kernel@...r.kernel.org>,
	linux-fsdevel@...r.kernel.org,
	Chris Mason <chris.mason@...cle.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	The development of BTRFS <linux-btrfs@...r.kernel.org>
Subject: Re: Btrfs: broken file system design (was Unbound(?) internal   
   fragmentation in Btrfs)

On 06/26/2010 08:34 AM, Daniel Shiels wrote:
>> 25.06.2010 22:58, Ric Wheeler wrote:
>>      
>>> On 06/24/2010 06:06 PM, Daniel Taylor wrote:
>>>        
>> []
>>      
>>>>> On Wed, Jun 23, 2010 at 8:43 PM, Daniel Taylor
>>>>> <Daniel.Taylor@....com>   wrote:
>>>>>
>>>>>            
>>>>>> Just an FYI reminder.  The original test (2K files) is utterly
>>>>>> pathological for disk drives with 4K physical sectors, such as
>>>>>> those now shipping from WD, Seagate, and others.  Some of the
>>>>>> SSDs have larger (16K) or smaller blocks (2K).  There is also
>>>>>> the issue of btrfs over RAID (which I know is not entirely
>>>>>> sensible, but which will happen).
>>>>>>              
>> Why is it not sensible to use btrfs on raid devices?
>> Nowadays raid is just everywhere, from 'fakeraid' on AHCI to
>> large external arrays on iSCSI-attached storage.  Sometimes
>> it is nearly impossible to _not_ use RAID: many servers
>> come with a built-in RAID card which can't be turned off or
>> disabled.  And hardware raid is faster (at least in theory),
>> if only because it puts less load on the various system buses.
>>
>> To many "enterprise folks" the statement "we don't need hw raid,
>> we have a better solution" sounds like "we're just a toy, don't
>> use".
>>
>> Hmm?  ;)
>>
>> /mjt, who always used and preferred _software_ raid due to
>>   multiple reasons, and never used btrfs so far.
>>      
> It's not that you shouldn't use it on raid, it's just that you lose
> some of the value of the file system.
>
> Two nice features that btrfs provides are checksums and mirroring. If a
> disk corrupts a block then btrfs will realize due to the strong checksum
> and use the mirrored block. If you are using a raid system the raid won't
> know the data is corrupted and raid doesn't provide a way for the file
> system to get to the redundant block.
>
> I read a paper from Sun a while back saying that the undetected read
> failure rate of modern disks has not changed for many years. Disks are so
> large now that undetected failures are unacceptably likely for many
> systems. Hence zfs using similar in-filesystem raid schemes.
>
> In my lab I used dd to clobber data in some of my mirrors. Btrfs logged
> lots of checksum errors but never corrupted a file. Doing the same on a
> classic raid with a classic filesystem (solaris with veritas volume
> manager) silently gave me bad data depending on which disk it felt like
> reading from.
>
> Daniel.
>    
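
To illustrate the behaviour Daniel describes above, here is a minimal Python sketch of a checksum-verified read over mirrored copies, next to a "blind" mirror read that trusts whichever copy it happens to pick. It is not btrfs code: the function names are hypothetical, and zlib's CRC32 stands in for whatever checksum the filesystem really uses.

```python
import zlib


def checksum(block: bytes) -> int:
    # Assumption: CRC32 from zlib stands in for the filesystem's checksum.
    return zlib.crc32(block)


def read_verified(copies, expected):
    """Return (data, bad_indexes): the first copy matching the stored
    checksum, plus the indexes of the copies that failed verification."""
    bad = []
    for index, data in enumerate(copies):
        if checksum(data) == expected:
            return data, bad
        bad.append(index)  # a real filesystem would log a checksum error here
    raise IOError("all copies failed checksum verification")


def read_blind(copies, pick):
    """Model a plain mirror: return whichever copy is picked, unverified."""
    return copies[pick]


# Two mirrored copies of one block; a single bit is flipped in the first.
good = b"payload!" * 64
clobbered = bytearray(good)
clobbered[10] ^= 0x01
mirrors = [bytes(clobbered), good]

stored_checksum = checksum(good)  # kept in metadata, away from the data block
data, bad_copies = read_verified(mirrors, stored_checksum)
```

With the checksum stored separately from the data, the verified read skips the damaged copy and returns the intact one, while `read_blind(mirrors, 0)` hands back the damaged block without any error.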

I was one of many people who worked at EMC on designing storage
arrays. If you are using any high-end external hardware array, it will
proactively detect data corruption for you. Most arrays continually
scan for latent errors and have internal data integrity checks that are
used for this.

Note that DIF/DIX adds an extra 8 bytes of data integrity information
per sector on disks that support the newer standards. We don't do
anything with that today in btrfs, but you could imagine ways to use it
to get even better data integrity protection.
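
For readers unfamiliar with those 8 bytes, the following Python sketch packs and checks a T10 DIF style protection tuple: a 2-byte guard tag (CRC16 of the sector data), a 2-byte application tag and a 4-byte reference tag. It assumes Type 1 protection, where the reference tag is the low 32 bits of the sector's LBA. The function names are hypothetical, and this is an illustration of the layout rather than anything btrfs or the block layer does.

```python
import struct

T10_DIF_POLY = 0x8BB7  # generator polynomial of the T10 DIF guard CRC


def crc16_t10dif(data: bytes) -> int:
    """Bitwise CRC16: initial value 0, no bit reflection, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ T10_DIF_POLY) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def make_dif_tuple(sector: bytes, app_tag: int, lba: int) -> bytes:
    """Build the 8-byte tuple: guard (2) + application (2) + reference (4)."""
    ref_tag = lba & 0xFFFFFFFF  # Type 1 assumption: low 32 bits of the LBA
    return struct.pack(">HHI", crc16_t10dif(sector), app_tag, ref_tag)


def verify_sector(sector: bytes, dif: bytes, lba: int) -> bool:
    """Check that the data matches the guard and landed at the right LBA."""
    guard, _app_tag, ref_tag = struct.unpack(">HHI", dif)
    return guard == crc16_t10dif(sector) and ref_tag == (lba & 0xFFFFFFFF)


sector = bytes(range(256)) * 2          # one 512-byte sector
dif = make_dif_tuple(sector, app_tag=0, lba=1234)
```

The guard tag catches corrupted data, and the reference tag catches a sector that was written to or read from the wrong location.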

If you are using software RAID (MD), you should also use its internal 
checks to do this kind of proactive detection of latent errors on a 
regular basis (say once every week or two).
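
As a rough sketch of what such a periodic check can look like, the Python below starts an MD scrub by writing "check" to the array's sync_action file in sysfs and reads mismatch_cnt afterwards. The array name md0 in the usage note is an assumption, the helper names are hypothetical, and scheduling (for example a weekly cron job run as root) is left to the administrator.

```python
from pathlib import Path


def start_md_check(md_name, sysfs_root="/sys/block"):
    """Ask MD to scrub the array. Returns False if the array is busy."""
    action = Path(sysfs_root) / md_name / "md" / "sync_action"
    if action.read_text().strip() != "idle":
        return False  # a resync, recovery or check is already running
    action.write_text("check\n")
    return True


def read_mismatch_count(md_name, sysfs_root="/sys/block"):
    """Return the mismatch counter MD reports for the last check."""
    counter = Path(sysfs_root) / md_name / "md" / "mismatch_cnt"
    return int(counter.read_text().strip())
```

Calling `start_md_check("md0")` kicks off the scan, and once sync_action has returned to idle, `read_mismatch_count("md0")` reports how many mismatches were found. The sysfs_root parameter exists only so the helpers can be exercised against a scratch directory.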

Ric
