Message-ID: <Pine.LNX.4.61.0702250255530.18915@ditec.inf.um.es>
Date: Sun, 25 Feb 2007 03:41:40 +0100 (CET)
From: Juan Piernas Canovas <piernas@...ec.um.es>
To: Jörn Engel <joern@...ybastard.org>
Cc: Sorin Faibish <sfaibish@....com>,
kernel list <linux-kernel@...r.kernel.org>
Subject: Re: [ANNOUNCE] DualFS: File System with Meta-data and Data Separation
Hi Jörn,
On Fri, 23 Feb 2007, Jörn Engel wrote:
> On Thu, 22 February 2007 20:57:12 +0100, Juan Piernas Canovas wrote:
>>
>> I do not agree with this picture, because it does not show that all the
>> indirect blocks which point to a direct block are along with it in the
>> same segment. That figure should look like:
>>
>> Segment 1: [some data] [ DA D1' D2' ] [more data]
>> Segment 2: [some data] [ D0 D1' D2' ] [more data]
>> Segment 3: [some data] [ DB D1 D2 ] [more data]
>>
>> where D0, DA, and DB are datablocks, D1 and D2 indirect blocks which
>> point to the datablocks, and D1' and D2' obsolete copies of those
>> indirect blocks. By using this figure, it is clear that if you need to
>> move D0 to clean the segment 2, you will need only one free segment at
>> most, and not more. You will get:
>>
>> Segment 1: [some data] [ DA D1' D2' ] [more data]
>> Segment 2: [ free ]
>> Segment 3: [some data] [ DB D1' D2' ] [more data]
>> ......
>> Segment n: [ D0 D1 D2 ] [ empty ]
>>
>> That is, D0 needs in the new segment the same space that it needs in the
>> previous one.
>>
>> The differences are subtle but important.
>
> Ah, now I see. Yes, that is deadlock-free. If you are not accounting
> the bytes of used space but the number of used segments, and you count
> each partially used segment the same as a 100% used segment, there is no
> deadlock.
>
> Some people may consider this to be cheating, however. It will cause
> more than 50% wasted space. All obsolete copies are garbage, after all.
> With a maximum tree height of N, you can have up to (N-1) / N of your
> filesystem occupied by garbage.
I do not agree. Fortunately, most files are written at once, so what you
usually have is:
Segment 1: [ data ]
Segment 2: [some data] [ D0 DA DB D1 D2 ] [more data]
Segment 3: [ data ]
......
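
To make that concrete, here is a toy user-space sketch of the idea (the
names and sizes are made up for illustration; this is not the actual
DualFS code): a file written at once appends its data blocks and then the
indirect blocks that point to them, so they all land in the same segment.

/* Toy log-structured segment writer. A file written at once gets its
 * data blocks and the indirect blocks that point to them back-to-back
 * in the current segment. (No overflow check; it is only a toy.) */
#include <stdio.h>

#define SEG_BLOCKS 8                    /* assumed segment size */

enum btype { DATA, INDIRECT };

struct block { enum btype type; const char *name; };

static struct block seg[SEG_BLOCKS];    /* the current (partial) segment */
static int cur_off;

static void append(enum btype t, const char *name)
{
    seg[cur_off].type = t;
    seg[cur_off].name = name;
    cur_off++;
}

int main(void)
{
    /* The whole file is written at once: data blocks first, then the
     * indirect blocks that point to them, all in one segment. */
    append(DATA, "D0"); append(DATA, "DA"); append(DATA, "DB");
    append(INDIRECT, "D1"); append(INDIRECT, "D2");

    for (int o = 0; o < cur_off; o++)
        printf("segment block %d: %s\n", o, seg[o].name);
    return 0;
}
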
On the other hand, the DualFS cleaner tries to clean several segments
every time it runs. Therefore, if you have the following case:
Segment 1: [some data] [ DA D1' D2' ] [more data]
Segment 2: [some data] [ D0 D1' D2' ] [more data]
Segment 3: [some data] [ DB D1' D2' ] [more data]
......
after cleaning, you can have this one:
Segment 1: [ free ]
Segment 2: [ free ]
Segment 3: [ free ]
......
Segment i: [ D0 DA DB D1 D2 ] [ more data ]
Moreover, if the cleaner starts running when the free space drops below a
specific threshold, it is very difficult to waste more than 50% of the
disk space, especially with meta-data (actually, I am unable to imagine
that situation :).
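
To illustrate both points, here is a toy cleaner sketch (hypothetical
names, sizes, and threshold; the real DualFS cleaner is of course more
involved). It accounts for whole segments rather than bytes, fires when
the number of free segments drops below a threshold, and coalesces the
live blocks of several partially obsolete segments into one destination
segment:

/* Toy segment cleaner. Accounting is per segment, not per byte: a
 * partially used segment counts the same as a full one, which is what
 * makes the scheme deadlock-free. */
#include <stdio.h>
#include <string.h>

#define NSEGS      4
#define SEG_BLOCKS 4
#define MIN_FREE   2    /* assumed trigger threshold, in segments */

struct seg { int used; int live[SEG_BLOCKS]; };
static struct seg segs[NSEGS];

static int free_segs(void)
{
    int n = 0;
    for (int s = 0; s < NSEGS; s++)
        if (segs[s].used == 0)
            n++;
    return n;
}

static void clean(void)
{
    int dst = -1, off = 0;

    for (int s = 0; s < NSEGS; s++)     /* one free segment is enough */
        if (segs[s].used == 0) { dst = s; break; }
    if (dst < 0)
        return;

    for (int s = 0; s < NSEGS; s++) {
        if (s == dst || segs[s].used == 0)
            continue;
        int nlive = 0;
        for (int b = 0; b < segs[s].used; b++)
            nlive += segs[s].live[b];
        if (nlive > SEG_BLOCKS - off)
            continue;                   /* does not fit, leave for later */
        for (int b = 0; b < nlive; b++)
            segs[dst].live[off++] = 1;  /* copy the live blocks forward */
        memset(segs[s].live, 0, sizeof(segs[s].live));
        segs[s].used = 0;               /* the source segment is now free */
    }
    segs[dst].used = off;
}

int main(void)
{
    /* Three segments as in the figure above: one live data block each
     * (D0/DA/DB) plus two obsolete indirect copies (D1', D2'). In the
     * real file system the fresh D1/D2 copies would be written into
     * the destination segment as well. */
    for (int s = 0; s < 3; s++) {
        segs[s].used = 3;
        segs[s].live[0] = 1;
        segs[s].live[1] = 0;
        segs[s].live[2] = 0;
    }
    if (free_segs() < MIN_FREE)         /* below threshold: run cleaner */
        clean();
    printf("free segments after cleaning: %d\n", free_segs());
    return 0;
}
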
> Another downside is that with large amounts of garbage between otherwise
> useful data, your disk cache hit rate goes down. Read performance is
> suffering. But that may be a fair tradeoff and will only show up in
> large metadata reads in the uncached (per Linux) case. Seems fair.
Well, our experimental results say otherwise. As I have said, most files
are written at once, so their meta-data blocks are together on disk. This
allows DualFS to implement an explicit prefetching of meta-data blocks
which is quite effective, especially when there are several processes
reading from disk at the same time.
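
As a rough sketch of the idea (made-up window size and function names,
not the real implementation): since the meta-data blocks of a file sit
back-to-back in a segment, a miss on one of them is a good hint to fetch
the whole neighbouring run in a single disk request.

/* Toy explicit meta-data prefetch. One miss pulls in the whole
 * aligned window of neighbouring blocks. */
#include <stdio.h>

#define PREFETCH_WINDOW 8   /* assumed readahead size, in blocks */

/* Pretend this issues one large disk request; here it only logs it. */
static void disk_read(long blk, int nblks)
{
    printf("disk read: blocks %ld..%ld\n", blk, blk + nblks - 1);
}

/* On a miss for meta-data block 'blk', fetch the whole window around
 * it instead of a single block. */
static void metadata_read(long blk)
{
    long start = blk - blk % PREFETCH_WINDOW;
    disk_read(start, PREFETCH_WINDOW);
}

int main(void)
{
    metadata_read(42);      /* one miss brings in blocks 40..47 */
    return 0;
}
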
On the other hand, DualFS also implements an on-line meta-data relocation
mechanism which can help to improve both meta-data prefetching and
garbage collection.
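
My reading of that mechanism, as a toy sketch (hypothetical names; not
the actual DualFS code): when the live meta-data blocks of a file have
drifted apart, fresh copies are appended together at the head of the log,
so the next read finds them adjacent and the old copies become garbage
for the cleaner.

/* Toy on-line relocation: rewrite scattered blocks as one contiguous
 * run at the log head; the old locations become garbage. */
#include <stdio.h>

static long log_head = 1000;    /* next free block in the log */

static void relocate(long addr[], int n)
{
    for (int i = 0; i < n; i++)
        addr[i] = log_head++;   /* fresh, adjacent copies */
}

int main(void)
{
    long meta[3] = { 17, 530, 204 };    /* scattered meta-data blocks */

    relocate(meta, 3);
    for (int i = 0; i < 3; i++)
        printf("block %d now at %ld\n", i, meta[i]);
    return 0;
}
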
Obviously, some slow-growing files can produce garbage, but they do not
hurt the overall performance of the file system.
>
> Quite interesting, actually. The costs of your design are disk space,
> depending on the amount and depth of your metadata, and metadata read
> performance. Disk space is cheap and metadata reads tend to be slow for
> most filesystems, in comparison to data reads. You gain faster metadata
> writes and loss of journal overhead. I like the idea.
>
Yeah :) If you have taken a look at my presentation at LFS07, you will
have seen that the disk traffic of meta-data blocks is dominated by
writes.
> Jörn
>
Juan.
--
D. Juan Piernas Cánovas
Departamento de Ingeniería y Tecnología de Computadores
Facultad de Informática. Universidad de Murcia
Campus de Espinardo - 30080 Murcia (SPAIN)
Tel.: +34968367657 Fax: +34968364151
email: piernas@...ec.um.es
PGP public key:
http://pgp.rediris.es:11371/pks/lookup?search=piernas%40ditec.um.es&op=index
*** Please send me your documents in text, HTML, PDF or PostScript format :-) ***