lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20070221192523.GG3219@lazybastard.org>
Date:	Wed, 21 Feb 2007 19:25:23 +0000
From:	Jörn Engel <joern@...ybastard.org>
To:	Juan Piernas Canovas <piernas@...ec.um.es>
Cc:	Sorin Faibish <sfaibish@....com>,
	kernel list <linux-kernel@...r.kernel.org>
Subject: Re: [ANNOUNCE] DualFS: File System with Meta-data and Data Separation

On Wed, 21 February 2007 19:31:40 +0100, Juan Piernas Canovas wrote:
> 
> I do not understand. Do you mean that if I have 10 segments, 5 busy and 5 
> free, after cleaning I could need 6 segments? How? Where the extra blocks 
> come from?

This is a fairly complicated subject and I have trouble explaining it to
people - even though I hope that maybe one or two dozen understand it by
now.  So let me try to give you an example:

In LogFS, inodes are stored in an inode file.  There are no B-Trees yet,
so the regular unix indirect blocks are used.  My example will be
writing to a directory, so that should only involve metadata by your
definition and be a valid example for DualFS as well.  If it is not,
please tell me where the difference lies.

The directory is large, so appending to it involves writing a datablock
(D0), and indirect block (D1) and a doubly indirect block (D2).

Before:
Segment 1: [some data] [   D1  ] [more data]
Segment 2: [some data] [   D0  ] [more data]
Segment 3: [some data] [   D2  ] [more data]
Segment 4: [             empty             ]
...

After:
Segment 1: [some data] [garbage] [more data]
Segment 2: [some data] [garbage] [more data]
Segment 3: [some data] [garbage] [more data]
Segment 4: [D0][D1][D2][      empty        ]
...

Ok.  After this, the position of D2 on the medium has changed.  So we
need to update the inode and write that as well.  If the inode number
for this directory is high, we will need to write the inode (I0), an
indirect block (I1) and a doubly indirect block (I2).  The picture
becomes a bit more complicates.

Before:
Segment 1: [some data] [   D1  ] [more data]
Segment 2: [some data] [   D0  ] [more data]
Segment 3: [some data] [   D2  ] [more data]
Segment 4: [             empty             ]
Segment 5: [some data] [   I1  ] [more data]
Segment 6: [some data] [   I0  ] [more data]
Segment 7: [some data] [   I2  ] [more data]
...

After:
Segment 1: [some data] [garbage] [more data]
Segment 2: [some data] [garbage] [more data]
Segment 3: [some data] [garbage] [more data]
Segment 4: [D0][D1][D2][I0][I1][I2][ empty ]
Segment 5: [some data] [garbage] [more data]
Segment 6: [some data] [garbage] [more data]
Segment 7: [some data] [garbage] [more data]
...

So what has just happened?  The user did a single "touch foo" in a large
directory and has caused six objects to move.  Unless some of those
objects were in the same segment before, we now have six segments
containing a tiny amount of garbage.

And there is almost no way how you can squeeze that garbage back out.
The cleaner will fundamentally do the same thing as a regular write - it
will move objects.  So if you want to clean a segment containing the
block of a different directory, you may again have to move five
additional objects, the indirect blocks, inode and ifile indirect
blocks.

At this point, your cleaner is becoming a threat.  There is a real
danger that it will create more garbage in unrelated segments than it
frees up.  I claim that you cannot keep 50% clean segments, unless you
move away from the simplistic cleaner I described above.

Jörn

-- 
If you're willing to restrict the flexibility of your approach,
you can almost always do something better.
-- John Carmack
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ