Message-ID: <20081209054712.GB10270@mit.edu>
Date:	Tue, 9 Dec 2008 00:47:12 -0500
From:	Theodore Tso <tytso@....edu>
To:	Akira Fujita <a-fujita@...jp.nec.com>
Cc:	linux-ext4@...r.kernel.org
Subject: Re: [PATCH]ext4: online defrag: Enable to reuse blocks by multiple
	defrag


On Tue, Dec 09, 2008 at 11:26:37AM +0900, Akira Fujita wrote:
> I'm redesigning ext4 online defrag based on the comments from Ted.
> Probably defrag's block allocation method will be changed greatly.

Akira-san,

FYI, there was a discussion about defrag on today's ext4 call.  One of
the ideas that was kicked around was to completely change the
primitives used by defrag, and to design things around three
primitive, general purpose interfaces.

We didn't go into complete detail on the call, but let me give you a
strawman proposal for consideration/discussion:

(1) An (ioctl-based) interface which allows a privileged program to
specify one or more ranges of blocks which the filesystem's block
allocator must NOT allocate from.  (We may want to have a flag for
each block range which makes the block lockout either advisory, such
that if the block allocator can't find blocks anywhere else, it may
invade the reserved block area, or mandatory, where if there are no
other blocks, the filesystem returns ENOSPC.)  This allows the
defragmenter to work on an area of the disk without concurrent
allocations by other processes getting in the way.

(2) An (ioctl-based) interface which associates with an inode
preferred range(s) of blocks which the block allocator will try using
first; if those blocks are not available, or the block range(s) is
exhausted, the block allocator uses its normal algorithms to pick the
best available block.  The set of preferred blocks is only guaranteed
to persist while the inode is in memory.

(3) An (ioctl-based) interface which takes two inode numbers, and
allows a privileged program to "defrag" an inode by taking blocks from
a donor inode and using them as the new blocks for the destination
inode, preserving the contents of the destination inode.
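
To make the semantics of (3) concrete, here is a minimal userspace
sketch, not kernel code.  Everything in it is hypothetical: the
toy_defrag_args layout, the in-memory "disk", and the toy_inode block
map are illustrative names that do not come from the proposal.  It
assumes the donor already owns at least as many blocks as the
destination, and it models only the data copy and the map exchange.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical ioctl argument for (3): the two inode numbers. */
struct toy_defrag_args {
    uint64_t donor_ino;
    uint64_t dest_ino;
};

#define TOY_BLKSZ   4
#define TOY_NBLOCKS 16
#define TOY_MAXMAP  4

/* A tiny in-memory "disk" of fixed-size blocks. */
static char toy_disk[TOY_NBLOCKS][TOY_BLKSZ];

struct toy_inode {
    size_t nblocks;
    size_t map[TOY_MAXMAP]; /* logical block i lives at physical map[i] */
};

/*
 * Model of (3): copy each destination block into the donor's block,
 * then exchange the two mappings.  Afterwards the destination owns the
 * donor's (presumably contiguous) blocks with its contents preserved,
 * and the donor owns the destination's old blocks.
 */
static int toy_defrag(struct toy_inode *donor, struct toy_inode *dest)
{
    if (dest->nblocks > TOY_MAXMAP || donor->nblocks > TOY_MAXMAP)
        return -EINVAL;
    if (donor->nblocks < dest->nblocks)
        return -ENOSPC;

    for (size_t i = 0; i < dest->nblocks; i++) {
        size_t old = dest->map[i];

        memcpy(toy_disk[donor->map[i]], toy_disk[old], TOY_BLKSZ);
        dest->map[i] = donor->map[i];
        donor->map[i] = old;
    }
    return 0;
}
```

A real implementation would also have to deal with locking, journaling
and concurrent writers, none of which this model attempts.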

The advantage of this implementation strategy is that each of the
interfaces can be implemented one at a time, with very well defined
semantics, and each can be tested independently.  The interfaces can
also be used in different combinations to solve other problems.
For example, a combination of (1) and (2) can be used to reserve
blocks for use by a directory that is expected to grow, so the
directory can use contiguous blocks.  Or, they could be used to
implement an "online shrink" that would allow a filesystem to be
resized to a smaller size.
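
As an illustration of combining (1) and (2), the toy allocator below
locks a range out of general allocation and then lets one inode (say,
a growing directory) name that same range as its preferred range.
This is a userspace sketch only; the names toy_fs, toy_lock_range and
toy_alloc are made up, and it assumes that an inode's preferred range
is honored even inside a locked-out area, which the proposal implies
but does not spell out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define NBLOCKS 16

/* Hypothetical lockout strengths, ordered weakest to strongest. */
enum lockout { LOCK_NONE = 0, LOCK_ADVISORY = 1, LOCK_MANDATORY = 2 };

struct toy_fs {
    bool used[NBLOCKS];
    enum lockout lock[NBLOCKS];
};

/* Model of (1): exclude [start, start + len) from general allocation. */
static int toy_lock_range(struct toy_fs *fs, size_t start, size_t len,
                          enum lockout mode)
{
    if (start > NBLOCKS || len > NBLOCKS - start)
        return -EINVAL;
    for (size_t i = start; i < start + len; i++)
        fs->lock[i] = mode;
    return 0;
}

/* Claim the first free block in [start, end) locked no harder than max_lock. */
static long toy_scan(struct toy_fs *fs, size_t start, size_t end,
                     enum lockout max_lock)
{
    for (size_t i = start; i < end && i < NBLOCKS; i++) {
        if (!fs->used[i] && fs->lock[i] <= max_lock) {
            fs->used[i] = true;
            return (long)i;
        }
    }
    return -1;
}

/*
 * Allocate one block.  Model of (2): try the caller's preferred range
 * first (pref_len == 0 means "no preference").  Otherwise fall back to
 * unlocked blocks, then invade advisory lockouts, and return -ENOSPC
 * rather than touch a mandatory lockout.
 */
static long toy_alloc(struct toy_fs *fs, size_t pref_start, size_t pref_len)
{
    long b;

    if (pref_len > 0 && pref_start < NBLOCKS) {
        size_t end = pref_len > NBLOCKS - pref_start
                         ? NBLOCKS : pref_start + pref_len;
        b = toy_scan(fs, pref_start, end, LOCK_MANDATORY);
        if (b >= 0)
            return b;
    }
    b = toy_scan(fs, 0, NBLOCKS, LOCK_NONE);
    if (b >= 0)
        return b;
    b = toy_scan(fs, 0, NBLOCKS, LOCK_ADVISORY);
    if (b >= 0)
        return b;
    return -ENOSPC;
}
```

With blocks 0-3 under a mandatory lockout, ordinary callers of
toy_alloc(fs, 0, 0) never receive them, while the directory calling
toy_alloc(fs, 0, 4) gets them in order and so stays contiguous.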

One other thing that comes to mind.  If it turns out that these
interfaces have multiple users, and in some cases the reservations or
block allocation restrictions are expected to last for longer than a
process lifetime, it may be useful to tag them with a short (8-16
character) name, so that it is possible to list the current set of
reservations, and so they can be removed by a privileged user.  This
could be overdesigning the interface; but the whole *point* of
thinking about the interfaces from a more generic point of view (as
opposed to use by a specific program for which the kernel interfaces
are custom-designed) is that hopefully they will have multiple use
cases and multiple users, in which case we need to worry about how
multiple users can co-exist.
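
A rough sketch of what tagged reservations could look like, purely to
make the idea concrete.  The table, the 16-character limit chosen from
the 8-16 range above, and the rsv_* function names are all
assumptions; nothing here is an existing kernel interface.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define RSV_NAME_MAX 16 /* upper end of the suggested 8-16 characters */
#define RSV_SLOTS    8

struct toy_rsv {
    char name[RSV_NAME_MAX + 1]; /* empty string marks a free slot */
    unsigned long start;
    unsigned long len;
};

static struct toy_rsv rsv_table[RSV_SLOTS];

/* Look up a live reservation by its tag. */
static struct toy_rsv *rsv_find(const char *name)
{
    for (size_t i = 0; i < RSV_SLOTS; i++)
        if (rsv_table[i].name[0] != '\0' &&
            strcmp(rsv_table[i].name, name) == 0)
            return &rsv_table[i];
    return NULL;
}

/* Register a tagged reservation; tags must be unique and non-empty. */
static int rsv_add(const char *name, unsigned long start, unsigned long len)
{
    size_t n = strlen(name);

    if (n == 0 || n > RSV_NAME_MAX)
        return -EINVAL;
    if (rsv_find(name))
        return -EEXIST;
    for (size_t i = 0; i < RSV_SLOTS; i++) {
        if (rsv_table[i].name[0] == '\0') {
            memcpy(rsv_table[i].name, name, n + 1);
            rsv_table[i].start = start;
            rsv_table[i].len = len;
            return 0;
        }
    }
    return -ENOSPC;
}

/* Drop a reservation by tag, e.g. on behalf of a privileged user. */
static int rsv_remove(const char *name)
{
    struct toy_rsv *r = rsv_find(name);

    if (!r)
        return -ENOENT;
    r->name[0] = '\0';
    return 0;
}

/* Count live reservations; a listing interface would walk the same slots. */
static size_t rsv_count(void)
{
    size_t n = 0;

    for (size_t i = 0; i < RSV_SLOTS; i++)
        if (rsv_table[i].name[0] != '\0')
            n++;
    return n;
}
```

Keying on the tag rather than on a process ID is what would let a
reservation outlive the process that created it and still be found
and removed later.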

Thoughts, comments?

						- Ted