linux-kernel - Re: [RFC][PATCH] ChunkFS: fs fission for faster fsck

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <17967.15531.450627.972572@gargle.gargle.HOWL>
Date:	Wed, 25 Apr 2007 15:34:03 +0400
From:	Nikita Danilov <nikita@...sterfs.com>
To:	David Lang <david.lang@...italinsight.com>
Cc:	Amit Gud <gud@....ksu.edu>, linux-fsdevel@...r.kernel.org,
	linux-kernel@...r.kernel.org, val_henson@...ux.intel.com,
	riel@...riel.com, zab@...bo.net, arjan@...radead.org,
	suparna@...ibm.com, brandon@...p.org, karunasagark@...il.com,
	gud@....edu
Subject: Re: [RFC][PATCH] ChunkFS: fs fission for faster fsck

David Lang writes:
 > On Tue, 24 Apr 2007, Nikita Danilov wrote:
 > 
 > > David Lang writes:
 > > > On Tue, 24 Apr 2007, Nikita Danilov wrote:
 > > >
 > > > > Amit Gud writes:
 > > > >
 > > > > Hello,
 > > > >
 > > > > >
 > > > > > This is an initial implementation of ChunkFS technique, briefly discussed
 > > > > > at: http://lwn.net/Articles/190222 and
 > > > > > http://cis.ksu.edu/~gud/docs/chunkfs-hotdep-val-arjan-gud-zach.pdf
 > > > >
 > > > > I have a couple of questions about chunkfs repair process.
 > > > >
 > > > > First, as I understand it, each continuation inode is a sparse file,
 > > > > mapping some subset of logical file blocks into block numbers. Then it
 > > > > seems, that during "final phase" fsck has to check that these partial
 > > > > mappings are consistent, for example, that no two different continuation
 > > > > inodes for a given file contain a block number for the same offset. This
 > > > > check requires scan of all chunks (rather than of only "active during
 > > > > crash"), which seems to return us back to the scalability problem
 > > > > chunkfs tries to address.
 > > >
 > > > not quite.
 > > >
 > > > this checking is a O(n^2) or worse problem, and it can eat a lot of memory in
 > > > the process. with chunkfs you divide the problem by a large constant (100 or
 > > > more) for the checks of individual chunks. after those are done then the final
 > > > pass checking the cross-chunk links doesn't have to keep track of everything, it
 > > > only needs to check those links and what they point to
 > >
 > > Maybe I failed to describe the problem presicely.
 > >
 > > Suppose that all chunks have been checked. After that, for every inode
 > > I0 having continuations I1, I2, ... In, one has to check that every
 > > logical block is presented in at most one of these inodes. For this one
 > > has to read I0, with all its indirect (double-indirect, triple-indirect)
 > > blocks, then read I1 with all its indirect blocks, etc. And to repeat
 > > this for every inode with continuations.
 > >
 > > In the worst case (every inode has a continuation in every chunk) this
 > > obviously is as bad as un-chunked fsck. But even in the average case,
 > > total amount of io necessary for this operation is proportional to the
 > > _total_ file system size, rather than to the chunk size.
 > 
 > actually, it should be proportional to the number of continuation nodes. The 
 > expectation (and design) is that they are rare.

Indeed, but total size of meta-data pertaining to all continuation
inodes is still proportional to the total file system size, and so is
fsck time: O(total_file_system_size).

What is more important, design puts (as far as I can see) no upper limit
on the number of continuation inodes, and hence, even if _average_ fsck
time is greatly reduced, occasionally it can take more time than ext2 of
the same size. This is clearly unacceptable in many situations (HA,
etc.).

Nikita.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/