linux-kernel - Re: [PATCH RFC] nilfs2: continuous snapshotting file system

Message-ID: <20080827181338.GC1371@logfs.org>
Date:	Wed, 27 Aug 2008 20:13:38 +0200
From:	Jörn Engel <joern@...fs.org>
To:	Ryusuke Konishi <konishi.ryusuke@....ntt.co.jp>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFC] nilfs2: continuous snapshotting file system

On Wed, 27 August 2008 01:54:30 +0900, Ryusuke Konishi wrote:
> 
> Yeah, it was a very tough battle :)
> Read is OK.  But write was hard.  I looked at the vfs code over again and
> again.
> We've implemented NILFS without bringing specific changes into vfs.
> However, if we can find a common basis for LFSes, I'm glad to cooperate
> with you.
> Though I don't know whether exporting inode_lock is the case or not ;)

Well, I was looking more for something like a list of problems and
solutions.  Partially because I am plain curious and partially because I
know those are the problem areas of any log-structured filesystem and
they deserve special attention in a review.

In logfs, garbage collection may read (and write) any inode and any
block from any file.  And since garbage collection may be called from
writepage() and write_inode(), the fun included:

P: iget() on the inode being currently written back and locked.
S: Split I_LOCK into I_LOCK and I_SYNC.  Has been merged upstream.

P: iget() on an inode in I_FREEING or I_WILL_FREE state.
S: Add inodes to a list in drop_inode() and remove them again in
   destroy_inode().  iget() in GC context is wrapped in a method that
   checks said list first and returns an inode from the list when
   applicable.  Used to hold inode_lock to prevent races, but a
   logfs-local lock is actually sufficient.
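
A minimal userspace sketch of the list-based lookup described above, not
the logfs code.  All names (toy_inode, toy_drop_inode(), toy_gc_iget()
and so on) are hypothetical, a pthread mutex stands in for the
logfs-local lock, and toy_regular_iget() is a stub for the normal iget()
path.  The sketch ignores inode lifetime once the lock is dropped, which
a real implementation has to handle.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct inode; only what the sketch needs. */
struct toy_inode {
    unsigned long ino;
    struct toy_inode *next_freeing;    /* link in the "being freed" list */
};

/* Filesystem-local lock protecting the list (assumption: a plain mutex). */
static pthread_mutex_t freeing_lock = PTHREAD_MUTEX_INITIALIZER;
static struct toy_inode *freeing_list;

/* drop_inode() equivalent: remember the inode while it is being torn down. */
static void toy_drop_inode(struct toy_inode *inode)
{
    assert(inode != NULL);
    pthread_mutex_lock(&freeing_lock);
    inode->next_freeing = freeing_list;
    freeing_list = inode;
    pthread_mutex_unlock(&freeing_lock);
}

/* destroy_inode() equivalent: the inode goes away, so unlink it again. */
static void toy_destroy_inode(struct toy_inode *inode)
{
    struct toy_inode **pp;

    assert(inode != NULL);
    pthread_mutex_lock(&freeing_lock);
    for (pp = &freeing_list; *pp; pp = &(*pp)->next_freeing) {
        if (*pp == inode) {
            *pp = inode->next_freeing;
            inode->next_freeing = NULL;
            break;
        }
    }
    pthread_mutex_unlock(&freeing_lock);
}

/* Stub for the regular lookup path; a real filesystem would call iget(). */
static struct toy_inode *toy_regular_iget(unsigned long ino)
{
    (void)ino;
    return NULL;
}

/* GC-side lookup: check the freeing list first, then fall back. */
static struct toy_inode *toy_gc_iget(unsigned long ino)
{
    struct toy_inode *inode;

    pthread_mutex_lock(&freeing_lock);
    for (inode = freeing_list; inode; inode = inode->next_freeing)
        if (inode->ino == ino)
            break;
    pthread_mutex_unlock(&freeing_lock);

    return inode ? inode : toy_regular_iget(ino);
}
```

The point of the wrapper is that GC never blocks on an inode that is
already on its way out; it just borrows the dying inode from the list.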

If either of the two problems above is solved by calling
ilookup5_nowait() I bet you a fiver that a race with data corruption is
lurking somewhere in the area.

P: find_get_page() or some variant on a page handed to
   logfs_writepage().
S: Use the one available page flag, PG_owner_priv_1, to mark pages that
   are waiting for the single-threaded logfs write path.  If any page GC
   needs is locked, check for PG_owner_priv_1 and if it is set, just use
   the page anyway.  Whoever has set the flag cannot clear it until GC
   has finished.
   If the flag is not set, the page might still be somewhere in the
   logfs write path - before the flag gets set.  So simply do the check
   in a loop, call schedule() each time, knock on wood and keep your
   fingers crossed that the page will either become unlocked or get
   PG_owner_priv_1 set sometime soon.  I'm not proud of this solution but
   know no better one.
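
The check-in-a-loop can be sketched in userspace roughly as follows.
This is an illustration under assumptions, not the logfs code: the
struct, the TOY_PG_* bits and toy_gc_acquire_page() are hypothetical,
sched_yield() stands in for schedule(), and the max_spins bound is
added only so the sketch terminates; the loop described above has no
such bound.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_PG_LOCKED (1u << 0)    /* stand-in for PG_locked */
#define TOY_PG_QUEUED (1u << 1)    /* stand-in for PG_owner_priv_1 */

/* Hypothetical stand-in for struct page; only the flags word. */
struct toy_page {
    atomic_uint flags;
};

/*
 * Try to obtain a page for GC.  Returns true when GC may use the page.
 * *took_lock tells the caller whether GC locked the page itself (and
 * must unlock it later) or is borrowing a page that the writer has
 * locked and queued for the write path.
 */
static bool toy_gc_acquire_page(struct toy_page *page, unsigned max_spins,
                                bool *took_lock)
{
    assert(page != NULL && took_lock != NULL);

    for (unsigned i = 0; i < max_spins; i++) {
        unsigned f = atomic_load(&page->flags);

        if (!(f & TOY_PG_LOCKED)) {
            /* Page is unlocked: lock it the normal way. */
            if (atomic_compare_exchange_strong(&page->flags, &f,
                                               f | TOY_PG_LOCKED)) {
                *took_lock = true;
                return true;
            }
            continue;    /* lost a race, look at the flags again */
        }
        if (f & TOY_PG_QUEUED) {
            /* Locked, but parked for the write path: use it anyway. */
            *took_lock = false;
            return true;
        }
        /* Locked and not yet marked: give the writer a chance to run. */
        sched_yield();
    }
    return false;
}
```

The uncomfortable window is the last branch: a page that is locked but
not yet marked, where all GC can do is yield and retry.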

So something like the above for nilfs would be useful.  And maybe, just
to be on the safe side, try the following testcase overnight:
- Create tiny filesystem (32M or so).
- Fill filesystem 100% with a single file.
- Rewrite random parts of the file in an endless loop.
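
The fill and rewrite steps above could be driven by something like the
following sketch.  The helper names, the 4096-byte block size and the
limit/rounds parameters are my assumptions; creating and mounting the
tiny filesystem is left to the tester.  Pass a large limit to fill until
ENOSPC and a negative rounds value for the endless loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLK 4096    /* assumed block size for the test writes */

/*
 * Append blocks to path until the filesystem is full (ENOSPC or a short
 * write) or limit blocks have been written.  Returns the number of full
 * blocks written, or -1 on any other error.
 */
static long fill_file(const char *path, long limit)
{
    char buf[BLK];
    long n = 0;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    memset(buf, 0xA5, sizeof(buf));
    while (n < limit) {
        ssize_t r = write(fd, buf, sizeof(buf));

        if (r == (ssize_t)sizeof(buf)) {
            n++;
            continue;
        }
        if (r < 0 && errno != ENOSPC) {
            close(fd);
            return -1;
        }
        break;    /* ENOSPC or short write: treat as full */
    }
    fsync(fd);    /* push the data out; errors are ignored in this sketch */
    close(fd);
    return n;
}

/*
 * Overwrite randomly chosen blocks of an nblocks-sized file.  Runs for
 * rounds iterations, or until a write fails if rounds is negative.
 * Returns the number of successful overwrites, or -1 if open() fails.
 */
static long rewrite_random(const char *path, long nblocks, long rounds,
                           unsigned seed)
{
    char buf[BLK];
    long done = 0;
    int fd;

    assert(nblocks > 0);
    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    srand(seed);
    for (long i = 0; rounds < 0 || i < rounds; i++) {
        off_t off = (off_t)(rand() % nblocks) * BLK;

        memset(buf, (int)(i & 0xff), sizeof(buf));
        if (pwrite(fd, buf, sizeof(buf), off) != (ssize_t)sizeof(buf))
            break;    /* e.g. ENOSPC while rewriting a full filesystem */
        done++;
    }
    close(fd);
    return done;
}
```

On a log-structured filesystem every overwrite allocates new space, so
with the device 100% full each pwrite() depends on garbage collection
keeping up, which is exactly the path this test is meant to stress.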

Or even better, combine this testcase with some automated system crashes
and do an fsck every time the system comes back up. ;)

Jörn

-- 
Money does not make you happy.
Happiness does not fill your stomach.