Message-Id: <1192661889.15717.27.camel@think.oraclecorp.com>
Date: Wed, 17 Oct 2007 18:58:09 -0400
From: Chris Mason <chris.mason@...cle.com>
To: "Eric W. Biederman" <ebiederm@...ssion.com>
Cc: Christian Borntraeger <borntraeger@...ibm.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Nick Piggin <nickpiggin@...oo.com.au>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org,
Martin Schwidefsky <schwidefsky@...ibm.com>,
"Theodore Ts'o" <tytso@....edu>, stable@...nel.org
Subject: Re: [PATCH] rd: Mark ramdisk buffers heads dirty
On Wed, 2007-10-17 at 15:30 -0600, Eric W. Biederman wrote:
> Chris Mason <chris.mason@...cle.com> writes:
>
> >> Thinking about it. I don't believe anyone has ever intentionally built
> >> a filesystem tool that depends on being able to modify a filesystem's
> >> metadata buffer heads while the filesystem is running, and doing that
> >> would seem to be fragile as it would require a lot of cooperation
> >> between the tool and the filesystem about how the filesystem uses and
> >> implements things.
> >>
> >
> > That's right. For example, ext2 is doing directories in the page cache
> > of the directory inode, so there's a cache alias between the block
> > device page cache and the directory inode page cache.
> >
> >> Now I guess I need to see how difficult a patch would be to give
> >> filesystems magic inodes to keep their metadata buffer heads in.
> >
> > Not hard, the block device inode is already a magic inode for metadata
> > buffer heads. You could just make another one attached to the bdev.
> >
> > But, I don't think I fully understand the problem you're trying to
> > solve?
>
>
> So, to start:
> When we write buffers from the buffer cache, we clear buffer_dirty
> but not PageDirty.
>
> So try_to_free_buffers() will mark clean any page whose buffer_heads
> are all clean, even if the page itself was dirty.
>
> The ramdisk sets pages dirty to keep them from being removed from the
> page cache, just like ramfs does.
>
So, the problem is using the Dirty bit to indicate pinned. You're
completely right that our current setup of buffer heads, pages, and
filesystem metadata is complex and difficult.
But, moving the buffer heads off the page cache pages isn't going to
make it any easier to use dirty as pinned, especially in the face of
buffer_head users for file data pages.
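To make that concrete, here's a toy model of the sequence you describe
(names mirror the kernel's, but this is an illustration, not the real
fs/buffer.c):

#include <stdbool.h>

struct buffer_head {
	bool dirty;				/* buffer_dirty */
	struct buffer_head *b_this_page;	/* circular list on the page */
};

struct page {
	bool dirty;			/* PageDirty */
	struct buffer_head *buffers;
};

/* Writeback: clears buffer_dirty but leaves PageDirty alone. */
static void write_buffer(struct buffer_head *bh)
{
	bh->dirty = false;
}

/* Reclaim: once every buffer is clean, the buffers are dropped and the
 * page is marked clean -- even though the ramdisk was counting on
 * PageDirty to pin the data.
 */
static bool try_to_free_buffers_model(struct page *page)
{
	struct buffer_head *bh = page->buffers;

	if (!bh)
		return true;
	do {
		if (bh->dirty)
			return false;	/* still pinned by a dirty buffer */
		bh = bh->b_this_page;
	} while (bh != page->buffers);

	page->buffers = NULL;
	page->dirty = false;	/* the ramdisk's pin just vanished */
	return true;
}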
You've already seen Nick's fsblock code, but you can see my general
approach to replacing buffer heads here:
http://oss.oracle.com/mercurial/mason/btrfs-unstable/file/f89e7971692f/extent_map.h
(alpha-quality implementation in extent_map.c, with users in inode.c.)
The basic idea is to do extent-based record keeping for the mapping and
state of things in the filesystem, and to avoid attaching these things
to the page.
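Stripped way down, the idea is something like this (illustrative sketch
only -- the real structures live in the extent_map.h linked above, and
the field names here are approximate):

struct extent_state {
	u64 start;		/* first byte of the range */
	u64 end;		/* last byte, inclusive */
	unsigned long state;	/* dirty/uptodate/locked bits for the range */
	struct rb_node node;	/* indexed in a per-inode range tree */
};

Marking a megabyte dirty is then one operation on one record, and the
page flags stop doing double duty as filesystem state.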
> Unfortunately, when those dirty ramdisk pages get buffers on them and
> those buffers all go clean while we are trying to reclaim buffer_heads,
> we drop those pages from the page cache. Ouch!
>
> We can fix the ramdisk by making certain that buffer_heads on ramdisk
> pages stay dirty as well. The problem is that this breaks filesystems
> like reiserfs and ext3 that expect to be able to make buffer_heads
> clean sometimes.
>
> There are other ways to solve this for ramdisks, such as changing
> where ramdisks are stored. However, fixing the ramdisks this way
> still leaves the general problem that there are other paths to the
> filesystem metadata buffers, and those other paths cause the code
> to be complicated and buggy.
>
> So I'm trying to see if we can untangle this Gordian knot, so the
> code becomes more easily maintainable.
>
Don't get me wrong, I'd love to see a simple and coherent fix for what
reiserfs and ext3 do with buffer head state, but I think for the short
term you're best off pinning the ramdisk pages via some other means.
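For example (a purely hypothetical sketch -- rd_device and its page
tree are made up here, and locking/radix_tree_preload() are omitted):
let the ramdisk hold its own reference on each page in a private radix
tree, so reclaim can clean or drop the page cache alias without losing
the data:

static struct page *rd_insert_page(struct rd_device *rd, pgoff_t idx)
{
	struct page *page = alloc_page(GFP_NOIO | __GFP_ZERO);

	if (!page)
		return NULL;

	/* The tree's reference is the pin, not PageDirty. */
	if (radix_tree_insert(&rd->pages, idx, page)) {
		__free_page(page);
		return NULL;
	}
	return page;
}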
-chris