Message-ID: <m1ejfuhoah.fsf@ebiederm.dsl.xmission.com>
Date: Tue, 16 Oct 2007 13:06:46 -0600
From: ebiederm@...ssion.com (Eric W. Biederman)
To: Nick Piggin <nickpiggin@...oo.com.au>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Christian Borntraeger <borntraeger@...ibm.com>,
linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Martin Schwidefsky <schwidefsky@...ibm.com>,
"Theodore Ts'o" <tytso@....edu>
Subject: Re: [PATCH] rd: Mark ramdisk buffers heads dirty
Nick Piggin <nickpiggin@...oo.com.au> writes:
> On Tuesday 16 October 2007 08:42, Eric W. Biederman wrote:
>> I have not observed this case but it is possible to get a dirty page
>> cache with clean buffer heads if we get a clean ramdisk page with
>> buffer heads generated by a filesystem calling __getblk and then write
>> to that page from user space through the block device. Then we just
>> need to hit the proper window and try_to_free_buffers() will mark that
>> page clean and eventually drop it. Ouch!
>>
>> To fix this, use the generic __set_page_dirty_buffers in the ramdisk
>> code so that when we mark a page dirty we also mark its buffer heads
>> dirty.
>
> Hmm, so we can also have some filesystems writing their own buffers
> out by hand (clear_buffer_dirty, submit buffer for IO). Other places
> will do similarly dodgy things with filesystem metadata
> (fsync_buffers_list, for example).
>
> So your buffers get cleaned again, then your pages get cleaned.
So I just took a little bit of time to look at and think about
the path you are referring to, and I don't see a problem.

The rule with the buffer dirty bit is that you first clear it
and then submit the write. When the write request finally
makes its way to rd.c we copy the data if necessary and call
set_page_dirty, which then marks both the page and the buffer
dirty again.
In essence the ramdisk code just attempts to lock buffers in
RAM by setting their dirty bit, just like we do for pages
in ramfs.
The only case where I see the dirty bit getting cleared without
submitting I/O is when the dirty state doesn't matter, in which
case, if we get a page full of buffers whose data we don't care
about, it is legitimate to drop the page.
As for ramdisk_writepage, it probably makes sense to remove it,
as the generic code paths seem to work as well or better when the
writepage method is NULL. However, if we do keep it we should call
set_page_dirty there on general principles.
> While I said it was a good fix when I saw the patch earlier, I think
> it's not closing the entire hole, and as such, Christian's patch is
> probably the way to go for stable.
I thought through the logic in try_to_free_buffers and it actually
makes sense to me now. mark_buffer_dirty sets the page dirty bit,
so dirty buffers reside on dirty pages. When we submit I/O we
aren't guaranteed to submit all of the dirty buffers for a page
at once, so we don't clear the page dirty bit. The result is
that we can end up with pages whose dirty bit is set even though
all of their buffers are clean.
Since we rather deliberately allow truly clean pages to be dropped
from the ramdisk, overriding try_to_free_buffer_pages looks wrong,
because then, except via invalidate, we could not remove buffers
from truly clean pages.
> For mainline, *if* we want to keep the old rd.c around at all, I
> don't see any harm in this patch so long as Christian's is merged
> as well. Sharing common code is always good.
I do agree that, with the amount of code duplication in the buffer
cache, locking pages into the buffer cache seems very error prone
and difficult to maintain. So rewriting rd.c to store its pages
elsewhere sounds very promising.
While I can see Christian's patch as fixing the symptom, I have
a very hard time seeing it as correct. If we did something more
elaborate to replace try_to_free_buffer_pages, such that we could
drop buffers from buffer cache pages once they became truly clean,
and so only suppressed the drop-the-dirty-bit logic for pages that
contain data we want to keep, I would be ok with it. Totally
skipping buffer head freeing just feels wrong to me.
Eric
-