[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.LFD.1.00.0801231248060.2803@woody.linux-foundation.org>
Date: Wed, 23 Jan 2008 13:00:57 -0800 (PST)
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Miklos Szeredi <miklos@...redi.hu>
cc: a.p.zijlstra@...llo.nl, salikhmetov@...il.com, linux-mm@...ck.org,
jakob@...hought.net, linux-kernel@...r.kernel.org,
valdis.kletnieks@...edu, riel@...hat.com, ksm@...dk,
staubach@...hat.com, jesper.juhl@...il.com,
akpm@...ux-foundation.org, protasnb@...il.com,
r.e.wolff@...wizard.nl, hidave.darkstar@...il.com,
hch@...radead.org
Subject: Re: [PATCH -v8 3/4] Enable the MS_ASYNC functionality in
sys_msync()
On Wed, 23 Jan 2008, Miklos Szeredi wrote:
>
> > [ Side note: it is quite possible that we should not do the
> > SYNC_FILE_RANGE_WAIT_BEFORE on MS_ASYNC, and just skip over pages that
> > are busily under writeback already.
>
> MS_ASYNC is not supposed to wait, so SYNC_FILE_RANGE_WAIT_BEFORE
> probably should not be used in that case.
Well, that would leave the page dirty (including in the page tables) if it
was under page writeback when the MS_ASYNC happened.
So I agree, we shouldn't necessarily wait, but if we want the page tables
to be cleaned, right now we need to.
> What would be perfect, is if we had a sync mode, that on encountering
> a page currently under writeback, would just do a page_mkclean() on
> it, so we still receive a page fault next time one of the mappings is
> dirtied, so the times can be updated.
>
> Would there be any difficulties with that?
It would require fairly invasive changes. Right now the actual page
writeback does effectively:
...
if (wbc->sync_mode != WB_SYNC_NONE)
wait_on_page_writeback(page);
if (PageWriteback(page) ||
!clear_page_dirty_for_io(page)) {
unlock_page(page);
continue;
}
ret = (*writepage)(page, wbc, data);
...
and that "clear_page_dirty_for_io()" really does clear *all* the dirty
bits, so we absolutely must start writepage() when we have done that. And
that, in turn, requires that we're not already under writeback.
Is it possible to fix? Sure. We'd have to split up
clear_page_dirty_for_io() to do it, and do the
if (mapping && mapping_cap_account_dirty(mapping))
..
part first (before the PageWriteback() tests), and then doing the
if (TestClearPageDirty(page))
...
parts later (after checking that that we're not under page-writeback).
So it's not horribly hard, but it's kind of a separate issue right now.
And while the *generic* page-writeback is easy enough to fix, I worry
about low-level filesystems that have their own "writepages()"
implementation. They could easily get that wrong.
So right now it seems that waiting for writeback to finish is the right
and safe thing to do (and even so, I'm not actually willing to commit my
suggested patch in 2.6.24, I think this needs more thinking about)
Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists