linux-kernel - Re: [PATCH -v8 3/4] Enable the MS_ASYNC functionality in sys

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <alpine.LFD.1.00.0801231248060.2803@woody.linux-foundation.org>
Date:	Wed, 23 Jan 2008 13:00:57 -0800 (PST)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Miklos Szeredi <miklos@...redi.hu>
cc:	a.p.zijlstra@...llo.nl, salikhmetov@...il.com, linux-mm@...ck.org,
	jakob@...hought.net, linux-kernel@...r.kernel.org,
	valdis.kletnieks@...edu, riel@...hat.com, ksm@...dk,
	staubach@...hat.com, jesper.juhl@...il.com,
	akpm@...ux-foundation.org, protasnb@...il.com,
	r.e.wolff@...wizard.nl, hidave.darkstar@...il.com,
	hch@...radead.org
Subject: Re: [PATCH -v8 3/4] Enable the MS_ASYNC functionality in
 sys_msync()



On Wed, 23 Jan 2008, Miklos Szeredi wrote:
>
> > [ Side note: it is quite possible that we should not do the 
> >   SYNC_FILE_RANGE_WAIT_BEFORE on MS_ASYNC, and just skip over pages that 
> >   are busily under writeback already.
> 
> MS_ASYNC is not supposed to wait, so SYNC_FILE_RANGE_WAIT_BEFORE
> probably should not be used in that case.

Well, that would leave the page dirty (including in the page tables) if it 
was under page writeback when the MS_ASYNC happened.

So I agree, we shouldn't necessarily wait, but if we want the page tables 
to be cleaned, right now we need to.

> What would be perfect, is if we had a sync mode, that on encountering
> a page currently under writeback, would just do a page_mkclean() on
> it, so we still receive a page fault next time one of the mappings is
> dirtied, so the times can be updated.
> 
> Would there be any difficulties with that?

It would require fairly invasive changes. Right now the actual page 
writeback does effectively:

			...
                        if (wbc->sync_mode != WB_SYNC_NONE)
                                wait_on_page_writeback(page);

                        if (PageWriteback(page) ||
                            !clear_page_dirty_for_io(page)) {
                                unlock_page(page);
                                continue;
                        }

                        ret = (*writepage)(page, wbc, data);
			...

and that "clear_page_dirty_for_io()" really does clear *all* the dirty 
bits, so we absolutely must start writepage() when we have done that. And 
that, in turn, requires that we're not already under writeback.

Is it possible to fix? Sure. We'd have to split up 
clear_page_dirty_for_io() to do it, and do the

	if (mapping && mapping_cap_account_dirty(mapping))
			..

part first (before the PageWriteback() tests), and then doing the

	if (TestClearPageDirty(page))
			...

parts later (after checking that that we're not under page-writeback).

So it's not horribly hard, but it's kind of a separate issue right now. 
And while the *generic* page-writeback is easy enough to fix, I worry 
about low-level filesystems that have their own "writepages()" 
implementation. They could easily get that wrong.

So right now it seems that waiting for writeback to finish is the right 
and safe thing to do (and even so, I'm not actually willing to commit my 
suggested patch in 2.6.24, I think this needs more thinking about)

			Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/