lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1234645893.4695.8.camel@laptop>
Date:	Sat, 14 Feb 2009 22:11:33 +0100
From:	Peter Zijlstra <peterz@...radead.org>
To:	Edward Shishkin <edward.shishkin@...il.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Nick Piggin <npiggin@...e.de>, Ryan Hope <rmh3093@...il.com>,
	Randy Dunlap <randy.dunlap@...cle.com>,
	linux-kernel@...r.kernel.org,
	ReiserFS Mailing List <reiserfs-devel@...r.kernel.org>
Subject: Re: [patch 2/4] vfs: add set_page_dirty_notag

On Sat, 2009-02-14 at 16:11 +0300, Edward Shishkin wrote:
> Peter Zijlstra writes:
>  > On Fri, 2009-02-13 at 16:57 +0300, Edward Shishkin wrote:
>  > > >
>  > > > Eew, so reiser4 will totally side-step the regular vm inode
>  > > writeback
>  > > > paths -- or is this fixed by a more elaborate than usual
>  > > > a_ops->writepages() ?
>  > > >   
>  > > The second.
>  > > 
>  > > reiser4_writepages() catches the anonymous  (tagged)
>  > > pages, captures them mandatory, then commits all atoms
>  > > of the file.
>  > 
>  > OK, can you then make it painfully clear in the function comment, esp.
>  > since you export this function?
> 
> Hello.
> Does it look better?
> 
> Thanks,
> Edward.
> 
> This is a fixup for the following "todo":
> akpm wrote:
> > reiser4_set_page_dirty_internal() pokes around in VFS internals.
> > Use __set_page_dirty_no_buffers() or create a new library function
> > in mm/page-writeback.c.
> 
> Problem:
> 
> In accordance with reiser4 transactional model every dirty page
> should be "captured" by some atom. However, outside reiser4 context
> dirty page can not be captured in some cases, as it is accompanied
> with specific work (jnode creation, etc). Reiser4 recognizes such
> "anonymous" pages (i.e. pages that were dirtied outside of reiser4)
> by the tag PAGECACHE_TAG_DIRTY. Pages dirtied inside reiser4 context
> are not tagged at all: we don't need this. Indeed, once page is
> dirtied and captured, it is attached to a jnode (a special header
> to keep a track of transactions).
> 
> reiser4_set_page_dirty_internal() was the internal reiser4 function
> that set dirty bit without tagging the page. Having such internal
> function led to real problems (incorrect task io accounting, etc.
> because of not updating this internal "friend").
> 
> Solution:
> 
> The following patch adds a core library function that sets a dirty
> bit without tagging the page. It should be modified simultaneously
> with its "friends": __set_page_dirty_nobuffers, __set_page_dirty.
> 
> Signed-off-by: Edward Shishkin<edward.shishkin@...il.com>
> ---
>  include/linux/mm.h  |    1 +
>  mm/page-writeback.c |   40 ++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 41 insertions(+)
> 
> --- mmotm.orig/include/linux/mm.h
> +++ mmotm/include/linux/mm.h
> @@ -841,6 +841,7 @@ int redirty_page_for_writepage(struct wr
>  				struct page *page);
>  int set_page_dirty(struct page *page);
>  int set_page_dirty_lock(struct page *page);
> +int set_page_dirty_notag(struct page *page);
>  int clear_page_dirty_for_io(struct page *page);
>  
>  extern unsigned long move_page_tables(struct vm_area_struct *vma,
> --- mmotm.orig/mm/page-writeback.c
> +++ mmotm/mm/page-writeback.c
> @@ -1248,6 +1248,46 @@ int __set_page_dirty_nobuffers(struct pa
>  EXPORT_SYMBOL(__set_page_dirty_nobuffers);
>  
>  /*
> + * Some filesystems, which don't use buffers, provide their own
> + * writeback means. And it can happen, that even dirty tag, which
> + * is used by generic methods is not needed. In this case it would
> + * be reasonably to use the following lightweight version of
> + * __set_page_dirty_nobuffers:
> + *
> + * Don't tag page as dirty in its radix tree, just set dirty bit
> + * and update the accounting.
> + * NOTE: This function also doesn't take care of races, i.e. the
> + * caller should guarantee that page can not be truncated.
> + */

Maybe something like

/*
 * set_page_dirty_notag() -- similar to __set_page_dirty_nobuffers()
 * except it doesn't tag the page dirty in the page-cache radix tree.
 * This means that the address space using this cannot use the regular
 * filemap ->writepages() helpers and must provide its own means of
 * tracking and finding dirty pages.
 *
 * NOTE: furthermore, this version also doesn't handle truncate races.
 */

> +int set_page_dirty_notag(struct page *page)
> +{
> +	struct address_space *mapping = page->mapping;
> +
> +	if (!TestSetPageDirty(page)) {
> +		WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page));
> +		if (mapping_cap_account_dirty(mapping)) {
> +		        /*
> +			 * Since we don't tag the page as dirty,
> +			 * acquiring the tree_lock is replaced
> +			 * with disabling preemption to protect
> +			 * per-cpu data used for accounting.
> +			 */

This should be local_irq_save(flags)

> +			preempt_disable();
> +			__inc_zone_page_state(page, NR_FILE_DIRTY);
> +			__inc_bdi_stat(mapping->backing_dev_info,
> +				       BDI_RECLAIMABLE);
> +			task_dirty_inc(current);
> +			task_io_account_write(PAGE_CACHE_SIZE);
> +			preempt_enable();

local_irq_restore()

These accounting functions rely on being atomic wrt interrupts.

> +		}
> +		__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
> +		return 1;
> +	}
> +	return 0;
> +}
> +EXPORT_SYMBOL(set_page_dirty_notag);

How much performance gain do you see by avoiding that radix tree op? 

I take it the only reason you don't use the regular
__set_page_dirty_nobuffers() and just clear the tag when you do the
write-out by whatever alternative means you have to find the page, is
that it gains you some performance.

It would be good to have some numbers to judge this on.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ