linux-ext4 - Re: [PATCH v5 4/7] dax: add support for fsync/sync

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20151222235123.GA24124@linux.intel.com>
Date:	Tue, 22 Dec 2015 16:51:23 -0700
From:	Ross Zwisler <ross.zwisler@...ux.intel.com>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	Ross Zwisler <ross.zwisler@...ux.intel.com>,
	linux-kernel@...r.kernel.org, "H. Peter Anvin" <hpa@...or.com>,
	"J. Bruce Fields" <bfields@...ldses.org>,
	Theodore Ts'o <tytso@....edu>,
	Alexander Viro <viro@...iv.linux.org.uk>,
	Andreas Dilger <adilger.kernel@...ger.ca>,
	Dave Chinner <david@...morbit.com>,
	Ingo Molnar <mingo@...hat.com>, Jan Kara <jack@...e.com>,
	Jeff Layton <jlayton@...chiereds.net>,
	Matthew Wilcox <willy@...ux.intel.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org,
	linux-mm@...ck.org, linux-nvdimm@...1.01.org, x86@...nel.org,
	xfs@....sgi.com, Dan Williams <dan.j.williams@...el.com>,
	Matthew Wilcox <matthew.r.wilcox@...el.com>,
	Dave Hansen <dave.hansen@...ux.intel.com>
Subject: Re: [PATCH v5 4/7] dax: add support for fsync/sync

On Tue, Dec 22, 2015 at 02:46:25PM -0800, Andrew Morton wrote:
> On Fri, 18 Dec 2015 22:22:17 -0700 Ross Zwisler <ross.zwisler@...ux.intel.com> wrote:
> 
> > To properly handle fsync/msync in an efficient way DAX needs to track dirty
> > pages so it is able to flush them durably to media on demand.
> > 
> > The tracking of dirty pages is done via the radix tree in struct
> > address_space.  This radix tree is already used by the page writeback
> > infrastructure for tracking dirty pages associated with an open file, and
> > it already has support for exceptional (non struct page*) entries.  We
> > build upon these features to add exceptional entries to the radix tree for
> > DAX dirty PMD or PTE pages at fault time.
> 
> I'm getting a few rejects here against other pending changes.  Things
> look OK to me but please do runtime test the end result as it resides
> in linux-next.  Which will be next year.

Sounds good.  I'm hoping to soon send out an updated version of this series
which merges with Dan's changes to dax.c.  Thank you for pulling these into
-mm.

> --- a/fs/dax.c~dax-add-support-for-fsync-sync-fix
> +++ a/fs/dax.c
> @@ -383,10 +383,8 @@ static void dax_writeback_one(struct add
>  	struct radix_tree_node *node;
>  	void **slot;
>  
> -	if (type != RADIX_DAX_PTE && type != RADIX_DAX_PMD) {
> -		WARN_ON_ONCE(1);
> +	if (WARN_ON_ONCE(type != RADIX_DAX_PTE && type != RADIX_DAX_PMD))
>  		return;
> -	}

This is much cleaner, thanks.  I'll make this change throughout my set.

> > +/*
> > + * Flush the mapping to the persistent domain within the byte range of [start,
> > + * end]. This is required by data integrity operations to ensure file data is
> > + * on persistent storage prior to completion of the operation.
> > + */
> > +void dax_writeback_mapping_range(struct address_space *mapping, loff_t start,
> > +		loff_t end)
> > +{
> > +	struct inode *inode = mapping->host;
> > +	pgoff_t indices[PAGEVEC_SIZE];
> > +	pgoff_t start_page, end_page;
> > +	struct pagevec pvec;
> > +	void *entry;
> > +	int i;
> > +
> > +	if (inode->i_blkbits != PAGE_SHIFT) {
> > +		WARN_ON_ONCE(1);
> > +		return;
> > +	}
> 
> again
> 
> > +	rcu_read_lock();
> > +	entry = radix_tree_lookup(&mapping->page_tree, start & PMD_MASK);
> > +	rcu_read_unlock();
> 
> What stabilizes the memory at *entry after rcu_read_unlock()?

Nothing in this function.  We use the entry that is currently in the tree to
know whether or not to expand the range of offsets that we need to flush.
Even if we are racing with someone, expanding our flushing range is
non-destructive.

We get a list of entries based on what is dirty later in this function via
find_get_entries_tag(), and before we take any action on those entries we
re-verify them while holding the tree_lock in dax_writeback_one().

The next version of this series will have updated version of this code which
also accounts for block device removal via dax_map_atomic() inside of
dax_writeback_one().
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html