Message-ID: <20170504091233.GA808@quack2.suse.cz>
Date: Thu, 4 May 2017 11:12:33 +0200
From: Jan Kara <jack@...e.cz>
To: Ross Zwisler <ross.zwisler@...ux.intel.com>
Cc: Jan Kara <jack@...e.cz>, Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org,
Alexander Viro <viro@...iv.linux.org.uk>,
Alexey Kuznetsov <kuznet@...tuozzo.com>,
Andrey Ryabinin <aryabinin@...tuozzo.com>,
Anna Schumaker <anna.schumaker@...app.com>,
Christoph Hellwig <hch@....de>,
Dan Williams <dan.j.williams@...el.com>,
"Darrick J. Wong" <darrick.wong@...cle.com>,
Eric Van Hensbergen <ericvh@...il.com>,
Jens Axboe <axboe@...nel.dk>,
Johannes Weiner <hannes@...xchg.org>,
Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
Latchesar Ionkov <lucho@...kov.net>,
linux-cifs@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-mm@...ck.org, linux-nfs@...r.kernel.org,
linux-nvdimm@...ts.01.org, Matthew Wilcox <mawilcox@...rosoft.com>,
Ron Minnich <rminnich@...dia.gov>,
samba-technical@...ts.samba.org, Steve French <sfrench@...ba.org>,
Trond Myklebust <trond.myklebust@...marydata.com>,
v9fs-developer@...ts.sourceforge.net
Subject: Re: [PATCH 2/2] dax: fix data corruption due to stale mmap reads

On Mon 01-05-17 16:38:55, Ross Zwisler wrote:
> > So for now I'm still more inclined to just stay with the radix tree lock as
> > is and just fix up the locking as I suggest and go for a larger rewrite only
> > if we can demonstrate further performance wins.
>
> Sounds good.
>
> > WRT your second patch, if we go with the locking as I suggest, it is enough
> > to unmap the whole range after invalidate_inode_pages2() has cleared radix
> > tree entries (*) which will be much cheaper (for large writes) than doing
> > unmapping entry by entry.
>
> I'm still not convinced that it is safe to do the unmap in a separate step. I
> see your point about it being expensive to do a rmap walk to unmap each entry
> in __dax_invalidate_mapping_entry(), but I think we might need to because the
> unmap is part of the contract imposed by invalidate_inode_pages2_range() and
> invalidate_inode_pages2(). This exists in the header comment above each:
>
> * Any pages which are found to be mapped into pagetables are unmapped prior
> * to invalidation.
>
> If you look at the usage of invalidate_inode_pages2_range() in
> generic_file_direct_write() for example (which I realize we won't call for a
> DAX inode, but still), I think that it really does rely on the fact that
> invalidated pages are unmapped, right? If it didn't, and hole pages were
> mapped, the hole pages could remain mapped while a direct I/O write allocated
> blocks and then wrote real data.
>
> If we really want to unmap the entire range at once, maybe it would have to be
> done in invalidate_inode_pages2_range(), after the loop? My hesitation about
> this is that we'd be leaking yet more DAX special casing up into the
> mm/truncate.c code.
>
> Or am I missing something?

No, my thinking was to put the invalidation at the end of
invalidate_inode_pages2_range(). I agree it means more special-casing for
DAX in mm/truncate.c.
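
To illustrate the cost argument only, here is a toy user-space model, not
kernel code. Every name in it (model_mapping, model_unmap_range, and so on)
is made up for illustration, and it deliberately ignores locking and the
question of whether an entry can get remapped between the two steps. It
just counts how many unmap passes each approach performs.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NPAGES 16

/* Toy stand-in for an address_space; every name here is hypothetical. */
struct model_mapping {
	int entry[MODEL_NPAGES];   /* 1 = "radix tree" entry present */
	int mapped[MODEL_NPAGES];  /* 1 = offset mapped in a "page table" */
	unsigned long rmap_walks;  /* number of unmap passes performed */
};

/* Start with every offset populated and mapped. */
static struct model_mapping model_init(void)
{
	struct model_mapping m = { .rmap_walks = 0 };

	for (size_t i = 0; i < MODEL_NPAGES; i++) {
		m.entry[i] = 1;
		m.mapped[i] = 1;
	}
	return m;
}

/* Models one rmap walk that clears the mappings in [start, end]. */
static void model_unmap_range(struct model_mapping *m, size_t start, size_t end)
{
	assert(start <= end && end < MODEL_NPAGES);
	m->rmap_walks++;
	for (size_t i = start; i <= end; i++)
		m->mapped[i] = 0;
}

/* Variant A: unmap each entry individually as it is invalidated. */
static void invalidate_per_entry(struct model_mapping *m, size_t start, size_t end)
{
	assert(start <= end && end < MODEL_NPAGES);
	for (size_t i = start; i <= end; i++) {
		if (!m->entry[i])
			continue;
		model_unmap_range(m, i, i);
		m->entry[i] = 0;
	}
}

/* Variant B: clear all entries first, then unmap once after the loop. */
static void invalidate_then_unmap(struct model_mapping *m, size_t start, size_t end)
{
	assert(start <= end && end < MODEL_NPAGES);
	for (size_t i = start; i <= end; i++)
		m->entry[i] = 0;
	model_unmap_range(m, start, end);
}

/* Returns 1 if any offset is still mapped, 0 otherwise. */
static int model_any_mapped(const struct model_mapping *m)
{
	for (size_t i = 0; i < MODEL_NPAGES; i++)
		if (m->mapped[i])
			return 1;
	return 0;
}
```

With all 16 offsets populated, variant A performs 16 unmap passes and
variant B performs one, and both leave nothing mapped. The model says
nothing about whether variant B is safe against concurrent faults, which
is the actual open question in this thread.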
Honza
--
Jan Kara <jack@...e.com>
SUSE Labs, CR