linux-ext4 - Re: [BUG] ext2/3/4: dio reads stale data when we do some append dio writes


Message-ID: <20131119122002.GB5339@gmail.com>
Date:	Tue, 19 Nov 2013 20:20:02 +0800
From:	Zheng Liu <gnehzuil.liu@...il.com>
To:	Dave Chinner <david@...morbit.com>
Cc:	Christoph Hellwig <hch@...radead.org>,
	linux-fsdevel@...r.kernel.org, linux-ext4@...r.kernel.org,
	xfs@....sgi.com
Subject: Re: [BUG] ext2/3/4: dio reads stale data when we do some append dio
 writes

On Tue, Nov 19, 2013 at 11:01:12PM +1100, Dave Chinner wrote:
> On Tue, Nov 19, 2013 at 03:18:26AM -0800, Christoph Hellwig wrote:
> > On Tue, Nov 19, 2013 at 07:19:47PM +0800, Zheng Liu wrote:
> > > Yes, I know that XFS has a shared/exclusive lock.  I guess that is why
> > > it can pass the test.  But another question is why xfs fails when we do
> > > some append dio writes while doing buffered reads.
> > 
> > Can you provide a test case for that issue?
> 
> For XFS, appending direct IO writes only hold the IOLOCK exclusive
> for as long as it takes to guarantee that the region between the
> old EOF and the new EOF is full of zeros before it is demoted.  i.e.
> once the region is guaranteed not to expose stale data, the
> exclusive IO lock is demoted to a shared lock and a buffered read
> is then allowed to proceed concurrently with the DIO write.
> 
> Hence even appending writes occur concurrently with buffered reads,
> and if the read overlaps the block at the old EOF then the page
> brought into the page cache will have zeros in it.
> 
> FWIW, there's a wonderful comment in generic_file_direct_write()
> that pretty much covers this case:
> 
>         /*
>          * Finally, try again to invalidate clean pages which might have been
>          * cached by non-direct readahead, or faulted in by get_user_pages()
>          * if the source of the write was an mmap'ed region of the file
>          * we're writing.  Either one is a pretty crazy thing to do,
>          * so we don't support it 100%.  If this invalidation
>          * fails, tough, the write still worked...
>          */
> 
> The kernel code simply does not have the exclusion mechanisms to
> make concurrent buffered and direct IO robust. This is one of the
> problems (amongst many) that we've been looking to solve with a VFS
> level IO range lock of some kind....
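
The exclusive-then-demote sequence described above can be illustrated with
a toy user-space model. This is only a sketch under stated assumptions, not
XFS code: DemotableLock, ModelFile, append_dio_write, buffered_read and the
probe callback are all hypothetical names, the "disk" is a bytearray
pre-filled with a stale pattern, and the size update is simplified compared
with what a real filesystem does.

```python
import threading

STALE = 0xAA  # hypothetical pattern standing in for stale on-disk data


class DemotableLock:
    """Toy shared/exclusive lock with an exclusive-to-shared demote step."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_exclusive(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def try_acquire_shared(self):
        # Non-blocking so the model can be driven from a single thread.
        with self._cond:
            if self._writer:
                return False
            self._readers += 1
            return True

    def demote(self):
        # Atomically turn the exclusive hold into a shared hold.
        with self._cond:
            if not self._writer:
                raise RuntimeError("demote() without an exclusive hold")
            self._writer = False
            self._readers += 1
            self._cond.notify_all()

    def release_shared(self):
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()


class ModelFile:
    def __init__(self, capacity=64):
        self.iolock = DemotableLock()
        self.disk = bytearray([STALE] * capacity)  # blocks hold stale bytes
        self.eof = 0


def append_dio_write(f, data, probe=None):
    """Append `data`; `probe` runs after the demote, before data lands."""
    f.iolock.acquire_exclusive()
    old_eof, new_eof = f.eof, f.eof + len(data)
    # Zero old EOF..new EOF while readers are still excluded.
    f.disk[old_eof:new_eof] = bytes(len(data))
    f.eof = new_eof
    f.iolock.demote()  # buffered readers may now proceed concurrently
    try:
        if probe is not None:
            probe(f)  # stands in for a concurrent buffered read
        f.disk[old_eof:new_eof] = data  # the direct IO itself
    finally:
        f.iolock.release_shared()


def buffered_read(f, offset, length):
    """Return the bytes read, or None if the exclusive lock is held."""
    if not f.iolock.try_acquire_shared():
        return None
    try:
        return bytes(f.disk[offset:min(offset + length, f.eof)])
    finally:
        f.iolock.release_shared()
```

Passing a probe that calls buffered_read() models a reader racing with the
append: in this model it can observe zeros in the new region, but never the
stale pattern, which matches the behaviour described for the page brought
into the page cache.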

Thanks for pointing it out.

                                                - Zheng