[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20131119114547.GB4782@gmail.com>
Date: Tue, 19 Nov 2013 19:45:47 +0800
From: Zheng Liu <gnehzuil.liu@...il.com>
To: Dmitry Monakhov <dmonakhov@...nvz.org>
Cc: linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org
Subject: Re: [BUG] ext2/3/4: dio reads stale data when we do some append dio
writes
On Tue, Nov 19, 2013 at 02:54:49PM +0400, Dmitry Monakhov wrote:
> On Tue, 19 Nov 2013 17:53:03 +0800, Zheng Liu <gnehzuil.liu@...il.com> wrote:
[...]
> > As we expected, we should read nothing or data with 'a'. But now we
> > read data with '0'. I take a closer look at the code and it seems that
> > there is a bug in vfs. Let me describe my found here.
> >
> > reader writer
> > generic_file_aio_write()
> > ->__generic_file_aio_write()
> > ->generic_file_direct_write()
> > generic_file_aio_read()
> > ->do_generic_file_read()
> > [fallback to buffered read]
> >
> > ->find_get_page()
> > ->page_cache_sync_readahead()
> > ->find_get_page()
> > [in find_page label, we couldn't find a
> > page before and after calling
> > page_cache_sync_readahead(). So go to
> > no_cached_page label]
> >
> > ->page_cache_alloc_cold()
> > ->add_to_page_cache_lru()
> > [in no_cached_page label, we alloc a page
> > and goto readpage label.]
> >
> > ->aops->readpage()
> > [in readpage label, readpage() callback
> > is called and mpage_readpage() return a
> > zero-filled page (e.g. ext3/4), and go
> > to page_ok label]
> >
> > ->a_ops->direct_IO()
> > ->i_size_write()
> > [we enlarge the i_size]
> >
> > Here we check i_size
> > [in page_ok label, we check i_size but
> > it has been enlarged. Thus, we pass
> > the check and return a zero-filled page]
> >
> > I attach a patch below to fix this problem in vfs. However, to be honest, the
> > fix is very dirty. But frankly I haven't had a good solution until now. So I
> > send this mail to talk about this problem. Please let me know if I miss
> > something.
> Looks sane because in orrder to get correct result we have to read
> variables in opposite order in comparison to update order.
> >
> > Any comment and idea are always welcome.
> >
> > Thanks,
> > - Zheng
> >
> > From: Zheng Liu <wenqing.lz@...bao.com>
> >
> > Subject: [PATCH] vfs: check i_size at the beginning of do_generic_file_read()
> >
> > Signed-off-by: Zheng Liu <wenqing.lz@...bao.com>
> > ---
> > mm/filemap.c | 13 +++++++++++--
> > 1 file changed, 11 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/filemap.c b/mm/filemap.c
> > index 1e6aec4..9de2ad8 100644
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -1107,6 +1107,8 @@ static void do_generic_file_read(struct file *filp, loff_t *ppos,
> > pgoff_t index;
> > pgoff_t last_index;
> > pgoff_t prev_index;
> > + pgoff_t end_index;
> > + loff_t isize;
> > unsigned long offset; /* offset into pagecache page */
> > unsigned int prev_offset;
> > int error;
> > @@ -1117,10 +1119,17 @@ static void do_generic_file_read(struct file *filp, loff_t *ppos,
> > last_index = (*ppos + desc->count + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
> > offset = *ppos & ~PAGE_CACHE_MASK;
> >
> > + /*
> > + * We must check i_size at the beginning of this function to avoid to return
> > + * zero-filled page to userspace when the application does append dio writes.
> > + */
> > + isize = i_size_read(inode);
> > + end_index = (isize - 1) >> PAGE_CACHE_SHIFT;
> > + if (unlikely(!isize || index > end_index))
> > + goto out;
> > +
> This not sufficient because protect from append case only.
> following scenario still can result in zefo-filed pages
If I understand correctly, it couldn't happen.
> Let inode has i_size=8192
> task1 task2
> pread(, off=4096,sz=4096) truncate(4096)
> ->do_generic_file_read()
> check i_size OK
> ->i_size_write()
> ->truncate_pagecache()
> ->page_cache_alloc_cold
> ->aops->readpage() ->ZERO
> pwrite(off=4096,sz=4096)
> ->generic_file_direct_write()
->invalidate_inode_pages2_range()
->direct_IO()
->invalidate_inode_pages2_range()
So the page should be invalidated
- Zheng
> ->i_size_write()
> check i_size OK
> > for (;;) {
> > struct page *page;
> > - pgoff_t end_index;
> > - loff_t isize;
> > unsigned long nr, ret;
> >
> > cond_resched();
> > --
> > 1.7.9.7
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> > the body of a message to majordomo@...r.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists