[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190918132719.GE31891@quack2.suse.cz>
Date: Wed, 18 Sep 2019 15:27:19 +0200
From: Jan Kara <jack@...e.cz>
To: yangerkun <yangerkun@...wei.com>
Cc: Jan Kara <jack@...e.cz>, tytso@....edu, linux-ext4@...r.kernel.org,
yi.zhang@...wei.com, houtao1@...wei.com
Subject: Re: [PATCH] ext4: fix a bug in ext4_wait_for_tail_page_commit
On Wed 18-09-19 21:09:00, yangerkun wrote:
> On 2019/9/18 18:45, Jan Kara wrote:
> > On Tue 17-09-19 16:48:14, yangerkun wrote:
> > > No need to wait when offset equals to 0. And it will trigger a bug since
> > > the latter __ext4_journalled_invalidatepage can free the buffers but leave
> > > page still dirty.
> > >
> > > [ 26.057508] ------------[ cut here ]------------
> > > [ 26.058531] kernel BUG at fs/ext4/inode.c:2134!
> > > ...
> > > [ 26.088130] Call trace:
> > > [ 26.088695] ext4_writepage+0x914/0xb28
> > > [ 26.089541] writeout.isra.4+0x1b4/0x2b8
> > > [ 26.090409] move_to_new_page+0x3b0/0x568
> > > [ 26.091338] __unmap_and_move+0x648/0x988
> > > [ 26.092241] unmap_and_move+0x48c/0xbb8
> > > [ 26.093096] migrate_pages+0x220/0xb28
> > > [ 26.093945] kernel_mbind+0x828/0xa18
> > > [ 26.094791] __arm64_sys_mbind+0xc8/0x138
> > > [ 26.095716] el0_svc_common+0x190/0x490
> > > [ 26.096571] el0_svc_handler+0x60/0xd0
> > > [ 26.097423] el0_svc+0x8/0xc
> > >
> > > Run below parallel can reproduce it easily(ext3):
> > > void main()
> > > {
> > > int fd, fd1, fd2, fd3, ret;
> > > void *addr;
> > > size_t length = 4096;
> > > int flags;
> > > off_t offset = 0;
> > > char *str = "12345";
> > >
> > > fd = open("a", O_RDWR | O_CREAT);
> > > assert(fd >= 0);
> > >
> > > ret = ftruncate(fd, length);
> > > assert(ret == 0);
> > >
> > > fd1 = open("a", O_RDWR | O_CREAT, -1);
> > > assert(fd1 >= 0);
> > >
> > > flags = 0xc00f;/*Journal data mode*/
> > > ret = ioctl(fd1, _IOW('f', 2, long), &flags);
> > > assert(ret == 0);
> > >
> > > fd2 = open("a", O_RDWR | O_CREAT);
> > > assert(fd2 >= 0);
> > >
> > > fd3 = open("a", O_TRUNC | O_NOATIME);
> > > assert(fd3 >= 0);
> > >
> > > addr = mmap(NULL, length, 0xe, 0x28013, fd2, offset);
> >
> > Ugh, these mmap flags look pretty bogus. Were they generated by some
> > fuzzer?
> Yeah, generated by syzkaller.
> >
> > > assert(addr != (void *)-1);
> > > memcpy(addr, str, 5);
> >
> > Also the O_TRUNC open above will truncate "a" to 0 so the mapping is
> > actually beyond i_size and this memcpy should fail with SIGBUS. So I'm
> > surprised your test program gets up to mbind()...
>
> We run the program parallel, sometimes will run as below:
>
> reproduce1 reproduce2
>
> ... | ...
> truncate to 4k |
> change to journal data mode |
> | memcpy(set page dirty)
> truncate to 0: |
> ext4_setattr: |
> ... |
> ext4_wait_for_tail_page_commit |
> | mbind(trigger bug)
> truncate_pagecache(clean dirty)| ...
> ... |
> Reproduce2 will mark page as dirty by memcpy, then mbind run between
> ext4_wait_for_tail_page_commit and truncate_pagecache in ext4_setattr can
> trigger the bug with page still be dirty but buffer head has been free.
Aha! Thanks for explanation. Makes sense. So I agree with your patch but we
also need to update the comment before the condition in
ext4_wait_for_tail_page_commit(). Something like:
If the page is fully truncated, we don't need to wait for any commit (and
we even should not as __ext4_journalled_invalidatepage() may strip all
buffers from the page but keep the page dirty which can then confuse e.g.
concurrent ext4_writepage() seeing dirty page without buffers). Also we
don't need to wait for any commit if all buffers in the page remain valid.
This is most beneficial for the common case of blocksize == PAGE_SIZE.
Honza
> > > ---
> > > fs/ext4/inode.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > > index 006b7a2070bf..a9943ae4f74d 100644
> > > --- a/fs/ext4/inode.c
> > > +++ b/fs/ext4/inode.c
> > > @@ -5479,7 +5479,7 @@ static void ext4_wait_for_tail_page_commit(struct inode *inode)
> > > * do. We do the check mainly to optimize the common PAGE_SIZE ==
> > > * blocksize case
> > > */
> > > - if (offset > PAGE_SIZE - i_blocksize(inode))
> > > + if (!offset || offset > PAGE_SIZE - i_blocksize(inode))
> > > return;
> > > while (1) {
> > > page = find_lock_page(inode->i_mapping,
--
Jan Kara <jack@...e.com>
SUSE Labs, CR
Powered by blists - more mailing lists