linux-ext4 - Re: [PATCH] ext4: fix a bug in ext4_wait_for_tail_page

Message-ID: <20190918104535.GC25056@quack2.suse.cz>
Date:   Wed, 18 Sep 2019 12:45:35 +0200
From:   Jan Kara <jack@...e.cz>
To:     yangerkun <yangerkun@...wei.com>
Cc:     tytso@....edu, jack@...e.cz, linux-ext4@...r.kernel.org,
        yi.zhang@...wei.com, houtao1@...wei.com
Subject: Re: [PATCH] ext4: fix a bug in ext4_wait_for_tail_page_commit

On Tue 17-09-19 16:48:14, yangerkun wrote:
> There is no need to wait when offset equals 0. Waiting also triggers a bug,
> since the later __ext4_journalled_invalidatepage() can free the buffers but
> leave the page dirty.
> 
> [   26.057508] ------------[ cut here ]------------
> [   26.058531] kernel BUG at fs/ext4/inode.c:2134!
> ...
> [   26.088130] Call trace:
> [   26.088695]  ext4_writepage+0x914/0xb28
> [   26.089541]  writeout.isra.4+0x1b4/0x2b8
> [   26.090409]  move_to_new_page+0x3b0/0x568
> [   26.091338]  __unmap_and_move+0x648/0x988
> [   26.092241]  unmap_and_move+0x48c/0xbb8
> [   26.093096]  migrate_pages+0x220/0xb28
> [   26.093945]  kernel_mbind+0x828/0xa18
> [   26.094791]  __arm64_sys_mbind+0xc8/0x138
> [   26.095716]  el0_svc_common+0x190/0x490
> [   26.096571]  el0_svc_handler+0x60/0xd0
> [   26.097423]  el0_svc+0x8/0xc
> 
> Running the program below in parallel reproduces it easily (ext3):
> void main()
> {
>         int fd, fd1, fd2, fd3, ret;
>         void *addr;
>         size_t length = 4096;
>         int flags;
>         off_t offset = 0;
>         char *str = "12345";
> 
>         fd = open("a", O_RDWR | O_CREAT);
>         assert(fd >= 0);
> 
>         ret = ftruncate(fd, length);
>         assert(ret == 0);
> 
>         fd1 = open("a", O_RDWR | O_CREAT, -1);
>         assert(fd1 >= 0);
> 
>         flags = 0xc00f;/*Journal data mode*/
>         ret = ioctl(fd1, _IOW('f', 2, long), &flags);
>         assert(ret == 0);
> 
>         fd2 = open("a", O_RDWR | O_CREAT);
>         assert(fd2 >= 0);
> 
>         fd3 = open("a", O_TRUNC | O_NOATIME);
>         assert(fd3 >= 0);
> 
>         addr = mmap(NULL, length, 0xe, 0x28013, fd2, offset);

Ugh, these mmap flags look pretty bogus. Were they generated by some
fuzzer?

>         assert(addr != (void *)-1);
>         memcpy(addr, str, 5);

Also the O_TRUNC open above will truncate "a" to 0 so the mapping is
actually beyond i_size and this memcpy should fail with SIGBUS. So I'm
surprised your test program gets up to mbind()...

>         mbind(addr, length, 0, 0, 0, 2);
> 
>         close(fd);
>         munmap(addr, length);
> }
> 
> Signed-off-by: yangerkun <yangerkun@...wei.com>

I agree that there's no need to wait for transaction commit when offset ==
0, so your patch is correct in that regard. What still escapes me is why
this is necessary. I have a feeling that it just papers over the real
problem. You mention a crash in ext4_writepage() because the page is dirty
but has no buffers - but how come the page is dirty? If offset == 0 for a
page, truncate_inode_pages() should have cleared the PageDirty flag, so the
page should never get to ext4_writepage() in the first place. Together with
my comments about the test case, this is still a bit of a mystery to me... I
guess I'll try to reproduce this to understand it better.
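
For reference, the early-return condition before and after the patch can be
modelled in user space roughly as below. This is only a sketch under
assumptions: the helper names are hypothetical, page_size and blocksize are
plain parameters standing in for PAGE_SIZE and i_blocksize(inode), and
tail_offset() assumes that offset is i_size masked to the page, which the
quoted hunk does not show.

```c
#include <stdbool.h>

/*
 * Assumption: offset is the position of i_size within its page.
 * page_size must be a power of two.
 */
static unsigned long tail_offset(unsigned long long i_size,
				 unsigned long page_size)
{
	return (unsigned long)(i_size & (page_size - 1));
}

/* Condition before the patch: skip the wait only for a late tail. */
static bool skip_wait_old(unsigned long offset, unsigned long page_size,
			  unsigned long blocksize)
{
	return offset > page_size - blocksize;
}

/* Condition after the patch: also skip when i_size is page aligned. */
static bool skip_wait_new(unsigned long offset, unsigned long page_size,
			  unsigned long blocksize)
{
	return !offset || offset > page_size - blocksize;
}
```

With page_size == blocksize == 4096, skip_wait_old() is true for every
non-zero offset, so offset == 0 is the one case where the two conditions
differ.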

								Honza

> ---
>  fs/ext4/inode.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 006b7a2070bf..a9943ae4f74d 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -5479,7 +5479,7 @@ static void ext4_wait_for_tail_page_commit(struct inode *inode)
>  	 * do. We do the check mainly to optimize the common PAGE_SIZE ==
>  	 * blocksize case
>  	 */
> -	if (offset > PAGE_SIZE - i_blocksize(inode))
> +	if (!offset || offset > PAGE_SIZE - i_blocksize(inode))
>  		return;
>  	while (1) {
>  		page = find_lock_page(inode->i_mapping,
> -- 
> 2.17.2
> 
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR