[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <d6eef0f1-bb89-ba55-53d4-e72d45d75cd3@huawei.com>
Date: Thu, 19 Sep 2019 09:29:26 +0800
From: yangerkun <yangerkun@...wei.com>
To: Jan Kara <jack@...e.cz>
CC: <tytso@....edu>, <linux-ext4@...r.kernel.org>,
<yi.zhang@...wei.com>, <houtao1@...wei.com>
Subject: Re: [PATCH] ext4: fix a bug in ext4_wait_for_tail_page_commit
On 2019/9/18 21:27, Jan Kara wrote:
> On Wed 18-09-19 21:09:00, yangerkun wrote:
>> On 2019/9/18 18:45, Jan Kara wrote:
>>> On Tue 17-09-19 16:48:14, yangerkun wrote:
>>>> No need to wait when offset equals to 0. And it will trigger a bug since
>>>> the latter __ext4_journalled_invalidatepage can free the buffers but leave
>>>> page still dirty.
>>>>
>>>> [ 26.057508] ------------[ cut here ]------------
>>>> [ 26.058531] kernel BUG at fs/ext4/inode.c:2134!
>>>> ...
>>>> [ 26.088130] Call trace:
>>>> [ 26.088695] ext4_writepage+0x914/0xb28
>>>> [ 26.089541] writeout.isra.4+0x1b4/0x2b8
>>>> [ 26.090409] move_to_new_page+0x3b0/0x568
>>>> [ 26.091338] __unmap_and_move+0x648/0x988
>>>> [ 26.092241] unmap_and_move+0x48c/0xbb8
>>>> [ 26.093096] migrate_pages+0x220/0xb28
>>>> [ 26.093945] kernel_mbind+0x828/0xa18
>>>> [ 26.094791] __arm64_sys_mbind+0xc8/0x138
>>>> [ 26.095716] el0_svc_common+0x190/0x490
>>>> [ 26.096571] el0_svc_handler+0x60/0xd0
>>>> [ 26.097423] el0_svc+0x8/0xc
>>>>
>>>> Run below parallel can reproduce it easily(ext3):
>>>> void main()
>>>> {
>>>> int fd, fd1, fd2, fd3, ret;
>>>> void *addr;
>>>> size_t length = 4096;
>>>> int flags;
>>>> off_t offset = 0;
>>>> char *str = "12345";
>>>>
>>>> fd = open("a", O_RDWR | O_CREAT);
>>>> assert(fd >= 0);
>>>>
>>>> ret = ftruncate(fd, length);
>>>> assert(ret == 0);
>>>>
>>>> fd1 = open("a", O_RDWR | O_CREAT, -1);
>>>> assert(fd1 >= 0);
>>>>
>>>> flags = 0xc00f;/*Journal data mode*/
>>>> ret = ioctl(fd1, _IOW('f', 2, long), &flags);
>>>> assert(ret == 0);
>>>>
>>>> fd2 = open("a", O_RDWR | O_CREAT);
>>>> assert(fd2 >= 0);
>>>>
>>>> fd3 = open("a", O_TRUNC | O_NOATIME);
>>>> assert(fd3 >= 0);
>>>>
>>>> addr = mmap(NULL, length, 0xe, 0x28013, fd2, offset);
>>>
>>> Ugh, these mmap flags look pretty bogus. Were they generated by some
>>> fuzzer?
>> Yeah, generated by syzkaller.
>>>
>>>> assert(addr != (void *)-1);
>>>> memcpy(addr, str, 5);
>>>
>>> Also the O_TRUNC open above will truncate "a" to 0 so the mapping is
>>> actually beyond i_size and this memcpy should fail with SIGBUS. So I'm
>>> surprised your test program gets up to mbind()...
>>
>> We run the program parallel, sometimes will run as below:
>>
>> reproduce1 reproduce2
>>
>> ... | ...
>> truncate to 4k |
>> change to journal data mode |
>> | memcpy(set page dirty)
>> truncate to 0: |
>> ext4_setattr: |
>> ... |
>> ext4_wait_for_tail_page_commit |
>> | mbind(trigger bug)
>> truncate_pagecache(clean dirty)| ...
>> ... |
>> Reproduce2 will mark page as dirty by memcpy, then mbind run between
>> ext4_wait_for_tail_page_commit and truncate_pagecache in ext4_setattr can
>> trigger the bug with page still be dirty but buffer head has been free.
>
> Aha! Thanks for explanation. Makes sense. So I agree with your patch but we
> also need to update the comment before the condition in
> ext4_wait_for_tail_page_commit(). Something like:
>
> If the page is fully truncated, we don't need to wait for any commit (and
> we even should not as __ext4_journalled_invalidatepage() may strip all
> buffers from the page but keep the page dirty which can then confuse e.g.
> concurrent ext4_writepage() seeing dirty page without buffers). Also we
> don't need to wait for any commit if all buffers in the page remain valid.
> This is most beneficial for the common case of blocksize == PAGE_SIZE.
I will add this comment and reorganize the patch. Thanks a lot!
>
> Honza
>
>>>> ---
>>>> fs/ext4/inode.c | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>>>> index 006b7a2070bf..a9943ae4f74d 100644
>>>> --- a/fs/ext4/inode.c
>>>> +++ b/fs/ext4/inode.c
>>>> @@ -5479,7 +5479,7 @@ static void ext4_wait_for_tail_page_commit(struct inode *inode)
>>>> * do. We do the check mainly to optimize the common PAGE_SIZE ==
>>>> * blocksize case
>>>> */
>>>> - if (offset > PAGE_SIZE - i_blocksize(inode))
>>>> + if (!offset || offset > PAGE_SIZE - i_blocksize(inode))
>>>> return;
>>>> while (1) {
>>>> page = find_lock_page(inode->i_mapping,
Powered by blists - more mailing lists