Message-ID: <b57b3d18-7a70-4efa-a356-809c6ab29c02@suse.com>
Date: Mon, 25 Nov 2024 11:00:40 +1030
From: Qu Wenruo <wqu@...e.com>
To: Matthew Wilcox <willy@...radead.org>,
 syzbot <syzbot+aac7bff85be224de5156@...kaller.appspotmail.com>
Cc: akpm@...ux-foundation.org, clm@...com, dsterba@...e.com,
 josef@...icpanda.com, linux-btrfs@...r.kernel.org,
 linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
 linux-mm@...ck.org, syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] [btrfs?] kernel BUG in __folio_start_writeback



On 2024/11/25 07:56, Matthew Wilcox wrote:
> On Sun, Nov 24, 2024 at 05:45:18AM -0800, syzbot wrote:
>>
>>   __fput+0x5ba/0xa50 fs/file_table.c:458
>>   task_work_run+0x24f/0x310 kernel/task_work.c:239
>>   resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
>>   exit_to_user_mode_loop kernel/entry/common.c:114 [inline]
>>   exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline]
>>   __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
>>   syscall_exit_to_user_mode+0x13f/0x340 kernel/entry/common.c:218
>>   do_syscall_64+0x100/0x230 arch/x86/entry/common.c:89
>>   entry_SYSCALL_64_after_hwframe+0x77/0x7f
> 
> This is:
> 
>          VM_BUG_ON_FOLIO(folio_test_writeback(folio), folio);
> 
> ie we've called __folio_start_writeback() on a folio which is already
> under writeback.
> 
> Higher up in the trace, we have the useful information:
> 
>   page: refcount:6 mapcount:0 mapping:ffff888077139710 index:0x3 pfn:0x72ae5
>   memcg:ffff888140adc000
>   aops:btrfs_aops ino:105 dentry name(?):"file2"
>   flags: 0xfff000000040ab(locked|waiters|uptodate|lru|private|writeback|node=0|zone=1|lastcpupid=0x7ff)
>   raw: 00fff000000040ab ffffea0001c8f408 ffffea0000939708 ffff888077139710
>   raw: 0000000000000003 0000000000000001 00000006ffffffff ffff888140adc000
>   page dumped because: VM_BUG_ON_FOLIO(folio_test_writeback(folio))
>   page_owner tracks the page as allocated
> 
> The interesting part of the page_owner stacktrace is:
> 
>    filemap_alloc_folio_noprof+0xdf/0x500
>    __filemap_get_folio+0x446/0xbd0
>    prepare_one_folio+0xb6/0xa20
>    btrfs_buffered_write+0x6bd/0x1150
>    btrfs_direct_write+0x52d/0xa30
>    btrfs_do_write_iter+0x2a0/0x760
>    do_iter_readv_writev+0x600/0x880
>    vfs_writev+0x376/0xba0
> 
> (ie not very interesting)
> 
>> Workqueue: btrfs-delalloc btrfs_work_helper
>> RIP: 0010:__folio_start_writeback+0xc06/0x1050 mm/page-writeback.c:3119
>> Call Trace:
>>   <TASK>
>>   process_one_folio fs/btrfs/extent_io.c:187 [inline]
>>   __process_folios_contig+0x31c/0x540 fs/btrfs/extent_io.c:216
>>   submit_one_async_extent fs/btrfs/inode.c:1229 [inline]
>>   submit_compressed_extents+0xdb3/0x16e0 fs/btrfs/inode.c:1632
>>   run_ordered_work fs/btrfs/async-thread.c:245 [inline]
>>   btrfs_work_helper+0x56b/0xc50 fs/btrfs/async-thread.c:324
>>   process_one_work kernel/workqueue.c:3229 [inline]
> 
> This looks like a race?
> 
> process_one_folio() calls
> btrfs_folio_clamp_set_writeback calls
> btrfs_subpage_set_writeback:
> 
>          spin_lock_irqsave(&subpage->lock, flags);
>          bitmap_set(subpage->bitmaps, start_bit, len >> fs_info->sectorsize_bits);
>          if (!folio_test_writeback(folio))
>                  folio_start_writeback(folio);
>          spin_unlock_irqrestore(&subpage->lock, flags);
> 
> so somebody else set writeback after we tested for writeback here.

The test VM is x86_64, so we won't go into the subpage routine at all, 
but call folio_start_writeback() directly.
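
For reference, a minimal sketch of what the clamp helper boils down to 
(paraphrased from memory, not verbatim kernel code; the real helpers live 
in fs/btrfs/subpage.c and I'm simplifying the signatures):

        /*
         * Sketch: on non-subpage filesystems (sectorsize == PAGE_SIZE, as
         * on this x86_64 VM) the clamp helper skips the subpage bitmap and
         * the locked test-and-set entirely, and calls
         * folio_start_writeback() unconditionally -- which is where the
         * VM_BUG_ON_FOLIO() fires if the folio is already under writeback.
         */
        void btrfs_folio_clamp_set_writeback(struct btrfs_fs_info *fs_info,
                                             struct folio *folio,
                                             u64 start, u32 len)
        {
                if (!btrfs_is_subpage(fs_info, folio->mapping)) {
                        folio_start_writeback(folio);
                        return;
                }
                /* subpage case: clamp the range to the folio, then do the
                 * locked test-and-set quoted above */
                btrfs_subpage_set_writeback(fs_info, folio, start, len);
        }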

> 
> One thing that comes to mind is that _usually_ we take folio_lock()
> first, then start writeback, then call folio_unlock() and btrfs isn't
> doing that here (afaict).  Maybe that's not the source of the bug?

We still hold the folio lock, do the submission, then unlock.

You can check extent_writepage(): at the entrance we verify that the 
folio is still locked, then inside extent_writepage_io() we do the 
submission, setting the folio writeback inside submit_one_sector(), and 
we eventually unlock the folio at the end of extent_writepage(). That's 
the path for uncompressed writes.
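
In sketch form (simplified pseudo-C; parameters and error handling 
omitted):

        /* Uncompressed write path (sketch): the folio stays locked from
         * entry until after the bio submission. */
        extent_writepage(folio)
        {
                ASSERT(folio_test_locked(folio));  /* entrance check */

                /* extent_writepage_io() walks the sectors and submits;
                 * submit_one_sector() is what sets the folio writeback */
                extent_writepage_io(folio, ...);

                /* only here, after submission, is the folio unlocked */
                folio_unlock(folio);
        }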

There is a lot of special handling for async submission (compression), 
but that path still holds the folio locked, does the compression and 
submission, and then unlocks, just all in another thread (which is the 
case here).
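
The hand-off to that other thread is exactly what the call trace above 
shows (call chain only, taken from the trace):

        /* Compressed (async) path: same lock/submit/unlock ordering, but
         * executed in the btrfs-delalloc worker that now owns the folios. */
        btrfs_work_helper()                     /* worker thread */
          run_ordered_work()
            submit_compressed_extents()
              submit_one_async_extent()
                __process_folios_contig()
                  process_one_folio()           /* sets folio writeback */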

So it looks like something is wrong when transferring ownership of the 
page cache folios to the compression path, or there is an error path 
that is not handled properly.
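
One way to narrow that down (a hypothetical debugging aid on my side, 
not a proposed fix) would be to catch the second submitter before the 
VM_BUG_ON() triggers, e.g. in process_one_folio():

        /* Hypothetical debug check: dump the folio and warn at the btrfs
         * call site if writeback is already set, so we see the second
         * submitter's stack instead of only the BUG in mm. */
        if (folio_test_writeback(folio)) {
                dump_page(&folio->page, "btrfs: double writeback submission");
                WARN_ON_ONCE(1);
        }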

Unfortunately I'm not able to reproduce the bug using the 
reproducer...

Thanks,
Qu



> 
> If it is, should we have a VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio)
> in __folio_start_writeback()?  Or is there somewhere that can't lock the
> folio before starting writeback?
> 

