linux-kernel - Re: [syzbot] [mm?] kernel BUG in try_to_unmap

Message-ID: <37e77dc4-bc73-2710-088f-f7ec0c787caf@huawei.com>
Date: Thu, 5 Jun 2025 15:37:53 +0800
From: Jinjiang Tu <tujinjiang@...wei.com>
To: David Hildenbrand <david@...hat.com>, syzbot
	<syzbot+3b220254df55d8ca8a61@...kaller.appspotmail.com>,
	<Liam.Howlett@...cle.com>, <akpm@...ux-foundation.org>,
	<harry.yoo@...cle.com>, <linux-kernel@...r.kernel.org>, <linux-mm@...ck.org>,
	<lorenzo.stoakes@...cle.com>, <riel@...riel.com>,
	<syzkaller-bugs@...glegroups.com>, <vbabka@...e.cz>, Jens Axboe
	<axboe@...nel.dk>, Catalin Marinas <catalin.marinas@....com>
Subject: Re: [syzbot] [mm?] kernel BUG in try_to_unmap_one (2)


On 2025/6/5 14:37, David Hildenbrand wrote:
> On 05.06.25 08:27, David Hildenbrand wrote:
>> On 05.06.25 08:11, David Hildenbrand wrote:
>>> On 05.06.25 07:38, syzbot wrote:
>>>> Hello,
>>>>
>>>> syzbot found the following issue on:
>>>>
>>>> HEAD commit:    d7fa1af5b33e Merge branch 'for-next/core' into for-kernelci
>>>
>>> Hmmm, another very odd page-table mapping related problem on that tree
>>> found on arm64 only:
>>
>> In this particular reproducer we seem to be having MADV_HUGEPAGE and
>> io_uring_setup() be racing with MADV_HWPOISON, MADV_PAGEOUT and
>> io_uring_register(IORING_REGISTER_BUFFERS).
>>
>> I assume the issue is related to MADV_HWPOISON, MADV_PAGEOUT and
>> io_uring_register racing, only. I suspect MADV_HWPOISON is trying to
>> split a THP, while MADV_PAGEOUT tries paging it out.
>>
>> IORING_REGISTER_BUFFERS ends up in
>> io_sqe_buffers_register->io_sqe_buffer_register where we GUP-fast and
>> try coalescing buffers.
>>
>> And something about THPs is not particularly happy :)
>>
>
> Not sure if this is related to io_uring.
>
> unmap_poisoned_folio() calls try_to_unmap() without TTU_SPLIT_HUGE_PMD.
>
> When called from memory_failure(), we make sure to never call it on a 
> large folio: WARN_ON(folio_test_large(folio));
>
> However, from shrink_folio_list() we might call unmap_poisoned_folio() 
> on a large folio, which doesn't work if it is still PMD-mapped. Maybe 
> passing TTU_SPLIT_HUGE_PMD would fix it.
>
>
> Likely the relevant commit is:
>
> commit 1b0449544c6482179ac84530b61fc192a6527bfd
> Author: Jinjiang Tu <tujinjiang@...wei.com>
> Date:   Tue Mar 18 16:39:39 2025 +0800
>
>     mm/vmscan: don't try to reclaim hwpoison folio
>
>     Syzkaller reports a bug as follows:
>
>     Injecting memory failure for pfn 0x18b00e at process virtual address 0x20ffd000
>     Memory failure: 0x18b00e: dirty swapcache page still referenced by 2 users
>     Memory failure: 0x18b00e: recovery action for dirty swapcache page: Failed
>     page: refcount:2 mapcount:0 mapping:0000000000000000 index:0x20ffd pfn:0x18b00e
>     memcg:ffff0000dd6d9000
>     anon flags: 0x5ffffe00482011(locked|dirty|arch_1|swapbacked|hwpoison|node=0|zone=2|lastcpupid=0xfffff)
>     raw: 005ffffe00482011 dead000000000100 dead000000000122 ffff0000e232a7c9
>     raw: 0000000000020ffd 0000000000000000 00000002ffffffff ffff0000dd6d9000
>     page dumped because: VM_BUG_ON_FOLIO(!folio_test_uptodate(folio))
>
> CCing Jinjiang Tu

By the way, unmap_poisoned_folio() is also called from do_migrate_range(). There, the folio may be on the LRU and may be a large folio.
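
To make the failure mode discussed above concrete, here is a toy user-space model in C. It is only a sketch, not kernel code: the toy_* names, the struct fields and the flag values are all made up for illustration, and the -1 return merely stands in for the BUG that try_to_unmap_one() hits on a folio that is still PMD-mapped. toy_unmap_poisoned_folio() models the suggested (and so far unverified) idea of passing the split flag for large folios.

```c
#include <stdbool.h>

/*
 * Toy user-space model, NOT kernel code. All names and flag values
 * below are hypothetical; they only echo the kernel's naming.
 */
enum toy_ttu_flags {
	TOY_TTU_SPLIT_HUGE_PMD = 1 << 0,	/* split a PMD mapping before unmapping */
	TOY_TTU_HWPOISON       = 1 << 1,	/* unmapping on behalf of hwpoison handling */
};

struct toy_folio {
	bool large;		/* folio spans more than one page */
	bool pmd_mapped;	/* still mapped by a single PMD entry */
	bool mapped;		/* mapped into some page table at all */
};

/*
 * Returns 0 on success and -1 in the case where the real code would
 * trip over an unexpected PMD-mapped folio.
 */
static int toy_try_to_unmap(struct toy_folio *folio, unsigned int flags)
{
	if (folio->pmd_mapped) {
		if (!(flags & TOY_TTU_SPLIT_HUGE_PMD))
			return -1;		/* caller did not ask for a split */
		folio->pmd_mapped = false;	/* PMD split into PTE mappings */
	}
	folio->mapped = false;			/* PTE mappings removed */
	return 0;
}

/* Models the suggested change: request the PMD split for large folios. */
static int toy_unmap_poisoned_folio(struct toy_folio *folio)
{
	unsigned int flags = TOY_TTU_HWPOISON;

	if (folio->large)
		flags |= TOY_TTU_SPLIT_HUGE_PMD;
	return toy_try_to_unmap(folio, flags);
}
```

In this model, calling toy_try_to_unmap() with only TOY_TTU_HWPOISON on a PMD-mapped large folio fails and leaves it mapped, while toy_unmap_poisoned_folio() succeeds. Whether the real fix is that simple for all callers (memory_failure(), shrink_folio_list(), do_migrate_range()) still needs to be checked.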