linux-kernel - Re: [syzbot] [mm?] kernel BUG in try_to_unmap

Message-ID: <b9a43f6d-1865-4074-b91c-a5bd7e10f2a9@redhat.com>
Date: Thu, 5 Jun 2025 08:37:19 +0200
From: David Hildenbrand <david@...hat.com>
To: syzbot <syzbot+3b220254df55d8ca8a61@...kaller.appspotmail.com>,
 Liam.Howlett@...cle.com, akpm@...ux-foundation.org, harry.yoo@...cle.com,
 linux-kernel@...r.kernel.org, linux-mm@...ck.org,
 lorenzo.stoakes@...cle.com, riel@...riel.com,
 syzkaller-bugs@...glegroups.com, vbabka@...e.cz, Jens Axboe
 <axboe@...nel.dk>, Catalin Marinas <catalin.marinas@....com>,
 Jinjiang Tu <tujinjiang@...wei.com>
Subject: Re: [syzbot] [mm?] kernel BUG in try_to_unmap_one (2)

On 05.06.25 08:27, David Hildenbrand wrote:
> On 05.06.25 08:11, David Hildenbrand wrote:
>> On 05.06.25 07:38, syzbot wrote:
>>> Hello,
>>>
>>> syzbot found the following issue on:
>>>
>>> HEAD commit:    d7fa1af5b33e Merge branch 'for-next/core' into for-kernelci
>>
>> Hmmm, another very odd page-table mapping related problem on that tree
>> found on arm64 only:
> 
> In this particular reproducer we seem to be having MADV_HUGEPAGE and
> io_uring_setup() be racing with MADV_HWPOISON, MADV_PAGEOUT and
> io_uring_register(IORING_REGISTER_BUFFERS).
> 
> I assume the issue is related to MADV_HWPOISON, MADV_PAGEOUT and
> io_uring_register racing, only. I suspect MADV_HWPOISON is trying to
> split a THP, while MADV_PAGEOUT tries paging it out.
> 
> IORING_REGISTER_BUFFERS ends up in
> io_sqe_buffers_register->io_sqe_buffer_register where we GUP-fast and
> try coalescing buffers.
> 
> And something about THPs is not particularly happy :)
> 

Not sure if this is related to io_uring.

unmap_poisoned_folio() calls try_to_unmap() without TTU_SPLIT_HUGE_PMD.

When called from memory_failure(), we make sure to never call it on a large folio: WARN_ON(folio_test_large(folio));

However, from shrink_folio_list() we might call unmap_poisoned_folio() on a large folio, which doesn't work if it is still PMD-mapped. Maybe passing TTU_SPLIT_HUGE_PMD would fix it.
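
To make the role of the flag concrete, here is a toy user-space model of the suspected failure mode. It is not kernel code: every toy_* name, the struct layout and the flag value are made up for illustration, and returning false merely stands in for the BUG reported in try_to_unmap_one(). The only thing it is meant to show is that a PMD-mapped large folio can be unmapped only when the caller asks for the PMD to be split first.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag value; the real enum ttu_flags is defined by the kernel. */
enum toy_ttu_flags {
	TOY_TTU_SPLIT_HUGE_PMD = 1 << 0,
};

/* Hypothetical, heavily simplified stand-in for a folio. */
struct toy_folio {
	bool large;       /* more than one page */
	bool pmd_mapped;  /* still mapped by a single PMD entry */
	bool mapped;      /* mapped into some page table at all */
};

/* Returns true if the folio ended up unmapped. */
static bool toy_try_to_unmap(struct toy_folio *folio, unsigned int flags)
{
	if (folio->pmd_mapped) {
		/* Without the split flag the walk cannot handle the PMD
		 * mapping; false stands in for the kernel BUG. */
		if (!(flags & TOY_TTU_SPLIT_HUGE_PMD))
			return false;
		folio->pmd_mapped = false; /* PMD split into PTEs */
	}
	folio->mapped = false;
	return true;
}

/* Models the caller: split_pmd == false is the suspected current
 * behaviour, split_pmd == true is the idea floated above. */
static bool toy_unmap_poisoned_folio(struct toy_folio *folio, bool split_pmd)
{
	unsigned int flags = split_pmd ? TOY_TTU_SPLIT_HUGE_PMD : 0;

	return toy_try_to_unmap(folio, flags);
}

/* Small walk-through of the two cases. */
static void toy_demo(void)
{
	struct toy_folio thp = { .large = true, .pmd_mapped = true, .mapped = true };

	assert(!toy_unmap_poisoned_folio(&thp, false)); /* fails, stays mapped */
	assert(toy_unmap_poisoned_folio(&thp, true));   /* split, then unmapped */
}
```

In this model a small folio is unaffected either way, which matches the observation that the memory_failure() path never hands a large folio to unmap_poisoned_folio().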


Likely the relevant commit is:

commit 1b0449544c6482179ac84530b61fc192a6527bfd
Author: Jinjiang Tu <tujinjiang@...wei.com>
Date:   Tue Mar 18 16:39:39 2025 +0800

     mm/vmscan: don't try to reclaim hwpoison folio
     
     Syzkaller reports a bug as follows:
     
     Injecting memory failure for pfn 0x18b00e at process virtual address 0x20ffd000
     Memory failure: 0x18b00e: dirty swapcache page still referenced by 2 users
     Memory failure: 0x18b00e: recovery action for dirty swapcache page: Failed
     page: refcount:2 mapcount:0 mapping:0000000000000000 index:0x20ffd pfn:0x18b00e
     memcg:ffff0000dd6d9000
     anon flags: 0x5ffffe00482011(locked|dirty|arch_1|swapbacked|hwpoison|node=0|zone=2|lastcpupid=0xfffff)
     raw: 005ffffe00482011 dead000000000100 dead000000000122 ffff0000e232a7c9
     raw: 0000000000020ffd 0000000000000000 00000002ffffffff ffff0000dd6d9000
     page dumped because: VM_BUG_ON_FOLIO(!folio_test_uptodate(folio))

CCing Jinjiang Tu

-- 
Cheers,

David / dhildenb