linux-kernel - Re: [syzbot] [mm?] kernel BUG in sanity_check_pinned_pages

Message-ID: <d9b81f26-bf44-45af-8bec-60582696ef5c@redhat.com>
Date: Mon, 23 Jun 2025 14:47:48 +0200
From: David Hildenbrand <david@...hat.com>
To: Alexander Potapenko <glider@...gle.com>, axboe@...nel.dk
Cc: syzbot <syzbot+1d335893772467199ab6@...kaller.appspotmail.com>,
 akpm@...ux-foundation.org, catalin.marinas@....com, jgg@...pe.ca,
 jhubbard@...dia.com, linux-kernel@...r.kernel.org, linux-mm@...ck.org,
 peterx@...hat.com, syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] [mm?] kernel BUG in sanity_check_pinned_pages

On 23.06.25 14:22, David Hildenbrand wrote:
> On 23.06.25 12:10, David Hildenbrand wrote:
>> On 23.06.25 11:53, Alexander Potapenko wrote:
>>> On Mon, Jun 23, 2025 at 11:29 AM 'David Hildenbrand' via
>>> syzkaller-bugs <syzkaller-bugs@...glegroups.com> wrote:
>>>>
>>>> On 21.06.25 23:52, syzbot wrote:
>>>>> syzbot has found a reproducer for the following issue on:
>>>>>
>>>>> HEAD commit:    9aa9b43d689e Merge branch 'for-next/core' into for-kernelci
>>>>> git tree:       git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
>>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=1525330c580000
>>>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=27f179c74d5c35cd
>>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=1d335893772467199ab6
>>>>> compiler:       Debian clang version 20.1.6 (++20250514063057+1e4d39e07757-1~exp1~20250514183223.118), Debian LLD 20.1.6
>>>>> userspace arch: arm64
>>>>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16d73370580000
>>>>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=160ef30c580000
>>>>
>>>> There is not that much magic in there, I'm afraid.
>>>>
>>>> fork() is only used to spin up guests, but before the memory region of
>>>> interest is actually allocated, IIUC. No threading code that races.
>>>>
>>>> IIUC, it triggers fairly fast on aarch64. I've left it running for a
>>>> while on x86_64 without any luck.
>>>>
>>>> So maybe this is really some aarch64-specific stuff (pointer tagging?).
>>>>
>>>> In particular, there is something very weird in the reproducer:
>>>>
>>>>       syscall(__NR_madvise, /*addr=*/0x20a93000ul, /*len=*/0x4000ul,
>>>>               /*advice=MADV_HUGEPAGE|0x800000000*/ 0x80000000eul);
>>>>
>>>> advice is supposed to be a 32-bit int. What does the magical
>>>> "0x800000000" do?
>>>
>>> I am pretty sure this is a red herring.
>>> Syzkaller sometimes mutates integer flags, even if the result makes no
>>> sense - because sometimes it can trigger interesting bugs.
>>> This `advice` argument will be discarded by is_valid_madvise(),
>>> resulting in -EINVAL.
>>
>> I thought the same, but likely the upper bits are discarded, and the
>> madvise() syscall ends up succeeding.
>>
>> The kernel config has
>>
>> 	CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y
>>
>> So without MADV_HUGEPAGE, we wouldn't get a THP in the first place.
>>
>> So this is likely equivalent to just dropping the "0x800000000".
>>
>> Anyhow, I managed to reproduce in the VM using the provided rootfs on
>> aarch64. It triggers immediately, so no races involved.
>>
>> Running the reproducer on a Fedora 42 debug-kernel in the hypervisor
>> does not trigger.
> 
> Simplified reproducer that does not depend on a race with the
> child process.
> 
> As suspected earlier, we have PAE cleared on the head page,
> because it is/was COW-shared with a child process.
>
> We are registering more than one consecutive tail page of that
> THP through io_uring, GUP-pinning them. These pages are not
> COW-shared and, therefore, do not have PAE set.
> 
> #define _GNU_SOURCE
> #include <stdio.h>
> #include <string.h>
> #include <stdlib.h>
> #include <sys/ioctl.h>
> #include <sys/mman.h>
> #include <sys/syscall.h>
> #include <sys/types.h>
> #include <unistd.h>
> #include <liburing.h>
> 
> int main(void)
> {
>           struct io_uring_params params = {
>                   .wq_fd = -1,
>           };
>           struct iovec iovec;
>           const size_t pagesize = getpagesize();
>           size_t size = 2048 * pagesize;
>           char *addr;
>           int fd;
> 
>           /* We need a THP-aligned area. */
>           addr = mmap((char *)0x20000000u, size, PROT_WRITE|PROT_READ,
>                       MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
>           if (addr == MAP_FAILED) {
>                   perror("MAP_FIXED failed");
>                   return 1;
>           }
> 
>           if (madvise(addr, size, MADV_HUGEPAGE)) {
>                   perror("MADV_HUGEPAGE failed");
>                   return 1;
>           }
> 
>           /* Populate a THP. */
>           memset(addr, 0, size);
> 
>           /* COW-share only the first page ... */
>           if (madvise(addr + pagesize, size - pagesize, MADV_DONTFORK)) {
>                   perror("MADV_DONTFORK failed");
>                   return 1;
>           }
> 
>           /* ... using fork(). This will clear PAE on the head page. */
>           if (fork() == 0)
>                   exit(0);
> 
>           /* Set up io_uring. */
>           fd = syscall(__NR_io_uring_setup, 1024, &params);
>           if (fd < 0) {
>                   perror("__NR_io_uring_setup failed");
>                   return 1;
>           }
> 
>           /* Register (GUP-pin) two consecutive tail pages. */
>           iovec.iov_base = addr + pagesize;
>           iovec.iov_len = 2 * pagesize;
>           syscall(__NR_io_uring_register, fd, IORING_REGISTER_BUFFERS, &iovec, 1);
>           return 0;
> }
> 
> [  108.070381][   T14] kernel BUG at mm/gup.c:71!
> [  108.070502][   T14] Internal error: Oops - BUG: 00000000f2000800 [#1]  SMP
> [  108.117202][   T14] Modules linked in:
> [  108.119105][   T14] CPU: 1 UID: 0 PID: 14 Comm: kworker/u32:1 Not tainted 6.16.0-rc2-syzkaller-g9aa9b43d689e #0 PREEMPT
> [  108.123672][   T14] Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20250221-8.fc42 02/21/2025
> [  108.127458][   T14] Workqueue: iou_exit io_ring_exit_work
> [  108.129812][   T14] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [  108.133091][   T14] pc : sanity_check_pinned_pages+0x7cc/0x7d0
> [  108.135566][   T14] lr : sanity_check_pinned_pages+0x7cc/0x7d0
> [  108.138025][   T14] sp : ffff800097ac7640
> [  108.139859][   T14] x29: ffff800097ac7660 x28: dfff800000000000 x27: 1fffffbff80d3000
> [  108.143185][   T14] x26: 01ffc0000002007c x25: 01ffc0000002007c x24: fffffdffc0698000
> [  108.146599][   T14] x23: fffffdffc0698000 x22: ffff800097ac76e0 x21: 01ffc0000002007c
> [  108.150025][   T14] x20: 0000000000000000 x19: ffff800097ac76e0 x18: 00000000ffffffff
> [  108.153449][   T14] x17: 703e2d6f696c6f66 x16: ffff80008ae33808 x15: ffff700011ed61d4
> [  108.156892][   T14] x14: 1ffff00011ed61d4 x13: 0000000000000004 x12: ffffffffffffffff
> [  108.160267][   T14] x11: ffff700011ed61d4 x10: 0000000000ff0100 x9 : f6672ecf4f89d700
> [  108.163782][   T14] x8 : f6672ecf4f89d700 x7 : 0000000000000001 x6 : 0000000000000001
> [  108.167180][   T14] x5 : ffff800097ac6d58 x4 : ffff80008f727060 x3 : ffff80008054c348
> [  108.170807][   T14] x2 : 0000000000000000 x1 : 0000000100000000 x0 : 0000000000000061
> [  108.174205][   T14] Call trace:
> [  108.175649][   T14]  sanity_check_pinned_pages+0x7cc/0x7d0 (P)
> [  108.178138][   T14]  unpin_user_page+0x80/0x10c
> [  108.180189][   T14]  io_release_ubuf+0x84/0xf8
> [  108.182196][   T14]  io_free_rsrc_node+0x250/0x57c
> [  108.184345][   T14]  io_rsrc_data_free+0x148/0x298
> [  108.186493][   T14]  io_sqe_buffers_unregister+0x84/0xa0
> [  108.188991][   T14]  io_ring_ctx_free+0x48/0x480
> [  108.191057][   T14]  io_ring_exit_work+0x764/0x7d8
> [  108.193207][   T14]  process_one_work+0x7e8/0x155c
> [  108.195431][   T14]  worker_thread+0x958/0xed8
> [  108.197561][   T14]  kthread+0x5fc/0x75c
> [  108.199362][   T14]  ret_from_fork+0x10/0x20

FWIW, a slight modification of the cow.c selftest can trigger the same BUG:

diff --git a/tools/testing/selftests/mm/cow.c b/tools/testing/selftests/mm/cow.c
index 4214070d03ce..50c538b47bb4 100644
--- a/tools/testing/selftests/mm/cow.c
+++ b/tools/testing/selftests/mm/cow.c
@@ -991,6 +991,8 @@ static void do_run_with_thp(test_fn fn, enum thp_run thp_run, size_t thpsize)
                         log_test_result(KSFT_FAIL);
                         goto munmap;
                 }
+               mem += pagesize;
+               size -= pagesize;
                 break;
         default:
                 assert(false);

-- 
Cheers,

David / dhildenb