Message-ID: <7d92ec18-ff8e-4929-8b9a-f0bf5c6d249f@bytedance.com>
Date: Mon, 4 Aug 2025 17:35:28 +0800
From: Qi Zheng <zhengqi.arch@...edance.com>
To: Barry Song <21cnbao@...il.com>, "Lai, Yi" <yi1.lai@...ux.intel.com>
Cc: David Hildenbrand <david@...hat.com>, akpm@...ux-foundation.org,
 linux-mm@...ck.org, linux-kernel@...r.kernel.org,
 Barry Song <v-songbaohua@...o.com>,
 Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
 "Liam R. Howlett" <Liam.Howlett@...cle.com>, Vlastimil Babka
 <vbabka@...e.cz>, Jann Horn <jannh@...gle.com>,
 Suren Baghdasaryan <surenb@...gle.com>, Lokesh Gidra
 <lokeshgidra@...gle.com>, Tangquan Zheng <zhengtangquan@...o.com>,
 yi1.lai@...el.com
Subject: Re: [PATCH v4] mm: use per_vma lock for MADV_DONTNEED



On 8/4/25 5:15 PM, Barry Song wrote:
> On Mon, Aug 4, 2025 at 8:49 PM Lai, Yi <yi1.lai@...ux.intel.com> wrote:
>>
>> On Mon, Aug 04, 2025 at 10:30:45AM +0200, David Hildenbrand wrote:
>>> On 04.08.25 10:26, Qi Zheng wrote:
>>>>
>>>>
>>>> On 8/4/25 3:57 PM, David Hildenbrand wrote:
>>>>> On 04.08.25 02:58, Lai, Yi wrote:
>>>>>> Hi Barry Song,
>>>>>>
>>>>>> Greetings!
>>>>>>
>>>>>> I used Syzkaller and found that there is a general protection fault in
>>>>>> __pte_offset_map_lock in linux-next next-20250801.
>>>>>>
>>>>>> After bisection, the first bad commit is:
>>>>>> "
>>>>>> a6fde7add78d mm: use per_vma lock for MADV_DONTNEED
>>>>>> "
>>>>>>
>>>>>> All detailed info can be found at:
>>>>>> https://github.com/laifryiee/syzkaller_logs/tree/main/250803_193026___pte_offset_map_lock
>>>>>> Syzkaller repro code:
>>>>>> https://github.com/laifryiee/syzkaller_logs/tree/main/250803_193026___pte_offset_map_lock/repro.c
>>>>>> Syzkaller repro syscall steps:
>>>>>> https://github.com/laifryiee/syzkaller_logs/tree/main/250803_193026___pte_offset_map_lock/repro.prog
>>>>>> Syzkaller report:
>>>>>> https://github.com/laifryiee/syzkaller_logs/tree/main/250803_193026___pte_offset_map_lock/repro.report
>>>>>> Kconfig(make olddefconfig):
>>>>>> https://github.com/laifryiee/syzkaller_logs/tree/main/250803_193026___pte_offset_map_lock/kconfig_origin
>>>>>> Bisect info:
>>>>>> https://github.com/laifryiee/syzkaller_logs/tree/main/250803_193026___pte_offset_map_lock/bisect_info.log
>>>>>> bzImage:
>>>>>> https://github.com/laifryiee/syzkaller_logs/raw/refs/heads/main/250803_193026___pte_offset_map_lock/bzImage_next-20250801
>>>>>> Issue dmesg:
>>>>>> https://github.com/laifryiee/syzkaller_logs/blob/main/250803_193026___pte_offset_map_lock/next-20250801_dmesg.log
>>>>>
>>>>> Skimming over the reproducer, we seem to have racing MADV_DONTNEED and
>>>>> MADV_COLLAPSE on the same anon area, but the problem only shows up once
>>>>> we tear down that MM.
>>>>>
>>>>> If I had to guess, I'd assume it's related to PT_RECLAIM
>>>>> reclaiming empty page tables during MADV_DONTNEED -- but the kconfig
>>>>> does not indicate that CONFIG_PT_RECLAIM was set.
>>>>
>>>> On x86_64, PT_RECLAIM should be enabled unless it is manually
>>>> disabled.
>>>
>>> That's what I thought: but I was not able to spot it in the provided config
>>> [1].
>>>
>>> Or is that config *before* "make olddefconfig"? That's confusing. I would
>>> want to see the config that was actually used.
>>>
>>>
>>>
>> My kernel compiling steps:
>> 1. copy kconfig_origin to kernel_source_folder/.config
>> 2. make olddefconfig
>> 3. make bzImage -jx
>>
>> I have also uploaded the actual .config used for the build.
>> [2] https://github.com/laifryiee/syzkaller_logs/blob/main/250803_193026___pte_offset_map_lock/.config
>> CONFIG_ARCH_SUPPORTS_PT_RECLAIM=y
>> CONFIG_PT_RECLAIM=y
> 
> Thanks! I can reproduce the issue within one second.

I also reproduced it locally.

BUG: Bad page map in process repro  pte:f000e987f000fea5 pmd:00000067
[22099.667758][T22301] addr:0000000020004000 vm_flags:00100077 anon_vma:ffff8882c5b5fc98 mapping:0000000000000000 index:20004
[22099.671248][T22301] file:(null) fault:0x0 mmap:0x0 mmap_prepare: 0x0 read_folio:0x0
[22099.673833][T22301] CPU: 15 UID: 0 PID: 22301 Comm: repro Tainted: G    B   W           6.16.0-rc4-next-20250704+ #200 PREEMPT(voluntary)
[22099.673838][T22301] Tainted: [B]=BAD_PAGE, [W]=WARN
[22099.673838][T22301] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[22099.673840][T22301] Call Trace:
[22099.673841][T22301]  <TASK>
[22099.673842][T22301]  dump_stack_lvl+0x53/0x70
[22099.673845][T22301]  print_bad_pte+0x178/0x220
[22099.673849][T22301]  vm_normal_page+0x8a/0xa0
[22099.673852][T22301]  unmap_page_range+0x5cb/0x1d40
[22099.673855][T22301]  ? flush_tlb_mm_range+0x21a/0x780
[22099.673859][T22301]  ? tlb_flush_mmu+0x30/0x1c0
[22099.673861][T22301]  unmap_vmas+0xab/0x160
[22099.673863][T22301]  exit_mmap+0xda/0x3c0
[22099.673868][T22301]  mmput+0x6e/0x130
[22099.673869][T22301]  do_exit+0x242/0xb40
[22099.673871][T22301]  do_group_exit+0x30/0x80
[22099.673873][T22301]  get_signal+0x951/0x980
[22099.673876][T22301]  ? futex_wake+0x84/0x170
[22099.673880][T22301]  arch_do_signal_or_restart+0x2d/0x1f0
[22099.673883][T22301]  ? do_futex+0x11a/0x1d0
[22099.673885][T22301]  ? __x64_sys_futex+0x67/0x1c0
[22099.673888][T22301]  exit_to_user_mode_loop+0x86/0x110
[22099.673890][T22301]  do_syscall_64+0x184/0x2b0
[22099.673892][T22301]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[22099.673895][T22301] RIP: 0033:0x7fafb0048af9
[22099.673896][T22301] Code: Unable to access opcode bytes at 0x7fafb0048acf.
[22099.673898][T22301] RSP: 002b:00007fafaff50ea8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
[22099.673900][T22301] RAX: fffffffffffffe00 RBX: 0000000000000000 RCX: 00007fafb0048af9
[22099.673901][T22301] RDX: 0000000000000000 RSI: 0000000000000080 RDI: 0000559d33cab1a8
[22099.673903][T22301] RBP: 00007fafaff50ec0 R08: 0000000000000000 R09: 0000000000000000
[22099.673904][T22301] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe78dbcd2e
[22099.673905][T22301] R13: 00007ffe78dbcd2f R14: 00007fafaff51700 R15: 0000000000000000
[22099.673907][T22301]  </TASK>
[22099.673913][T22301] BUG: unable to handle page fault for address: ffffea7be1ffe548
[22099.674789][T22301] #PF: supervisor read access in kernel mode
[22099.674789][T22301] #PF: error_code(0x0000) - not-present page
[22099.674789][T22301] PGD 2bfff7067 P4D 2bfff7067 PUD 0
[22099.674789][T22301] Oops: Oops: 0000 [#1] SMP PTI
[22099.674789][T22301] CPU: 15 UID: 0 PID: 22301 Comm: repro Tainted: G    B   W           6.16.0-rc4-next-20250704+ #200 PREEMPT(voluntary)
[22099.674789][T22301] Tainted: [B]=BAD_PAGE, [W]=WARN
[22099.674789][T22301] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[22099.674789][T22301] RIP: 0010:unmap_page_range+0x1101/0x1d40
[22099.674789][T22301] Code: eb 03 cc cc cc f3 0f 1e fa f3 0f 1e fa e9 ea 01 00 00 48 b8 ff ff ff ff ff 00 00 00 49 21 c2 49 c1 e2 06 4c 03 15 ef a6 fd 00 <49> 8b 52 08 4c 89 d0 f6 c2 01 0f 8
[22099.674789][T22301] RSP: 0018:ffffc9000557baa0 EFLAGS: 00010282
[22099.674789][T22301] RAX: 00000003ffffffff RBX: 0000000020005000 RCX: f000000000000420
[22099.674789][T22301] RDX: 000000000000001e RSI: 0000000000000000 RDI: 7803ff95ef87ff95
[22099.674789][T22301] RBP: f000d420f000d420 R08: ffff888000000028 R09: c000000100000864
[22099.674789][T22301] R10: ffffea7be1ffe540 R11: ffffc9000557b6b0 R12: 0000000000000000
[22099.674789][T22301] R13: 00000000000001fb R14: ffffc9000557bcc0 R15: ffff888000000028
[22099.674789][T22301] FS:  00007fafaff51700(0000) GS:ffff8885b2b29000(0000) knlGS:0000000000000000
[22099.674789][T22301] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[22099.674789][T22301] CR2: ffffea7be1ffe548 CR3: 0000000103d8c000 CR4: 00000000000006f0
[22099.674789][T22301] Call Trace:
[22099.674789][T22301]  <TASK>
[22099.674789][T22301]  ? flush_tlb_mm_range+0x21a/0x780
[22099.674789][T22301]  ? tlb_flush_mmu+0x30/0x1c0
[22099.674789][T22301]  unmap_vmas+0xab/0x160
[22099.674789][T22301]  exit_mmap+0xda/0x3c0
[22099.674789][T22301]  mmput+0x6e/0x130
[22099.674789][T22301]  do_exit+0x242/0xb40
[22099.674789][T22301]  do_group_exit+0x30/0x80
[22099.674789][T22301]  get_signal+0x951/0x980
[22099.674789][T22301]  ? futex_wake+0x84/0x170
[22099.674789][T22301]  arch_do_signal_or_restart+0x2d/0x1f0
[22099.674789][T22301]  ? do_futex+0x11a/0x1d0
[22099.674789][T22301]  ? __x64_sys_futex+0x67/0x1c0
[22099.674789][T22301]  exit_to_user_mode_loop+0x86/0x110
[22099.674789][T22301]  do_syscall_64+0x184/0x2b0
[22099.674789][T22301]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[22099.674789][T22301] RIP: 0033:0x7fafb0048af9
[22099.674789][T22301] Code: Unable to access opcode bytes at 0x7fafb0048acf.
[22099.674789][T22301] RSP: 002b:00007fafaff50ea8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
[22099.674789][T22301] RAX: fffffffffffffe00 RBX: 0000000000000000 RCX: 00007fafb0048af9
[22099.674789][T22301] RDX: 0000000000000000 RSI: 0000000000000080 RDI: 0000559d33cab1a8
[22099.674789][T22301] RBP: 00007fafaff50ec0 R08: 0000000000000000 R09: 0000000000000000
[22099.674789][T22301] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe78dbcd2e
[22099.674789][T22301] R13: 00007ffe78dbcd2f R14: 00007fafaff51700 R15: 0000000000000000
[22099.674789][T22301]  </TASK>

> After disabling PT_RECLAIM in .config, the issue disappears.

Thanks for the test, I'll take a closer look.

> The reason it doesn't occur on arm64 is that x86 is the only platform
> that supports ARCH_SUPPORTS_PT_RECLAIM.
> 
>>
>>> [1] https://github.com/laifryiee/syzkaller_logs/tree/main/250803_193026___pte_offset_map_lock/kconfig_origin
>>>
>>> --
>>> Cheers,
>>>
>>> David / dhildenb
>>>
> 
> Thanks
> Barry

