lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Wed, 5 Jun 2024 18:17:09 +0200
From: David Hildenbrand <david@...hat.com>
To: Yang Shi <yang@...amperecomputing.com>, Peter Xu <peterx@...hat.com>
Cc: oliver.sang@...el.com, paulmck@...nel.org, willy@...radead.org,
 riel@...riel.com, vivek.kasireddy@...el.com, cl@...ux.com,
 akpm@...ux-foundation.org, linux-mm@...ck.org, linux-kernel@...r.kernel.org,
 stable@...r.kernel.org
Subject: Re: [PATCH 1/2] mm: page_ref: remove folio_try_get_rcu()

On 05.06.24 18:16, Yang Shi wrote:
> 
> 
> On 6/5/24 8:25 AM, Peter Xu wrote:
>> On Tue, Jun 04, 2024 at 04:48:57PM -0700, Yang Shi wrote:
>>> The below bug was reported on a non-SMP kernel:
>>>
>>> [  275.267158][ T4335] ------------[ cut here ]------------
>>> [  275.267949][ T4335] kernel BUG at include/linux/page_ref.h:275!
>>> [  275.268526][ T4335] invalid opcode: 0000 [#1] KASAN PTI
>>> [  275.269001][ T4335] CPU: 0 PID: 4335 Comm: trinity-c3 Not tainted 6.7.0-rc4-00061-gefa7df3e3bb5 #1
>>> [  275.269787][ T4335] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
>>> [  275.270679][ T4335] RIP: 0010:try_get_folio (include/linux/page_ref.h:275 (discriminator 3) mm/gup.c:79 (discriminator 3))
>>> [  275.272813][ T4335] RSP: 0018:ffffc90005dcf650 EFLAGS: 00010202
>>> [  275.273346][ T4335] RAX: 0000000000000246 RBX: ffffea00066e0000 RCX: 0000000000000000
>>> [  275.274032][ T4335] RDX: fffff94000cdc007 RSI: 0000000000000004 RDI: ffffea00066e0034
>>> [  275.274719][ T4335] RBP: ffffea00066e0000 R08: 0000000000000000 R09: fffff94000cdc006
>>> [  275.275404][ T4335] R10: ffffea00066e0037 R11: 0000000000000000 R12: 0000000000000136
>>> [  275.276106][ T4335] R13: ffffea00066e0034 R14: dffffc0000000000 R15: ffffea00066e0008
>>> [  275.276790][ T4335] FS:  00007fa2f9b61740(0000) GS:ffffffff89d0d000(0000) knlGS:0000000000000000
>>> [  275.277570][ T4335] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [  275.278143][ T4335] CR2: 00007fa2f6c00000 CR3: 0000000134b04000 CR4: 00000000000406f0
>>> [  275.278833][ T4335] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> [  275.279521][ T4335] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>> [  275.280201][ T4335] Call Trace:
>>> [  275.280499][ T4335]  <TASK>
>>> [ 275.280751][ T4335] ? die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434 arch/x86/kernel/dumpstack.c:447)
>>> [ 275.281087][ T4335] ? do_trap (arch/x86/kernel/traps.c:112 arch/x86/kernel/traps.c:153)
>>> [ 275.281463][ T4335] ? try_get_folio (include/linux/page_ref.h:275 (discriminator 3) mm/gup.c:79 (discriminator 3))
>>> [ 275.281884][ T4335] ? try_get_folio (include/linux/page_ref.h:275 (discriminator 3) mm/gup.c:79 (discriminator 3))
>>> [ 275.282300][ T4335] ? do_error_trap (arch/x86/kernel/traps.c:174)
>>> [ 275.282711][ T4335] ? try_get_folio (include/linux/page_ref.h:275 (discriminator 3) mm/gup.c:79 (discriminator 3))
>>> [ 275.283129][ T4335] ? handle_invalid_op (arch/x86/kernel/traps.c:212)
>>> [ 275.283561][ T4335] ? try_get_folio (include/linux/page_ref.h:275 (discriminator 3) mm/gup.c:79 (discriminator 3))
>>> [ 275.283990][ T4335] ? exc_invalid_op (arch/x86/kernel/traps.c:264)
>>> [ 275.284415][ T4335] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:568)
>>> [ 275.284859][ T4335] ? try_get_folio (include/linux/page_ref.h:275 (discriminator 3) mm/gup.c:79 (discriminator 3))
>>> [ 275.285278][ T4335] try_grab_folio (mm/gup.c:148)
>>> [ 275.285684][ T4335] __get_user_pages (mm/gup.c:1297 (discriminator 1))
>>> [ 275.286111][ T4335] ? __pfx___get_user_pages (mm/gup.c:1188)
>>> [ 275.286579][ T4335] ? __pfx_validate_chain (kernel/locking/lockdep.c:3825)
>>> [ 275.287034][ T4335] ? mark_lock (kernel/locking/lockdep.c:4656 (discriminator 1))
>>> [ 275.287416][ T4335] __gup_longterm_locked (mm/gup.c:1509 mm/gup.c:2209)
>>> [ 275.288192][ T4335] ? __pfx___gup_longterm_locked (mm/gup.c:2204)
>>> [ 275.288697][ T4335] ? __pfx_lock_acquire (kernel/locking/lockdep.c:5722)
>>> [ 275.289135][ T4335] ? __pfx___might_resched (kernel/sched/core.c:10106)
>>> [ 275.289595][ T4335] pin_user_pages_remote (mm/gup.c:3350)
>>> [ 275.290041][ T4335] ? __pfx_pin_user_pages_remote (mm/gup.c:3350)
>>> [ 275.290545][ T4335] ? find_held_lock (kernel/locking/lockdep.c:5244 (discriminator 1))
>>> [ 275.290961][ T4335] ? mm_access (kernel/fork.c:1573)
>>> [ 275.291353][ T4335] process_vm_rw_single_vec+0x142/0x360
>>> [ 275.291900][ T4335] ? __pfx_process_vm_rw_single_vec+0x10/0x10
>>> [ 275.292471][ T4335] ? mm_access (kernel/fork.c:1573)
>>> [ 275.292859][ T4335] process_vm_rw_core+0x272/0x4e0
>>> [ 275.293384][ T4335] ? hlock_class (arch/x86/include/asm/bitops.h:227 arch/x86/include/asm/bitops.h:239 include/asm-generic/bitops/instrumented-non-atomic.h:142 kernel/locking/lockdep.c:228)
>>> [ 275.293780][ T4335] ? __pfx_process_vm_rw_core+0x10/0x10
>>> [ 275.294350][ T4335] process_vm_rw (mm/process_vm_access.c:284)
>>> [ 275.294748][ T4335] ? __pfx_process_vm_rw (mm/process_vm_access.c:259)
>>> [ 275.295197][ T4335] ? __task_pid_nr_ns (include/linux/rcupdate.h:306 (discriminator 1) include/linux/rcupdate.h:780 (discriminator 1) kernel/pid.c:504 (discriminator 1))
>>> [ 275.295634][ T4335] __x64_sys_process_vm_readv (mm/process_vm_access.c:291)
>>> [ 275.296139][ T4335] ? syscall_enter_from_user_mode (kernel/entry/common.c:94 kernel/entry/common.c:112)
>>> [ 275.296642][ T4335] do_syscall_64 (arch/x86/entry/common.c:51 (discriminator 1) arch/x86/entry/common.c:82 (discriminator 1))
>>> [ 275.297032][ T4335] ? __task_pid_nr_ns (include/linux/rcupdate.h:306 (discriminator 1) include/linux/rcupdate.h:780 (discriminator 1) kernel/pid.c:504 (discriminator 1))
>>> [ 275.297470][ T4335] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4300 kernel/locking/lockdep.c:4359)
>>> [ 275.297988][ T4335] ? do_syscall_64 (arch/x86/include/asm/cpufeature.h:171 arch/x86/entry/common.c:97)
>>> [ 275.298389][ T4335] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4300 kernel/locking/lockdep.c:4359)
>>> [ 275.298906][ T4335] ? do_syscall_64 (arch/x86/include/asm/cpufeature.h:171 arch/x86/entry/common.c:97)
>>> [ 275.299304][ T4335] ? do_syscall_64 (arch/x86/include/asm/cpufeature.h:171 arch/x86/entry/common.c:97)
>>> [ 275.299703][ T4335] ? do_syscall_64 (arch/x86/include/asm/cpufeature.h:171 arch/x86/entry/common.c:97)
>>> [ 275.300115][ T4335] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)
>>>
>>> This BUG is the VM_BUG_ON(!in_atomic() && !irqs_disabled()) assertion in
>>> folio_ref_try_add_rcu() for non-SMP kernel.
>>>
>>> The process_vm_readv() calls GUP to pin the THP. An optimization for
>>> pinning THP instroduced by commit 57edfcfd3419 ("mm/gup: accelerate thp
>>> gup even for "pages != NULL"") calls try_grab_folio() to pin the THP,
>>> but try_grab_folio() is supposed to be called in atomic context for
>>> non-SMP kernel, for example, irq disabled or preemption disabled, due to
>>> the optimization introduced by commit e286781d5f2e ("mm: speculative
>>> page references").
>>>
>>> The commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP
>>> boundaries") is not actually the root cause although it was bisected to.
>>> It just makes the problem exposed more likely.
>>>
>>> The follow up discussion suggested the optimization for non-SMP kernel
>>> may be out-dated and not worth it anymore [1].  So removing the
>>> optimization to silence the BUG.
>>>
>>> However calling try_grab_folio() in GUP slow path actually is
>>> unnecessary, so the following patch will clean this up.
>>>
>>> [1] https://lore.kernel.org/linux-mm/821cf1d6-92b9-4ac4-bacc-d8f2364ac14f@paulmck-laptop/
>>> Fixes: 57edfcfd3419 ("mm/gup: accelerate thp gup even for "pages != NULL"")
>>> Reported-by: kernel test robot <oliver.sang@...el.com>
>>> Cc: linux-stable <stable@...r.kernel.org> v6.6+
>>> Signed-off-by: Yang Shi <yang@...amperecomputing.com>
>> Just to mention, IMHO it'll still be nicer if we keep the 1st fix patch
>> only have the folio_ref_try_add_rcu() changes, it'll be easier for
>> backport.
>>
>> Now this patch contains not only that but also logically a cleanup patch
>> that replaces old rcu calls to folio_try_get().  But squashing these may
>> mean we need explicit backport to 6.6 depending on whether those lines
>> changed, meanwhile the cleanup part may not be justfied to be backported in
>> the first place.  I'll leave that to you to decide, no strong feelings here.
> 
> Neither do I. But I slightly prefer have the patch as is for mainline
> since removing the #ifdef and the clean up lead by it seems
> self-contained and naturally integral. If it can not be applied to
> stable tree without conflict, I can generate a separate patch for stable
> tree with the removing #ifdef part. The effort should be trivial.

Agreed

Acked-by: David Hildenbrand <david@...hat.com>

-- 
Cheers,

David / dhildenb


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ