Message-ID: <0f9a6f66-7cbc-4c0d-b12e-9eaacdf1bda8@amd.com>
Date: Thu, 13 Feb 2025 11:09:37 +0530
From: Raghavendra K T <raghavendra.kt@....com>
To: linux-mm@...ck.org, linux-kernel@...r.kernel.org, gourry@...rry.net,
 nehagholkar@...a.com, abhishekd@...a.com, david@...hat.com,
 ying.huang@...el.com, nphamcs@...il.com, akpm@...ux-foundation.org,
 hannes@...xchg.org, feng.tang@...el.com, kbusch@...a.com, bharata@....com,
 Hasan.Maruf@....com, sj@...nel.org, willy@...radead.org,
 kirill.shutemov@...ux.intel.com, mgorman@...hsingularity.net,
 vbabka@...e.cz, hughd@...gle.com, rientjes@...gle.com, shy828301@...il.com,
 Liam.Howlett@...cle.com, peterz@...radead.org, mingo@...hat.com
Subject: Re: [RFC PATCH V0 0/10] mm: slowtier page promotion based on PTE A
 bit



On 2/12/2025 10:32 PM, Davidlohr Bueso wrote:
> On Sun, 01 Dec 2024, Raghavendra K T wrote:
> 
>> 6. Holding PTE lock before migration.
> 
> fyi I tried testing this series with 'perf-bench numa mem' and got a soft lockup,
> unable to take the PTL (and lost the machine to debug further atm), ie:
> 
> [ 3852.217675] CPU: 127 UID: 0 PID: 12537 Comm: watch-numa-sche Tainted: G      D      L     6.14.0-rc2-kmmscand-v1+ #3
> [ 3852.217677] Tainted: [D]=DIE, [L]=SOFTLOCKUP
> [ 3852.217678] RIP: 0010:native_queued_spin_lock_slowpath+0x64/0x290
> [ 3852.217683] Code: 77 7b f0 0f ba 2b 08 0f 92 c2 8b 03 0f b6 d2 c1 e2 08 30 e4 09 d0 3d ff 00 00 00 77 57 85 c0 74 10 0f b6 03 84 c0 74 09 f3 90 <0f> b6 03 84 c0 75 f7 b8 01 00 00 00 66 89 03 5b 5d 41 5c 41 5d c3
> [ 3852.217684] RSP: 0018:ff274259b3c9f988 EFLAGS: 00000202
> [ 3852.217685] RAX: 0000000000000001 RBX: ffbd2efd8c08c9a8 RCX: 000ffffffffff000
> [ 3852.217686] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffbd2efd8c08c9a8
> [ 3852.217687] RBP: ff161328422c1328 R08: ff274259b3c9fb90 R09: ff161328422c1000
> [ 3852.217688] R10: 00000000ffffffff R11: 0000000000000004 R12: 00007f52cca00000
> [ 3852.217688] R13: ff274259b3c9fa00 R14: ff16132842326000 R15: ff161328422c1328
> [ 3852.217689] FS:  00007f32b6f92b80(0000) GS:ff161423bfd80000(0000) knlGS:0000000000000000
> [ 3852.217691] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 3852.217692] CR2: 0000564ddbf68008 CR3: 00000080a81cc005 CR4: 0000000000773ef0
> [ 3852.217693] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 3852.217694] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
> [ 3852.217694] PKRU: 55555554
> [ 3852.217695] Call Trace:
> [ 3852.217696]  <IRQ>
> [ 3852.217697]  ? watchdog_timer_fn+0x21b/0x2a0
> [ 3852.217699]  ? __pfx_watchdog_timer_fn+0x10/0x10
> [ 3852.217702]  ? __hrtimer_run_queues+0x10f/0x2a0
> [ 3852.217704]  ? hrtimer_interrupt+0xfb/0x240
> [ 3852.217706]  ? __sysvec_apic_timer_interrupt+0x4e/0x110
> [ 3852.217709]  ? sysvec_apic_timer_interrupt+0x68/0x90
> [ 3852.217712]  </IRQ>
> [ 3852.217712]  <TASK>
> [ 3852.217713]  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
> [ 3852.217717]  ? native_queued_spin_lock_slowpath+0x64/0x290
> [ 3852.217720]  _raw_spin_lock+0x25/0x30
> [ 3852.217723]  __pte_offset_map_lock+0x9a/0x110
> [ 3852.217726]  gather_pte_stats+0x1e3/0x2c0
> [ 3852.217730]  walk_pgd_range+0x528/0xbb0
> [ 3852.217733]  __walk_page_range+0x71/0x1d0
> [ 3852.217736]  walk_page_vma+0x98/0xf0
> [ 3852.217738]  show_numa_map+0x11a/0x3a0
> [ 3852.217741]  seq_read_iter+0x2a6/0x470
> [ 3852.217745]  seq_read+0x12b/0x170
> [ 3852.217748]  vfs_read+0xe0/0x370
> [ 3852.217751]  ? syscall_exit_to_user_mode+0x49/0x210
> [ 3852.217755]  ? do_syscall_64+0x8a/0x190
> [ 3852.217758]  ksys_read+0x6a/0xe0
> [ 3852.217762]  do_syscall_64+0x7e/0x190
> [ 3852.217765]  ? __memcg_slab_free_hook+0xd4/0x120
> [ 3852.217768]  ? __x64_sys_close+0x38/0x80
> [ 3852.217771]  ? kmem_cache_free+0x3bf/0x3e0
> [ 3852.217774]  ? syscall_exit_to_user_mode+0x49/0x210
> [ 3852.217777]  ? do_syscall_64+0x8a/0x190
> [ 3852.217780]  ? do_syscall_64+0x8a/0x190
> [ 3852.217783]  ? __irq_exit_rcu+0x3e/0xe0
> [ 3852.217785]  entry_SYSCALL_64_after_hwframe+0x76/0x7e


Hello David,

Thanks for reporting this with details. Reproducer information helps me
stabilize the code quickly. The micro-benchmark I used did not show any
issues. I will add the PTL lock and also check the issue from my side.

(With multiple scanning threads, it could cause even more issues because
of higher migration pressure. I am wondering if I should go ahead with a
more stable single-threaded scanning version in the next post.)

Thanks and Regards
- Raghu
