[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Yo+2qqHqSdpE5l7m@qian>
Date: Thu, 26 May 2022 13:19:38 -0400
From: Qian Cai <quic_qiancai@...cinc.com>
To: Mel Gorman <mgorman@...hsingularity.net>
CC: Andrew Morton <akpm@...ux-foundation.org>,
Nicolas Saenz Julienne <nsaenzju@...hat.com>,
Marcelo Tosatti <mtosatti@...hat.com>,
Vlastimil Babka <vbabka@...e.cz>,
Michal Hocko <mhocko@...nel.org>,
LKML <linux-kernel@...r.kernel.org>,
Linux-MM <linux-mm@...ck.org>
Subject: Re: [PATCH 0/6] Drain remote per-cpu directly v3
On Thu, May 12, 2022 at 09:50:37AM +0100, Mel Gorman wrote:
> Changelog since v2
> o More conversions from page->lru to page->[pcp_list|buddy_list]
> o Additional test results in changelogs
>
> Changelog since v1
> o Fix unsafe RT locking scheme
> o Use spin_trylock on UP PREEMPT_RT
>
> This series has the same intent as Nicolas' series "mm/page_alloc: Remote
> per-cpu lists drain support" -- avoid interference of a high priority
> task due to a workqueue item draining per-cpu page lists. While many
> workloads can tolerate a brief interruption, it may be cause a real-time
> task runnning on a NOHZ_FULL CPU to miss a deadline and at minimum,
> the draining in non-deterministic.
>
> Currently an IRQ-safe local_lock protects the page allocator per-cpu lists.
> The local_lock on its own prevents migration and the IRQ disabling protects
> from corruption due to an interrupt arriving while a page allocation is
> in progress. The locking is inherently unsafe for remote access unless
> the CPU is hot-removed.
>
> This series adjusts the locking. A spinlock is added to struct
> per_cpu_pages to protect the list contents while local_lock_irq continues
> to prevent migration and IRQ reentry. This allows a remote CPU to safely
> drain a remote per-cpu list.
>
> This series is a partial series. Follow-on work should allow the
> local_irq_save to be converted to a local_irq to avoid IRQs being
> disabled/enabled in most cases. Consequently, there are some TODO comments
> highlighting the places that would change if local_irq was used. However,
> there are enough corner cases that it deserves a series on its own
> separated by one kernel release and the priority right now is to avoid
> interference of high priority tasks.
>
> Patch 1 is a cosmetic patch to clarify when page->lru is storing buddy pages
> and when it is storing per-cpu pages.
>
> Patch 2 shrinks per_cpu_pages to make room for a spin lock. Strictly speaking
> this is not necessary but it avoids per_cpu_pages consuming another
> cache line.
>
> Patch 3 is a preparation patch to avoid code duplication.
>
> Patch 4 is a simple micro-optimisation that improves code flow necessary for
> a later patch to avoid code duplication.
>
> Patch 5 uses a spin_lock to protect the per_cpu_pages contents while still
> relying on local_lock to prevent migration, stabilise the pcp
> lookup and prevent IRQ reentrancy.
>
> Patch 6 remote drains per-cpu pages directly instead of using a workqueue.
Mel, we saw spontanous "mm_percpu_wq" crash on today's linux-next tree
while running CPU offlining/onlining, and wondering if you have any
thoughts?
WARNING: CPU: 31 PID: 173 at kernel/kthread.c:524 __kthread_bind_mask
CPU: 31 PID: 173 Comm: kworker/31:0 Not tainted 5.18.0-next-20220526-dirty #127
Workqueue: 0x0
(mm_percpu_wq)
pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __kthread_bind_mask
lr : __kthread_bind_mask
sp : ffff800018667c50
x29: ffff800018667c50
x28: ffff800018667d20
x27: ffff083678bc2458
x26: 1fffe1002f5b17a8
x25: ffff08017ad8bd40
x24: 1fffe106cf17848b
x23: 1ffff000030ccfa0 x22: ffff0801de2d1ac0
x21: ffff0801de2d1ac0
x20: ffff07ff80286f08 x19: ffff0801de2d1ac0
x18: ffffd6056a577d1c
x17: ffffffffffffffff
x16: 1fffe0fff158eb18
x15: 1fffe106cf176138
x14: 000000000000f1f1
x13: 00000000f3f3f3f3 x12: ffff7000030ccf3b
x11: 1ffff000030ccf3a x10: ffff7000030ccf3a x9 : dfff800000000000
x8 : ffff8000186679d7
x7 : 0000000000000001
x6 : ffff7000030ccf3a
x5 : 1ffff000030ccf39 x4 : 1ffff000030ccf4e x3 : 0000000000000000
x2 : 0000000000000000
x1 : ffff07ff8ac74fc0 x0 : 0000000000000000
Call trace:
__kthread_bind_mask
kthread_bind_mask
create_worker
worker_thread
kthread
ret_from_fork
irq event stamp: 146
hardirqs last enabled at (145): _raw_spin_unlock_irqrestore
hardirqs last disabled at (146): el1_dbg
softirqs last enabled at (0): copy_process
softirqs last disabled at (0): 0x0
WARNING: CPU: 31 PID: 173 at kernel/kthread.c:593 kthread_set_per_cpu
CPU: 31 PID: 173 Comm: kworker/31:0 Tainted: G W 5.18.0-next-20220526-dirty #127
Workqueue: 0x0 (mm_percpu_wq)
pstate: 10400009 (nzcV daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : kthread_set_per_cpu
lr : worker_attach_to_pool
sp : ffff800018667be0
x29: ffff800018667be0 x28: ffff800018667d20 x27: ffff083678bc2458
x26: 1fffe1002f5b17a8 x25: ffff08017ad8bd40 x24: 1fffe106cf17848b
x23: 1fffe1003bc5a35d x22: ffff0801de2d1aec x21: 0000000000000007
x20: ffff4026d8adae00 x19: ffff0801de2d1ac0 x18: ffffd6056a577d1c
x17: ffffffffffffffff x16: 1fffe0fff158eb18 x15: 1fffe106cf176138
x14: 000000000000f1f1 x13: 00000000f3f3f3f3 x12: ffff7000030ccf53
x11: 1ffff000030ccf52 x10: ffff7000030ccf52 x9 : ffffd60563f9a038
x8 : ffff800018667a97 x7 : 0000000000000001 x6 : ffff7000030ccf52
x5 : ffff800018667a90 x4 : ffff7000030ccf53 x3 : 1fffe1003bc5a408
x2 : 0000000000000000 x1 : 000000000000001f x0 : 0000000000208060
Call trace:
kthread_set_per_cpu
worker_attach_to_pool at kernel/workqueue.c:1873
create_worker
worker_thread
kthread
ret_from_fork
irq event stamp: 146
hardirqs last enabled at (145): _raw_spin_unlock_irqrestore
hardirqs last disabled at (146): el1_dbg
softirqs last enabled at (0): copy_process
softirqs last disabled at (0): 0x0
Unable to handle kernel paging request at virtual address dfff800000000003
KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
Mem abort info:
ESR = 0x0000000096000004
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x04: level 0 translation fault
Data abort info:
ISV = 0, ISS = 0x00000004
CM = 0, WnR = 0
[dfff800000000003] address between user and kernel address ranges
Internal error: Oops: 96000004 [#1] PREEMPT SMP
CPU: 83 PID: 23994 Comm: kworker/31:2 Not tainted 5.18.0-next-20220526-dirty #127
pstate: 104000c9 (nzcV daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __lock_acquire
lr : lock_acquire.part.0
sp : ffff800071777ac0
x29: ffff800071777ac0 x28: ffffd60563fa6380
x27: 0000000000000018
x26: 0000000000000080 x25: 0000000000000018 x24: 0000000000000000
x23: ffff0801de2d1ac0 x22: ffffd6056a66a7e0
x21: 0000000000000000
x20: 0000000000000000 x19: 0000000000000000
x18: 0000000000000767
x17: 0000000000000000 x16: 1fffe1003bc5a473
x15: 1fffe806c88e9338
x14: 000000000000f1f1
x13: 00000000f3f3f3f3
x12: ffff0801de2d1ac8
x11: 1ffffac0ad4aefa3
x10: ffffd6056a577d18 x9 : 0000000000000000
x8 : 0000000000000003
x7 : ffffd60563fa6380
x6 : 0000000000000000
x5 : 0000000000000080
x4 : 0000000000000001
x3 : 0000000000000000
x2 : 0000000000000000 x1 : 0000000000000003
x0 : dfff800000000000
Call trace:
__lock_acquire at kernel/locking/lockdep.c:4923
lock_acquire
_raw_spin_lock_irq
worker_thread at kernel/workqueue.c:2389
kthread
ret_from_fork
Code: d65f03c0 d343ff61 d2d00000 f2fbffe0 (38e06820)
---[ end trace 0000000000000000 ]---
1424.464630][T23994] Kernel panic - not syncing: Oops: Fatal exception
SMP: stopping secondary CPUs
Kernel Offset: 0x56055bdf0000 from 0xffff800008000000
PHYS_OFFSET: 0x80000000
CPU features: 0x000,0042e015,19801c82
Memory Limit: none
Powered by blists - more mailing lists