Message-ID: <554f4cdf-e4d9-f547-d3bb-1bcc1c9eb1@google.com>
Date: Sat, 21 May 2022 19:49:10 -0700 (PDT)
From: Hugh Dickins <hughd@...gle.com>
To: Mel Gorman <mgorman@...hsingularity.net>
cc: Andrew Morton <akpm@...ux-foundation.org>,
Nicolas Saenz Julienne <nsaenzju@...hat.com>,
Marcelo Tosatti <mtosatti@...hat.com>,
Vlastimil Babka <vbabka@...e.cz>,
Michal Hocko <mhocko@...nel.org>,
LKML <linux-kernel@...r.kernel.org>,
Linux-MM <linux-mm@...ck.org>
Subject: Re: [PATCH 5/6] mm/page_alloc: Protect PCP lists with a spinlock

On Mon, 9 May 2022, Mel Gorman wrote:
> Currently the PCP lists are protected by using local_lock_irqsave to
> prevent migration and IRQ reentrancy but this is inconvenient. Remote
> draining of the lists is impossible and a workqueue is required and
> every task allocation/free must disable then enable interrupts which is
> expensive.
>
> As preparation for dealing with both of those problems, protect the
> lists with a spinlock. The IRQ-unsafe version of the lock is used
> because IRQs are already disabled by local_lock_irqsave. spin_trylock
> is used in preparation for a time when local_lock could be used instead
> of local_lock_irqsave.

8c580f60a145 ("mm/page_alloc: protect PCP lists with a spinlock")
in next-20220520: I haven't looked up whether that comes from a
stable or unstable suburb of akpm's tree.

Mel, the VM_BUG_ON(in_hardirq()) which this adds to free_unref_page_list()
is not valid. I have no appreciation of how important it is to the whole
scheme, but as it stands, it crashes; and when I change it to a warning

--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3475,7 +3475,7 @@ void free_unref_page_list(struct list_he
 	if (list_empty(list))
 		return;
 
-	VM_BUG_ON(in_hardirq());
+	WARN_ON_ONCE(in_hardirq());
 
 	local_lock_irqsave(&pagesets.lock, flags);
 

then everything *appears* to go on working correctly after the splat
below (from which you will infer that I'm swapping to nvme):
[ 256.167040] WARNING: CPU: 0 PID: 9842 at mm/page_alloc.c:3478 free_unref_page_list+0x92/0x343
[ 256.170031] CPU: 0 PID: 9842 Comm: cc1 Not tainted 5.18.0-rc7-n20 #3
[ 256.171285] Hardware name: LENOVO 20HQS0EG02/20HQS0EG02, BIOS N1MET54W (1.39 ) 04/16/2019
[ 256.172555] RIP: 0010:free_unref_page_list+0x92/0x343
[ 256.173820] Code: ff ff 49 8b 44 24 08 4d 89 e0 4c 8d 60 f8 eb b6 48 8b 03 48 39 c3 0f 84 af 02 00 00 65 8b 05 72 7f df 7e a9 00 00 0f 00 74 02 <0f> 0b 9c 41 5d fa 41 0f ba e5 09 73 05 e8 1f 0a f9 ff e8 46 90 7b
[ 256.175289] RSP: 0018:ffff88803ec07c80 EFLAGS: 00010006
[ 256.176683] RAX: 0000000080010000 RBX: ffff88803ec07cf8 RCX: 000000000000002c
[ 256.178122] RDX: 0000000000000000 RSI: ffff88803ec29d28 RDI: 0000000000000040
[ 256.179580] RBP: ffff88803ec07cc0 R08: ffff88803ec07cf0 R09: 00000000000a401d
[ 256.181031] R10: 0000000000000000 R11: ffff8880101891b8 R12: ffff88803f6dd600
[ 256.182501] R13: ffff88803ec07cf8 R14: 000000000000000f R15: 0000000000000000
[ 256.183957] FS: 00007ffff7fcfac0(0000) GS:ffff88803ec00000(0000) knlGS:0000000000000000
[ 256.185419] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 256.186911] CR2: 0000555555710cdc CR3: 00000000240b4004 CR4: 00000000003706f0
[ 256.188395] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 256.189888] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 256.191390] Call Trace:
[ 256.192844] <IRQ>
[ 256.194253] ? __mem_cgroup_uncharge_list+0x4e/0x57
[ 256.195715] release_pages+0x26f/0x27e
[ 256.197150] ? list_add_tail+0x39/0x39
[ 256.198603] pagevec_lru_move_fn+0x95/0xa4
[ 256.200065] folio_rotate_reclaimable+0xa0/0xd1
[ 256.201545] folio_end_writeback+0x1c/0x78
[ 256.203064] end_page_writeback+0x11/0x13
[ 256.204495] end_swap_bio_write+0x87/0x95
[ 256.205967] bio_endio+0x15e/0x162
[ 256.207393] blk_mq_end_request_batch+0xd2/0x18d
[ 256.208932] ? __this_cpu_preempt_check+0x13/0x15
[ 256.210378] ? lock_is_held_type+0xcf/0x10f
[ 256.211714] ? lock_is_held+0xc/0xe
[ 256.213065] ? rcu_read_lock_sched_held+0x24/0x4f
[ 256.214450] nvme_pci_complete_batch+0x4c/0x51
[ 256.215811] nvme_irq+0x43/0x4e
[ 256.217173] ? nvme_unmap_data+0xb5/0xb5
[ 256.218633] __handle_irq_event_percpu+0xff/0x235
[ 256.220062] handle_irq_event_percpu+0x10/0x39
[ 256.221584] handle_irq_event+0x34/0x53
[ 256.223061] handle_edge_irq+0xb1/0xd5
[ 256.224486] __common_interrupt+0x7a/0xe6
[ 256.225918] common_interrupt+0x9c/0xca
[ 256.227330] </IRQ>
[ 256.228763] <TASK>
[ 256.230226] asm_common_interrupt+0x2c/0x40
[ 256.231724] RIP: 0010:lock_acquire.part.0+0x1a9/0x1b4
[ 256.233190] Code: df ec 7e ff c8 74 19 0f 0b 48 c7 c7 77 2c 4e 82 e8 d7 e7 88 00 65 c7 05 01 df ec 7e 00 00 00 00 48 85 db 74 01 fb 48 8d 65 d8 <5b> 41 5c 41 5d 41 5e 41 5f 5d c3 55 48 89 e5 41 57 4d 89 cf 41 56
[ 256.234804] RSP: 0018:ffff888010e939f0 EFLAGS: 00000206
[ 256.236339] RAX: 0000000000000000 RBX: 0000000000000200 RCX: 3e4406066e4f4abc
[ 256.237889] RDX: 0000000000000000 RSI: ffffffff824021e7 RDI: ffffffff8244b3d1
[ 256.239401] RBP: ffff888010e93a18 R08: 0000000000000028 R09: 000000000002001d
[ 256.240994] R10: 0000000000000000 R11: ffff8880101891b8 R12: 0000000000000002
[ 256.242545] R13: ffffffff82757a80 R14: ffffffff81253954 R15: 0000000000000000
[ 256.244085] ? __folio_memcg_unlock+0x48/0x48
[ 256.245597] lock_acquire+0xfa/0x10a
[ 256.247179] ? __folio_memcg_unlock+0x48/0x48
[ 256.248727] rcu_lock_acquire.constprop.0+0x24/0x27
[ 256.250293] ? __folio_memcg_unlock+0x48/0x48
[ 256.251736] mem_cgroup_iter+0x3d/0x178
[ 256.253245] shrink_node_memcgs+0x169/0x182
[ 256.254822] shrink_node+0x220/0x3d9
[ 256.256372] shrink_zones+0x10f/0x1ca
[ 256.257923] ? __this_cpu_preempt_check+0x13/0x15
[ 256.259437] do_try_to_free_pages+0x7a/0x192
[ 256.260947] try_to_free_mem_cgroup_pages+0x14b/0x213
[ 256.262405] try_charge_memcg+0x230/0x433
[ 256.263865] try_charge+0x12/0x17
[ 256.265236] charge_memcg+0x25/0x7c
[ 256.266615] __mem_cgroup_charge+0x28/0x3d
[ 256.267962] mem_cgroup_charge.constprop.0+0x1d/0x1f
[ 256.269290] do_anonymous_page+0x118/0x20c
[ 256.270712] handle_pte_fault+0x151/0x15f
[ 256.272015] __handle_mm_fault+0x39d/0x3ac
[ 256.273198] handle_mm_fault+0xc2/0x188
[ 256.274371] do_user_addr_fault+0x240/0x39d
[ 256.275595] exc_page_fault+0x1e1/0x204
[ 256.276787] asm_exc_page_fault+0x2c/0x40
[ 256.277906] RIP: 0033:0xd67250
[ 256.279017] Code: 02 f6 ff 49 89 c0 48 83 fb 08 73 1e f6 c3 04 75 39 48 85 db 74 2d c6 00 00 f6 c3 02 74 25 31 c0 66 41 89 44 18 fe eb 1b 66 90 <48> c7 44 18 f8 00 00 00 00 48 8d 4b ff 31 c0 4c 89 c7 48 c1 e9 03
[ 256.280355] RSP: 002b:00007fffffffc360 EFLAGS: 00010206
[ 256.281678] RAX: 00007ffff5972000 RBX: 00000000000000a8 RCX: 000000000000000c
[ 256.282955] RDX: 0000000000000006 RSI: 0000000000000017 RDI: 0000000000000986
[ 256.284264] RBP: 0000000000000002 R08: 00007ffff5972000 R09: 0000000000000987
[ 256.285554] R10: 0000000000000001 R11: 0000000001000001 R12: 00007ffff5969c80
[ 256.286930] R13: 0000000000000015 R14: 0000000000000000 R15: 0000000001eb4760
[ 256.288300] </TASK>
[ 256.289533] irq event stamp: 95044
[ 256.290776] hardirqs last enabled at (95043): [<ffffffff819ddf43>] irqentry_exit+0x67/0x75
[ 256.292147] hardirqs last disabled at (95044): [<ffffffff819db09c>] common_interrupt+0x1a/0xca
[ 256.293479] softirqs last enabled at (94982): [<ffffffff81c0036f>] __do_softirq+0x36f/0x3aa
[ 256.294754] softirqs last disabled at (94977): [<ffffffff81105c01>] __irq_exit_rcu+0x85/0xc1
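
For readers who want to see the difference the one-line change above makes, here is a minimal userspace sketch, not kernel code. Every name in it (fake_in_hardirq, model_warn_on_once, model_free_unref_page_list, model_irq_completion) is hypothetical and only imitates the shape of the path in the trace, where a block-layer completion running in hard IRQ context ends up freeing pages. It assumes nothing about the real PCP locking; it only shows that a warn-once check reports the first hit and lets the free proceed, where a BUG-style check would stop at that point.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's hardirq context tracking. */
static bool fake_in_hardirq;

/* Counters so the behaviour can be inspected after the fact. */
static int warn_count;
static int pages_freed;

/*
 * Userspace imitation of a warn-once check: report the first time the
 * condition is true, stay silent afterwards, and never stop execution.
 * (One shared flag is enough here because there is a single call site.)
 */
static bool model_warn_on_once(bool cond, const char *what)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Models the entry of the list-freeing function touched by the diff. */
static void model_free_unref_page_list(int nr_pages)
{
	if (nr_pages == 0)
		return;

	/* With a BUG-style check, execution would stop here in IRQ context. */
	model_warn_on_once(fake_in_hardirq, "in_hardirq()");

	pages_freed += nr_pages;
}

/* Models an I/O completion handler that frees pages from hard IRQ context. */
static void model_irq_completion(int nr_pages)
{
	fake_in_hardirq = true;
	model_free_unref_page_list(nr_pages);
	fake_in_hardirq = false;
}
```

Calling model_free_unref_page_list(4) from "task context" and then model_irq_completion(8) twice leaves pages_freed at 20 and warn_count at 1: the second IRQ-context free is still performed but is no longer reported, which matches the single splat followed by apparently normal operation described above.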