[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJuCfpEBs3R8C910eiaXcSMPPrtbMjFLNYzYdPGJG+gw4WHM8A@mail.gmail.com>
Date: Wed, 20 Nov 2024 08:56:21 -0800
From: Suren Baghdasaryan <surenb@...gle.com>
To: Max Kellermann <max.kellermann@...os.com>
Cc: Johannes Weiner <hannes@...xchg.org>, Peter Zijlstra <peterz@...radead.org>,
linux-kernel@...r.kernel.org, Christoph Hellwig <hch@....de>
Subject: Re: Bad psi_group_cpu.tasks[NR_MEMSTALL] counter
On Wed, Nov 20, 2024 at 8:30 AM Max Kellermann <max.kellermann@...os.com> wrote:
>
> On Tue, Aug 6, 2024 at 5:56 PM Suren Baghdasaryan <surenb@...gle.com> wrote:
>
> > Hmm. The original scenario I was thinking about when I proposed this
> > WARN_ON() was deemed impossible, so I think the only other possibility
> > is that the task being killed somehow skipped psi_memstall_leave()
> > before its death... Did you have the instrumentation I suggested to
> > track imbalance between psi_memstall_enter()/psi_memstall_leave() and
> > to record the _RET_IP_? If so, did it trigger at all?
>
> Hi Suren,
>
> it has taken a few months, but today, this bug was trapped by 6.11.9
> with your patch:
>
> ------------[ cut here ]------------
> Stall from readahead_expand+0xcf/0x1d0 was never cleared
This is interesting. readahead_expand() indeed has
psi_memstall_enter() without a matching psi_memstall_leave():
https://elixir.bootlin.com/linux/v6.11.9/source/mm/readahead.c#L727
and https://elixir.bootlin.com/linux/v6.11.9/source/mm/readahead.c#L754.
Looks like this was introduced in [1]. I'm not sure if that's a bug
or psi_memstall_leave() is supposed to be called from somewhere else.
CC'ing Christoph to clarify.
[1] 176042404ee6 "mm: add PSI accounting around ->read_folio and
->readahead calls"
> WARNING: CPU: 2 PID: 22644 at kernel/exit.c:933 do_exit+0x8c7/0x970
> Modules linked in:
> CPU: 2 UID: 2148030876 PID: 22644 Comm: php-was Not tainted
> 6.11.9-cm4all2-vm+ #242
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
> RIP: 0010:do_exit+0x8c7/0x970
> Code: 75 bb 4c 89 ef 48 89 dd e8 b6 21 0d 00 48 c7 c7 80 60 60 a5 e8
> ca dd ba 00 e9 5b fa ff ff 48 c7 c7 d8 4d fc a4 e8 e9 9e ff ff <0f> 0b
> e9 1d fe ff ff e8 4d e6 24 00 e9 9b fd ff ff 48 8b 78 68 e8
> RSP: 0018:ffffc2bdd2fafc50 EFLAGS: 00010286
> RAX: 0000000000000000 RBX: ffff9f96762bab80 RCX: 0000000000000027
> RDX: ffff9fb1bee9b188 RSI: 0000000000000001 RDI: ffff9fb1bee9b180
> RBP: ffffc2bdd2faf6a8 R08: 00000000ffffbfff R09: 0000000000000001
> R10: 0000000000000003 R11: ffff9fb1beca0000 R12: ffffc2bdd2fafc60
> R13: ffff9faa09fed5d8 R14: 0000000000000001 R15: 0000000000000000
> FS: 0000000000000000(0000) GS:ffff9fb1bee80000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f0bfd7b0000 CR3: 000000027762e003 CR4: 00000000003706b0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> <TASK>
> ? __warn+0x7c/0x110
> ? do_exit+0x8c7/0x970
> ? report_bug+0x14c/0x170
> ? handle_bug+0x53/0x90
> ? exc_invalid_op+0x13/0x60
> ? asm_exc_invalid_op+0x16/0x20
> ? do_exit+0x8c7/0x970
> do_group_exit+0x2c/0x80
> __x64_sys_exit_group+0x14/0x20
> x64_sys_call+0x14aa/0x16a0
> do_syscall_64+0x64/0x100
> ? __memcg_kmem_charge_page+0xdc/0x1d0
> ? do_fault+0x270/0x490
> ? __handle_mm_fault+0x321/0x1280
> ? __count_memcg_events+0x53/0xf0
> ? handle_mm_fault+0xa0/0x260
> ? do_user_addr_fault+0x33b/0x5b0
> ? clear_bhb_loop+0x45/0xa0
> ? clear_bhb_loop+0x45/0xa0
> ? clear_bhb_loop+0x45/0xa0
> ? clear_bhb_loop+0x45/0xa0
> ? clear_bhb_loop+0x45/0xa0
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
> RIP: 0033:0x7f0c00c5365a
> Code: Unable to access opcode bytes at 0x7f0c00c53630.
> RSP: 002b:00007ffc678fba38 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0c00c5365a
> RDX: 0000000000000005 RSI: 0000000000000000 RDI: 0000000000000000
> RBP: 00007ffc678fba88 R08: 00007ffc678fb938 R09: 0000000000000018
> R10: 0000000000000008 R11: 0000000000000246 R12: 00007f0c004db8a0
> R13: 00007ffc678fbab0 R14: 0000000000000000 R15: 0000000000000000
> </TASK>
> ---[ end trace 0000000000000000 ]---
> ------------[ cut here ]------------
> Stall from readahead_expand+0xcf/0x1d0 was never cleared
> WARNING: CPU: 2 PID: 22644 at kernel/sched/psi.c:989
> psi_task_switch+0x126/0x230
> Modules linked in:
> CPU: 2 UID: 2148030876 PID: 22644 Comm: php-was Tainted: G W
> 6.11.9-cm4all2-vm+ #242
> Tainted: [W]=WARN
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
> RIP: 0010:psi_task_switch+0x126/0x230
> Code: f6 75 e6 41 f6 44 24 18 80 74 36 41 f6 84 24 a8 08 00 00 02 74
> 2b 49 8b b4 24 b0 08 00 00 48 c7 c7 d8 4d fc a4 e8 5a de f9 ff <0f> 0b
> 41 f6 44 24 18 80 74 0d 41 f6 84 24 a8 08 00 00 02 74 02 0f
> RSP: 0018:ffffc2bdd2fafb48 EFLAGS: 00010086
> RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000027
> RDX: ffff9fb1bee9b188 RSI: 0000000000000001 RDI: ffff9fb1bee9b180
> RBP: 000000000000001c R08: 00000000ffffbfff R09: 0000000000000001
> R10: 0000000000000003 R11: ffff9fb1beca0000 R12: ffff9f96762bab80
> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> FS: 0000000000000000(0000) GS:ffff9fb1bee80000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f0bfd7b0000 CR3: 000000027762e003 CR4: 00000000003706b0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> <TASK>
> ? __warn+0x7c/0x110
> ? psi_task_switch+0x126/0x230
> ? report_bug+0x14c/0x170
> ? handle_bug+0x53/0x90
> ? exc_invalid_op+0x13/0x60
> ? asm_exc_invalid_op+0x16/0x20
> ? psi_task_switch+0x126/0x230
> ? psi_task_switch+0x126/0x230
> __schedule+0x954/0x1150
> do_task_dead+0x3e/0x40
> do_exit+0x6f0/0x970
> do_group_exit+0x2c/0x80
> __x64_sys_exit_group+0x14/0x20
> x64_sys_call+0x14aa/0x16a0
> do_syscall_64+0x64/0x100
> ? __memcg_kmem_charge_page+0xdc/0x1d0
> ? do_fault+0x270/0x490
> ? __handle_mm_fault+0x321/0x1280
> ? __count_memcg_events+0x53/0xf0
> ? handle_mm_fault+0xa0/0x260
> ? do_user_addr_fault+0x33b/0x5b0
> ? clear_bhb_loop+0x45/0xa0
> ? clear_bhb_loop+0x45/0xa0
> ? clear_bhb_loop+0x45/0xa0
> ? clear_bhb_loop+0x45/0xa0
> ? clear_bhb_loop+0x45/0xa0
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
> RIP: 0033:0x7f0c00c5365a
> Code: Unable to access opcode bytes at 0x7f0c00c53630.
> RSP: 002b:00007ffc678fba38 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0c00c5365a
> RDX: 0000000000000005 RSI: 0000000000000000 RDI: 0000000000000000
> RBP: 00007ffc678fba88 R08: 00007ffc678fb938 R09: 0000000000000018
> R10: 0000000000000008 R11: 0000000000000246 R12: 00007f0c004db8a0
> R13: 00007ffc678fbab0 R14: 0000000000000000 R15: 0000000000000000
> </TASK>
> ---[ end trace 0000000000000000 ]---
> ------------[ cut here ]------------
> WARNING: CPU: 2 PID: 22644 at kernel/sched/psi.c:994
> psi_task_switch+0x13b/0x230
> Modules linked in:
> CPU: 2 UID: 2148030876 PID: 22644 Comm: php-was Tainted: G W
> 6.11.9-cm4all2-vm+ #242
> Tainted: [W]=WARN
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
> RIP: 0010:psi_task_switch+0x13b/0x230
> Code: 2b 49 8b b4 24 b0 08 00 00 48 c7 c7 d8 4d fc a4 e8 5a de f9 ff
> 0f 0b 41 f6 44 24 18 80 74 0d 41 f6 84 24 a8 08 00 00 02 74 02 <0f> 0b
> 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 41
> RSP: 0018:ffffc2bdd2fafb48 EFLAGS: 00010002
> RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000027
> RDX: ffff9fb1bee9b188 RSI: 0000000000000001 RDI: ffff9fb1bee9b180
> RBP: 000000000000001c R08: 00000000ffffbfff R09: 0000000000000001
> R10: 0000000000000003 R11: ffff9fb1beca0000 R12: ffff9f96762bab80
> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> FS: 0000000000000000(0000) GS:ffff9fb1bee80000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f0bfd7b0000 CR3: 000000027762e003 CR4: 00000000003706b0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> <TASK>
> ? __warn+0x7c/0x110
> ? psi_task_switch+0x13b/0x230
> ? report_bug+0x14c/0x170
> ? handle_bug+0x53/0x90
> ? exc_invalid_op+0x13/0x60
> ? asm_exc_invalid_op+0x16/0x20
> ? psi_task_switch+0x13b/0x230
> ? psi_task_switch+0x126/0x230
> __schedule+0x954/0x1150
> do_task_dead+0x3e/0x40
> do_exit+0x6f0/0x970
> do_group_exit+0x2c/0x80
> __x64_sys_exit_group+0x14/0x20
> x64_sys_call+0x14aa/0x16a0
> do_syscall_64+0x64/0x100
> ? __memcg_kmem_charge_page+0xdc/0x1d0
> ? do_fault+0x270/0x490
> ? __handle_mm_fault+0x321/0x1280
> ? __count_memcg_events+0x53/0xf0
> ? handle_mm_fault+0xa0/0x260
> ? do_user_addr_fault+0x33b/0x5b0
> ? clear_bhb_loop+0x45/0xa0
> ? clear_bhb_loop+0x45/0xa0
> ? clear_bhb_loop+0x45/0xa0
> ? clear_bhb_loop+0x45/0xa0
> ? clear_bhb_loop+0x45/0xa0
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
> RIP: 0033:0x7f0c00c5365a
> Code: Unable to access opcode bytes at 0x7f0c00c53630.
> RSP: 002b:00007ffc678fba38 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0c00c5365a
> RDX: 0000000000000005 RSI: 0000000000000000 RDI: 0000000000000000
> RBP: 00007ffc678fba88 R08: 00007ffc678fb938 R09: 0000000000000018
> R10: 0000000000000008 R11: 0000000000000246 R12: 00007f0c004db8a0
> R13: 00007ffc678fbab0 R14: 0000000000000000 R15: 0000000000000000
> </TASK>
> ---[ end trace 0000000000000000 ]---
Powered by blists - more mailing lists