lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJuCfpEBs3R8C910eiaXcSMPPrtbMjFLNYzYdPGJG+gw4WHM8A@mail.gmail.com>
Date: Wed, 20 Nov 2024 08:56:21 -0800
From: Suren Baghdasaryan <surenb@...gle.com>
To: Max Kellermann <max.kellermann@...os.com>
Cc: Johannes Weiner <hannes@...xchg.org>, Peter Zijlstra <peterz@...radead.org>, 
	linux-kernel@...r.kernel.org, Christoph Hellwig <hch@....de>
Subject: Re: Bad psi_group_cpu.tasks[NR_MEMSTALL] counter

On Wed, Nov 20, 2024 at 8:30 AM Max Kellermann <max.kellermann@...os.com> wrote:
>
> On Tue, Aug 6, 2024 at 5:56 PM Suren Baghdasaryan <surenb@...gle.com> wrote:
>
> > Hmm. The original scenario I was thinking about when I proposed this
> > WARN_ON() was deemed impossible, so I think the only other possibility
> > is that the task being killed somehow skipped psi_memstall_leave()
> > before its death... Did you have the instrumentation I suggested to
> > track imbalance between psi_memstall_enter()/psi_memstall_leave() and
> > to record the _RET_IP_? If so, did it trigger at all?
>
> Hi Suren,
>
> it has taken a few months, but today, this bug was trapped by 6.11.9
> with your patch:
>
>  ------------[ cut here ]------------
>  Stall from readahead_expand+0xcf/0x1d0 was never cleared

This is interesting. readahead_expand() indeed has
psi_memstall_enter() without a matching psi_memstall_leave():
https://elixir.bootlin.com/linux/v6.11.9/source/mm/readahead.c#L727
and https://elixir.bootlin.com/linux/v6.11.9/source/mm/readahead.c#L754.
Looks like this was introduced in [1].  I'm not sure if that's a bug
or psi_memstall_leave() is supposed to be called from somewhere else.
CC'ing Christoph to clarify.

[1] 176042404ee6 "mm: add PSI accounting around ->read_folio and
->readahead calls"

>  WARNING: CPU: 2 PID: 22644 at kernel/exit.c:933 do_exit+0x8c7/0x970
>  Modules linked in:
>  CPU: 2 UID: 2148030876 PID: 22644 Comm: php-was Not tainted
> 6.11.9-cm4all2-vm+ #242
>  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
>  RIP: 0010:do_exit+0x8c7/0x970
>  Code: 75 bb 4c 89 ef 48 89 dd e8 b6 21 0d 00 48 c7 c7 80 60 60 a5 e8
> ca dd ba 00 e9 5b fa ff ff 48 c7 c7 d8 4d fc a4 e8 e9 9e ff ff <0f> 0b
> e9 1d fe ff ff e8 4d e6 24 00 e9 9b fd ff ff 48 8b 78 68 e8
>  RSP: 0018:ffffc2bdd2fafc50 EFLAGS: 00010286
>  RAX: 0000000000000000 RBX: ffff9f96762bab80 RCX: 0000000000000027
>  RDX: ffff9fb1bee9b188 RSI: 0000000000000001 RDI: ffff9fb1bee9b180
>  RBP: ffffc2bdd2faf6a8 R08: 00000000ffffbfff R09: 0000000000000001
>  R10: 0000000000000003 R11: ffff9fb1beca0000 R12: ffffc2bdd2fafc60
>  R13: ffff9faa09fed5d8 R14: 0000000000000001 R15: 0000000000000000
>  FS:  0000000000000000(0000) GS:ffff9fb1bee80000(0000) knlGS:0000000000000000
>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  CR2: 00007f0bfd7b0000 CR3: 000000027762e003 CR4: 00000000003706b0
>  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>  Call Trace:
>   <TASK>
>   ? __warn+0x7c/0x110
>   ? do_exit+0x8c7/0x970
>   ? report_bug+0x14c/0x170
>   ? handle_bug+0x53/0x90
>   ? exc_invalid_op+0x13/0x60
>   ? asm_exc_invalid_op+0x16/0x20
>   ? do_exit+0x8c7/0x970
>   do_group_exit+0x2c/0x80
>   __x64_sys_exit_group+0x14/0x20
>   x64_sys_call+0x14aa/0x16a0
>   do_syscall_64+0x64/0x100
>   ? __memcg_kmem_charge_page+0xdc/0x1d0
>   ? do_fault+0x270/0x490
>   ? __handle_mm_fault+0x321/0x1280
>   ? __count_memcg_events+0x53/0xf0
>   ? handle_mm_fault+0xa0/0x260
>   ? do_user_addr_fault+0x33b/0x5b0
>   ? clear_bhb_loop+0x45/0xa0
>   ? clear_bhb_loop+0x45/0xa0
>   ? clear_bhb_loop+0x45/0xa0
>   ? clear_bhb_loop+0x45/0xa0
>   ? clear_bhb_loop+0x45/0xa0
>   entry_SYSCALL_64_after_hwframe+0x76/0x7e
>  RIP: 0033:0x7f0c00c5365a
>  Code: Unable to access opcode bytes at 0x7f0c00c53630.
>  RSP: 002b:00007ffc678fba38 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
>  RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0c00c5365a
>  RDX: 0000000000000005 RSI: 0000000000000000 RDI: 0000000000000000
>  RBP: 00007ffc678fba88 R08: 00007ffc678fb938 R09: 0000000000000018
>  R10: 0000000000000008 R11: 0000000000000246 R12: 00007f0c004db8a0
>  R13: 00007ffc678fbab0 R14: 0000000000000000 R15: 0000000000000000
>   </TASK>
>  ---[ end trace 0000000000000000 ]---
>  ------------[ cut here ]------------
>  Stall from readahead_expand+0xcf/0x1d0 was never cleared
>  WARNING: CPU: 2 PID: 22644 at kernel/sched/psi.c:989
> psi_task_switch+0x126/0x230
>  Modules linked in:
>  CPU: 2 UID: 2148030876 PID: 22644 Comm: php-was Tainted: G        W
>        6.11.9-cm4all2-vm+ #242
>  Tainted: [W]=WARN
>  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
>  RIP: 0010:psi_task_switch+0x126/0x230
>  Code: f6 75 e6 41 f6 44 24 18 80 74 36 41 f6 84 24 a8 08 00 00 02 74
> 2b 49 8b b4 24 b0 08 00 00 48 c7 c7 d8 4d fc a4 e8 5a de f9 ff <0f> 0b
> 41 f6 44 24 18 80 74 0d 41 f6 84 24 a8 08 00 00 02 74 02 0f
>  RSP: 0018:ffffc2bdd2fafb48 EFLAGS: 00010086
>  RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000027
>  RDX: ffff9fb1bee9b188 RSI: 0000000000000001 RDI: ffff9fb1bee9b180
>  RBP: 000000000000001c R08: 00000000ffffbfff R09: 0000000000000001
>  R10: 0000000000000003 R11: ffff9fb1beca0000 R12: ffff9f96762bab80
>  R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>  FS:  0000000000000000(0000) GS:ffff9fb1bee80000(0000) knlGS:0000000000000000
>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  CR2: 00007f0bfd7b0000 CR3: 000000027762e003 CR4: 00000000003706b0
>  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>  Call Trace:
>   <TASK>
>   ? __warn+0x7c/0x110
>   ? psi_task_switch+0x126/0x230
>   ? report_bug+0x14c/0x170
>   ? handle_bug+0x53/0x90
>   ? exc_invalid_op+0x13/0x60
>   ? asm_exc_invalid_op+0x16/0x20
>   ? psi_task_switch+0x126/0x230
>   ? psi_task_switch+0x126/0x230
>   __schedule+0x954/0x1150
>   do_task_dead+0x3e/0x40
>   do_exit+0x6f0/0x970
>   do_group_exit+0x2c/0x80
>   __x64_sys_exit_group+0x14/0x20
>   x64_sys_call+0x14aa/0x16a0
>   do_syscall_64+0x64/0x100
>   ? __memcg_kmem_charge_page+0xdc/0x1d0
>   ? do_fault+0x270/0x490
>   ? __handle_mm_fault+0x321/0x1280
>   ? __count_memcg_events+0x53/0xf0
>   ? handle_mm_fault+0xa0/0x260
>   ? do_user_addr_fault+0x33b/0x5b0
>   ? clear_bhb_loop+0x45/0xa0
>   ? clear_bhb_loop+0x45/0xa0
>   ? clear_bhb_loop+0x45/0xa0
>   ? clear_bhb_loop+0x45/0xa0
>   ? clear_bhb_loop+0x45/0xa0
>   entry_SYSCALL_64_after_hwframe+0x76/0x7e
>  RIP: 0033:0x7f0c00c5365a
>  Code: Unable to access opcode bytes at 0x7f0c00c53630.
>  RSP: 002b:00007ffc678fba38 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
>  RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0c00c5365a
>  RDX: 0000000000000005 RSI: 0000000000000000 RDI: 0000000000000000
>  RBP: 00007ffc678fba88 R08: 00007ffc678fb938 R09: 0000000000000018
>  R10: 0000000000000008 R11: 0000000000000246 R12: 00007f0c004db8a0
>  R13: 00007ffc678fbab0 R14: 0000000000000000 R15: 0000000000000000
>   </TASK>
>  ---[ end trace 0000000000000000 ]---
>  ------------[ cut here ]------------
>  WARNING: CPU: 2 PID: 22644 at kernel/sched/psi.c:994
> psi_task_switch+0x13b/0x230
>  Modules linked in:
>  CPU: 2 UID: 2148030876 PID: 22644 Comm: php-was Tainted: G        W
>        6.11.9-cm4all2-vm+ #242
>  Tainted: [W]=WARN
>  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
>  RIP: 0010:psi_task_switch+0x13b/0x230
>  Code: 2b 49 8b b4 24 b0 08 00 00 48 c7 c7 d8 4d fc a4 e8 5a de f9 ff
> 0f 0b 41 f6 44 24 18 80 74 0d 41 f6 84 24 a8 08 00 00 02 74 02 <0f> 0b
> 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 41
>  RSP: 0018:ffffc2bdd2fafb48 EFLAGS: 00010002
>  RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000027
>  RDX: ffff9fb1bee9b188 RSI: 0000000000000001 RDI: ffff9fb1bee9b180
>  RBP: 000000000000001c R08: 00000000ffffbfff R09: 0000000000000001
>  R10: 0000000000000003 R11: ffff9fb1beca0000 R12: ffff9f96762bab80
>  R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>  FS:  0000000000000000(0000) GS:ffff9fb1bee80000(0000) knlGS:0000000000000000
>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  CR2: 00007f0bfd7b0000 CR3: 000000027762e003 CR4: 00000000003706b0
>  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>  Call Trace:
>   <TASK>
>   ? __warn+0x7c/0x110
>   ? psi_task_switch+0x13b/0x230
>   ? report_bug+0x14c/0x170
>   ? handle_bug+0x53/0x90
>   ? exc_invalid_op+0x13/0x60
>   ? asm_exc_invalid_op+0x16/0x20
>   ? psi_task_switch+0x13b/0x230
>   ? psi_task_switch+0x126/0x230
>   __schedule+0x954/0x1150
>   do_task_dead+0x3e/0x40
>   do_exit+0x6f0/0x970
>   do_group_exit+0x2c/0x80
>   __x64_sys_exit_group+0x14/0x20
>   x64_sys_call+0x14aa/0x16a0
>   do_syscall_64+0x64/0x100
>   ? __memcg_kmem_charge_page+0xdc/0x1d0
>   ? do_fault+0x270/0x490
>   ? __handle_mm_fault+0x321/0x1280
>   ? __count_memcg_events+0x53/0xf0
>   ? handle_mm_fault+0xa0/0x260
>   ? do_user_addr_fault+0x33b/0x5b0
>   ? clear_bhb_loop+0x45/0xa0
>   ? clear_bhb_loop+0x45/0xa0
>   ? clear_bhb_loop+0x45/0xa0
>   ? clear_bhb_loop+0x45/0xa0
>   ? clear_bhb_loop+0x45/0xa0
>   entry_SYSCALL_64_after_hwframe+0x76/0x7e
>  RIP: 0033:0x7f0c00c5365a
>  Code: Unable to access opcode bytes at 0x7f0c00c53630.
>  RSP: 002b:00007ffc678fba38 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
>  RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0c00c5365a
>  RDX: 0000000000000005 RSI: 0000000000000000 RDI: 0000000000000000
>  RBP: 00007ffc678fba88 R08: 00007ffc678fb938 R09: 0000000000000018
>  R10: 0000000000000008 R11: 0000000000000246 R12: 00007f0c004db8a0
>  R13: 00007ffc678fbab0 R14: 0000000000000000 R15: 0000000000000000
>   </TASK>
>  ---[ end trace 0000000000000000 ]---

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ