Message-ID: <CAKPOu+-DdwTCFDjW+ykKM5Da5wmLW3gSx5=x+fsSdaMEwUuvJw@mail.gmail.com>
Date: Mon, 5 Aug 2024 14:34:52 +0200
From: Max Kellermann <max.kellermann@...os.com>
To: Suren Baghdasaryan <surenb@...gle.com>
Cc: Johannes Weiner <hannes@...xchg.org>, Peter Zijlstra <peterz@...radead.org>,
linux-kernel@...r.kernel.org
Subject: Re: Bad psi_group_cpu.tasks[NR_MEMSTALL] counter
On Wed, Jun 12, 2024 at 7:01 AM Suren Baghdasaryan <surenb@...gle.com> wrote:
> I think you can check if this theory pans out by adding a WARN_ON() at
> the end of psi_task_switch():
>
> void psi_task_switch(struct task_struct *prev, struct task_struct *next,
>                      bool sleep)
> {
> ...
>         if ((prev->psi_flags ^ next->psi_flags) & ~TSK_ONCPU) {
>                 clear &= ~TSK_ONCPU;
>                 for (; group; group = group->parent)
>                         psi_group_change(group, cpu, clear, set, now,
>                                          wake_clock);
>         }
> +       WARN_ON(prev->__state & TASK_DEAD && prev->psi_flags & TSK_MEMSTALL);
> }
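(For context, and purely as my own illustration rather than anything quoted
from the kernel source: the invariant this WARN_ON checks is that a memstall
section is strictly bracketed, so a task should never reach TASK_DEAD with
TSK_MEMSTALL still set in psi_flags; if one does, that would presumably
explain a tasks[NR_MEMSTALL] counter that stays elevated forever, as in the
subject line. A minimal sketch of the expected pairing, assuming the usual
psi_memstall_enter()/psi_memstall_leave() usage:)

#include <linux/psi.h>

/*
 * Illustrative only, not a real call site: the flag is set on entry and
 * must be cleared again on the same task before it can safely exit.
 */
static void example_memstall_section(void)
{
        unsigned long pflags;

        psi_memstall_enter(&pflags);    /* marks current as in memstall */

        /* ... work that stalls on memory (reclaim, compaction, ...) ... */

        psi_memstall_leave(&pflags);    /* clears the flag again */
}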
Our servers have been running with the experimental WARN_ON you
suggested, and today I found that one of them has produced more than
300 of these warnings since it was rebooted yesterday:
------------[ cut here ]------------
WARNING: CPU: 38 PID: 448145 at kernel/sched/psi.c:992
psi_task_switch+0x114/0x218
Modules linked in:
CPU: 38 PID: 448145 Comm: php-cgi8.1 Not tainted 6.9.12-cm4all1-ampere+ #178
Hardware name: Supermicro ARS-110M-NR/R12SPD-A, BIOS 1.1b 10/17/2023
pstate: 404000c9 (nZcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : psi_task_switch+0x114/0x218
lr : psi_task_switch+0x98/0x218
sp : ffff8000c5493c80
x29: ffff8000c5493c80 x28: ffff0837ccd18640 x27: ffff07ff81ee3300
x26: ffff0837ccd18000 x25: 0000000000000000 x24: 0000000000000001
x23: 000000000000001c x22: 0000000000000026 x21: 00003010d610f448
x20: 0000000000000000 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
x14: 0000000000000004 x13: ffff08072ca62000 x12: ffffc22f32e1a000
x11: 0000000000000001 x10: 0000000000000026 x9 : ffffc22f3129b150
x8 : ffffc22f32e1aa88 x7 : 000000000000000c x6 : 0000d7ed3b360390
x5 : ffff08faff6fb88c x4 : 0000000000000000 x3 : 0000000000e9de78
x2 : 000000008ff70300 x1 : 000000008ff8d518 x0 : 0000000000000002
Call trace:
psi_task_switch+0x114/0x218
__schedule+0x390/0xbc8
do_task_dead+0x64/0xa0
do_exit+0x5ac/0x9c0
__arm64_sys_exit+0x1c/0x28
invoke_syscall.constprop.0+0x54/0xf0
do_el0_svc+0xa4/0xc8
el0_svc+0x18/0x58
el0t_64_sync_handler+0xf8/0x128
el0t_64_sync+0x14c/0x150
---[ end trace 0000000000000000 ]---
And indeed, that machine reports a constant (and bogus) memory pressure value:
# cat /proc/pressure/memory
some avg10=99.99 avg60=98.65 avg300=98.70 total=176280880996
full avg10=98.16 avg60=96.70 avg300=96.82 total=173950123267
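(As an aside, and just a rough sketch of my own for anyone who wants to scan
other machines for this: the symptom is simply a "full" avg10 that stays
pinned near 100 in /proc/pressure/memory on an otherwise idle box, so
something like the following userspace check would flag it; the 90.0
threshold is an arbitrary choice of mine.)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
        char line[256];
        FILE *f = fopen("/proc/pressure/memory", "r");

        if (!f) {
                perror("fopen /proc/pressure/memory");
                return EXIT_FAILURE;
        }

        /* Each line looks like: "full avg10=98.16 avg60=... total=..." */
        while (fgets(line, sizeof(line), f)) {
                double avg10;

                if (strncmp(line, "full", 4) == 0 &&
                    sscanf(line, "full avg10=%lf", &avg10) == 1 &&
                    avg10 > 90.0)
                        printf("suspicious: full avg10=%.2f\n", avg10);
        }

        fclose(f);
        return EXIT_SUCCESS;
}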
It took nearly two months to hit this, and none of the other servers
has produced the warning; this seems to be a really rare bug.
Max