Message-ID: <CAJuCfpGDw7LLs2dTa+9F4J8ZaSV2YMq=-LPgOmNgrgL4P84V_Q@mail.gmail.com>
Date: Tue, 6 Aug 2024 08:56:09 -0700
From: Suren Baghdasaryan <surenb@...gle.com>
To: Max Kellermann <max.kellermann@...os.com>
Cc: Johannes Weiner <hannes@...xchg.org>, Peter Zijlstra <peterz@...radead.org>,
linux-kernel@...r.kernel.org
Subject: Re: Bad psi_group_cpu.tasks[NR_MEMSTALL] counter
On Mon, Aug 5, 2024 at 5:35 AM Max Kellermann <max.kellermann@...os.com> wrote:
>
> On Wed, Jun 12, 2024 at 7:01 AM Suren Baghdasaryan <surenb@...gle.com> wrote:
> > I think you can check if this theory pans out by adding a WARN_ON() at
> > the end of psi_task_switch():
> >
> > void psi_task_switch(struct task_struct *prev, struct task_struct *next,
> >                      bool sleep)
> > {
> > ...
> >         if ((prev->psi_flags ^ next->psi_flags) & ~TSK_ONCPU) {
> >                 clear &= ~TSK_ONCPU;
> >                 for (; group; group = group->parent)
> >                         psi_group_change(group, cpu, clear, set, now,
> >                                          wake_clock);
> >         }
> > +       WARN_ON(prev->__state & TASK_DEAD && prev->psi_flags & TSK_MEMSTALL);
> > }
>
> Our servers have been running with the experimental WARN_ON() line you
> suggested, and today I found that one of them has produced more than
> 300 of these warnings since it was rebooted yesterday:
>
> ------------[ cut here ]------------
> WARNING: CPU: 38 PID: 448145 at kernel/sched/psi.c:992
> psi_task_switch+0x114/0x218
> Modules linked in:
> CPU: 38 PID: 448145 Comm: php-cgi8.1 Not tainted 6.9.12-cm4all1-ampere+ #178
> Hardware name: Supermicro ARS-110M-NR/R12SPD-A, BIOS 1.1b 10/17/2023
> pstate: 404000c9 (nZcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : psi_task_switch+0x114/0x218
> lr : psi_task_switch+0x98/0x218
> sp : ffff8000c5493c80
> x29: ffff8000c5493c80 x28: ffff0837ccd18640 x27: ffff07ff81ee3300
> x26: ffff0837ccd18000 x25: 0000000000000000 x24: 0000000000000001
> x23: 000000000000001c x22: 0000000000000026 x21: 00003010d610f448
> x20: 0000000000000000 x19: 0000000000000000 x18: 0000000000000000
> x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
> x14: 0000000000000004 x13: ffff08072ca62000 x12: ffffc22f32e1a000
> x11: 0000000000000001 x10: 0000000000000026 x9 : ffffc22f3129b150
> x8 : ffffc22f32e1aa88 x7 : 000000000000000c x6 : 0000d7ed3b360390
> x5 : ffff08faff6fb88c x4 : 0000000000000000 x3 : 0000000000e9de78
> x2 : 000000008ff70300 x1 : 000000008ff8d518 x0 : 0000000000000002
> Call trace:
> psi_task_switch+0x114/0x218
> __schedule+0x390/0xbc8
> do_task_dead+0x64/0xa0
> do_exit+0x5ac/0x9c0
> __arm64_sys_exit+0x1c/0x28
> invoke_syscall.constprop.0+0x54/0xf0
> do_el0_svc+0xa4/0xc8
> el0_svc+0x18/0x58
> el0t_64_sync_handler+0xf8/0x128
> el0t_64_sync+0x14c/0x150
> ---[ end trace 0000000000000000 ]---
>
> And indeed, it has a constant (and bogus) memory pressure value:
>
> # cat /proc/pressure/memory
> some avg10=99.99 avg60=98.65 avg300=98.70 total=176280880996
> full avg10=98.16 avg60=96.70 avg300=96.82 total=173950123267
Hmm. The original scenario I was thinking about when I proposed this
WARN_ON() was deemed impossible, so I think the only other possibility
is that the task being killed somehow skipped psi_memstall_leave()
before its death... Did you have the instrumentation I suggested to
track imbalance between psi_memstall_enter()/psi_memstall_leave() and
to record the _RET_IP_? If so, did it trigger at all?
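
For reference, a rough sketch of the kind of debug instrumentation I
mean (the memstall_debug_* fields below are made up for illustration
and would have to be added to struct task_struct; this is not existing
kernel code):

	/* hypothetical debug-only fields in struct task_struct */
	unsigned int	memstall_debug_depth;	/* enter/leave nesting */
	unsigned long	memstall_debug_ip;	/* _RET_IP_ of last enter */

	void psi_memstall_enter(unsigned long *flags)
	{
	...
		current->memstall_debug_depth++;
		current->memstall_debug_ip = _RET_IP_;
	...
	}

	void psi_memstall_leave(unsigned long *flags)
	{
	...
		WARN_ON(!current->memstall_debug_depth);
		current->memstall_debug_depth--;
	...
	}

and then in do_exit(), before the final schedule, something like:

	WARN(current->memstall_debug_depth,
	     "exiting with memstall depth %u, last psi_memstall_enter() from %pS\n",
	     current->memstall_debug_depth,
	     (void *)current->memstall_debug_ip);

That would catch a task calling psi_memstall_enter() without a matching
psi_memstall_leave() before it dies, and %pS would show which caller
leaked it.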
>
> It's taken nearly two months, and none of the other servers have
> produced this; it seems to be a really rare bug.
>
> Max