Date: Wed, 12 Jun 2024 08:49:02 +0200
From: Max Kellermann <max.kellermann@...os.com>
To: Suren Baghdasaryan <surenb@...gle.com>
Cc: Johannes Weiner <hannes@...xchg.org>, Peter Zijlstra <peterz@...radead.org>, 
	linux-kernel@...r.kernel.org
Subject: Re: Bad psi_group_cpu.tasks[NR_MEMSTALL] counter

On Wed, Jun 12, 2024 at 7:01 AM Suren Baghdasaryan <surenb@...gle.com> wrote:
> Instead I think what might be happening is that the task is terminated
> while it's in memstall.

How is it possible to terminate a task that's in memstall?
That would have to happen between psi_memstall_enter() and
psi_memstall_leave(), but I have already checked all the callers and
found nothing suspicious; there is no obvious way to escape such a
section without psi_memstall_leave(). In my understanding, it's
impossible to terminate a task that's currently stuck in the kernel.
It first needs to leave the kernel and return to userspace, doesn't it?
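
For reference, this is the pattern I was auditing in all callers - a
minimal sketch with a made-up blocking step; the enter/leave pair and
its signature are the real API from include/linux/psi.h:

    unsigned long pflags;

    psi_memstall_enter(&pflags);
    /* ... block on memory: reclaim, swap-in, thrashing readahead ... */
    psi_memstall_leave(&pflags);

Every caller I looked at keeps enter and leave balanced like this.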

> I think if your theory was
> correct and psi_task_change() was called while task's cgroup is
> destroyed then task_psi_group() would have returned an invalid pointer
> and we would crash once that value is dereferenced.

I was thinking of something slightly different: the cgroup being
deleted, or a task being terminated, and the bookkeeping of the PSI
flags going wrong as a result, maybe through a data race. I find the
whole PSI code with its per-task flags and its per-cpu per-cgroup
counters and flags somewhat obscure (but somebody else's code is
always obscure, of course); I thought there was a lot of potential for
bookkeeping mistakes, but I found nothing specific.
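
To make the concern concrete, here is a toy model of the kind of
flag-vs-counter bookkeeping I mean (plain userspace C, not the
kernel's actual data structures; all names except TSK_MEMSTALL are
made up), and how a missed leave would leave a group counter stuck:

    #include <stdio.h>

    #define TSK_MEMSTALL 0x1

    struct group { int nr_memstall; };      /* per-group counter */
    struct task  { unsigned flags; };       /* per-task flags */

    /* adjust the group counter only on actual flag transitions,
     * in the spirit of psi_task_change() */
    static void task_change(struct group *g, struct task *t,
                            unsigned clear, unsigned set)
    {
        if ((t->flags & TSK_MEMSTALL) && (clear & TSK_MEMSTALL))
            g->nr_memstall--;
        if (!(t->flags & TSK_MEMSTALL) && (set & TSK_MEMSTALL))
            g->nr_memstall++;
        t->flags = (t->flags & ~clear) | set;
    }

    int main(void)
    {
        struct group g = { 0 };
        struct task t = { 0 };

        task_change(&g, &t, 0, TSK_MEMSTALL);  /* "enter" */
        task_change(&g, &t, TSK_MEMSTALL, 0);  /* "leave": back to 0 */

        /* the suspected failure mode: an enter whose matching leave
         * never gets accounted to the same group */
        task_change(&g, &t, 0, TSK_MEMSTALL);
        t.flags = 0;                           /* task gone, no leave */
        printf("stale nr_memstall = %d\n", g.nr_memstall);  /* 1, forever */
        return 0;
    }

That is the shape of bug I was hunting for, but I could not find a
code path that actually does this.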

Anyway, thanks for looking into this - I hope we can get a grip on
this issue, as it's preventing me from using PSI values for actual
process management: servers that get into this state appear
permanently overloaded, which would lead to the workload processes
being killed over and over.

Max
