

Message-ID: <b8681f80-f2c5-44a0-b306-9f566dad65a6@linux.alibaba.com>
Date: Thu, 21 Nov 2024 21:18:00 +0800
From: Gao Xiang <hsiangkao@...ux.alibaba.com>
To: Max Kellermann <max.kellermann@...os.com>, Christoph Hellwig <hch@....de>
Cc: Suren Baghdasaryan <surenb@...gle.com>,
 Johannes Weiner <hannes@...xchg.org>, Peter Zijlstra <peterz@...radead.org>,
 linux-kernel@...r.kernel.org
Subject: Re: Bad psi_group_cpu.tasks[NR_MEMSTALL] counter

Hi Max!

On 2024/11/21 16:43, Max Kellermann wrote:
> On Thu, Nov 21, 2024 at 5:51 AM Christoph Hellwig <hch@....de> wrote:
>> Something seems to be going wrong here, though, but the trace below
>> doesn't really tell me anything about the workload or file system
>> used, and if this is even calling into readahead.
> 
> In case you were asking :-) these are web servers (shared webhosting),
> running PHP most of the time. The host itself runs on an ext4, but I
> don't think the ext4 system partition has anything to do with this.
> PHP runs in containers that are erofs, the PHP sources plus
> memory-mapped opcache files are in btrfs (read-only snapshot) and the
> runtime data is on NFS or Ceph (there have been stalls on both server
> types).
> My limited experience with Linux MM suggests that this happens during
> the page fault of a memory mapped file. PHP processes usually mmap
> only files from erofs and btrfs.
> The servers are always somewhat under memory pressure; our container
> manager keeps as many containers alive as possible and only shuts them
> down when the server reaches the memory limit. At any given time,
> there are thousands of containers.

Just saw this. I guess your _recent_ 6.11.9 bug is actually
related to EROFS, since EROFS uses readahead_expand().  I think
the regression in your recent report was introduced by a recently
backported fix,
commit 9e2f9d34dd12 ("erofs: handle overlapped pclusters out of crafted images properly"):
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v6.11.9&id=9cfa199bcbbbba31cbf97b2786f44f4464f3f29a

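After this patch, bio can be NULL at the final submission point,
so the psi_memstall_leave() call nested there is skipped and
psi_memstall_{enter,leave}() become unbalanced.  It can be fixed as
follows (the diff below could be whitespace-damaged by my email client):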

diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index 01f147505487..19ef4ff2a134 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -1792,9 +1792,9 @@ static void z_erofs_submit_queue(struct z_erofs_decompress_frontend *f,
                         erofs_fscache_submit_bio(bio);
                 else
                         submit_bio(bio);
-               if (memstall)
-                       psi_memstall_leave(&pflags);
         }
+       if (memstall)
+               psi_memstall_leave(&pflags);

         /*
          * although background is preferred, no one is pending for submission.
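
To make the imbalance concrete, here is a minimal userspace sketch of
the control flow before and after the diff above. It is not kernel
code: nr_memstall, memstall_enter(), memstall_leave(), struct fake_bio,
tail_before_fix(), tail_after_fix() and run_once() are all hypothetical
stand-ins invented for illustration, and the counter only loosely
models tasks[NR_MEMSTALL].

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the tasks[NR_MEMSTALL] counter. */
static int nr_memstall;

/* Hypothetical stand-ins for psi_memstall_enter()/psi_memstall_leave(). */
static void memstall_enter(void) { nr_memstall++; }
static void memstall_leave(void) { nr_memstall--; }

/* Hypothetical placeholder for struct bio; only NULL vs non-NULL matters. */
struct fake_bio { int unused; };

/* Shape before the fix: the leave call is nested inside the bio check. */
static void tail_before_fix(struct fake_bio *bio, bool memstall)
{
	if (bio) {
		/* submit_bio(bio) would happen here */
		if (memstall)
			memstall_leave();
	}
}

/* Shape after the fix: the leave call no longer depends on bio. */
static void tail_after_fix(struct fake_bio *bio, bool memstall)
{
	if (bio) {
		/* submit_bio(bio) would happen here */
	}
	if (memstall)
		memstall_leave();
}

/*
 * Simulate one submission that entered a memstall section and return
 * the counter value left behind (0 means enter/leave were balanced).
 */
static int run_once(bool fixed, bool bio_is_null)
{
	struct fake_bio b = { 0 };
	struct fake_bio *bio = bio_is_null ? NULL : &b;

	nr_memstall = 0;
	memstall_enter();
	if (fixed)
		tail_after_fix(bio, true);
	else
		tail_before_fix(bio, true);
	return nr_memstall;
}
```

In this model, run_once(false, true) leaves the counter at 1 (an
enter without a matching leave), while the other three combinations
return 0.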

However, your original report predates the very recent
commit 9e2f9d34dd12.  Before that commit, bio could not
be NULL, so I don't think the two are the same issue.

I will submit a formal fix for the recent bug later,
thanks!

Thanks,
Gao Xiang