Message-ID: <b8681f80-f2c5-44a0-b306-9f566dad65a6@linux.alibaba.com>
Date: Thu, 21 Nov 2024 21:18:00 +0800
From: Gao Xiang <hsiangkao@...ux.alibaba.com>
To: Max Kellermann <max.kellermann@...os.com>, Christoph Hellwig <hch@....de>
Cc: Suren Baghdasaryan <surenb@...gle.com>,
Johannes Weiner <hannes@...xchg.org>, Peter Zijlstra <peterz@...radead.org>,
linux-kernel@...r.kernel.org
Subject: Re: Bad psi_group_cpu.tasks[NR_MEMSTALL] counter
Hi Max!
On 2024/11/21 16:43, Max Kellermann wrote:
> On Thu, Nov 21, 2024 at 5:51 AM Christoph Hellwig <hch@....de> wrote:
>> Something seems to be going wrong here, though, but the trace below
>> doesn't really tell me anything about the workload or file system
>> used, and if this is even calling into readahead.
>
> In case you were asking :-) these are web servers (shared webhosting),
> running PHP most of the time. The host itself runs on an ext4, but I
> don't think the ext4 system partition has anything to do with this.
> PHP runs in containers that are erofs, the PHP sources plus
> memory-mapped opcache files are in btrfs (read-only snapshot) and the
> runtime data is on NFS or Ceph (there have been stalls on both server
> types).
> My limited experience with Linux MM suggests that this happens during
> the page fault of a memory mapped file. PHP processes usually mmap
> only files from erofs and btrfs.
> The servers are always somewhat under memory pressure; our container
> manager keeps as many containers alive as possible and only shuts them
> down when the server reaches the memory limit. At any given time,
> there are thousands of containers.
Just saw this. I guess your _recent_ 6.11.9 bug is actually
related to EROFS, since EROFS uses readahead_expand(). I think
the problem in your recent report was introduced by a recently
backported fix,
commit 9e2f9d34dd12 ("erofs: handle overlapped pclusters out of crafted images properly"):
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v6.11.9&id=9cfa199bcbbbba31cbf97b2786f44f4464f3f29a
After this patch, bio can be NULL, which leaves
psi_memstall_{enter,leave}() unbalanced. It can be fixed as follows
(the diff below could be damaged due to my email client):
diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index 01f147505487..19ef4ff2a134 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -1792,9 +1792,9 @@ static void z_erofs_submit_queue(struct z_erofs_decompress_frontend *f,
 			erofs_fscache_submit_bio(bio);
 		else
 			submit_bio(bio);
-		if (memstall)
-			psi_memstall_leave(&pflags);
 	}
+	if (memstall)
+		psi_memstall_leave(&pflags);

 	/*
 	 * although background is preferred, no one is pending for submission.
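
To illustrate why moving the leave call matters, here is a minimal userspace sketch of the imbalance. It is only a model, not the kernel code: memstall_enter_stub(), memstall_leave_stub(), memstall_depth and run_queue() are hypothetical names, and it assumes that the closing brace in the hunk above ends a block that only runs when bio is non-NULL.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the PSI calls; NOT the kernel API. */
static int memstall_depth;

static void memstall_enter_stub(void) { memstall_depth++; }
static void memstall_leave_stub(void) { memstall_depth--; }

/*
 * Simplified model of the tail of a submission path.
 *
 * have_bio:      whether a bio is still pending at the end
 *                (assumed to be possibly false after the backport).
 * leave_outside: false models the old placement (leave only inside
 *                the bio block), true models the placement in the diff.
 *
 * Returns the leftover stall depth; 0 means enter/leave are balanced.
 */
static int run_queue(bool have_bio, bool leave_outside)
{
    bool memstall = false;

    memstall_depth = 0;

    /* Somewhere earlier in the path the stall section was entered. */
    memstall_enter_stub();
    memstall = true;

    if (have_bio) {
        /* The bio would be submitted here. */
        if (!leave_outside && memstall)
            memstall_leave_stub();
    }
    if (leave_outside && memstall)
        memstall_leave_stub();

    return memstall_depth;
}
```

In this model, run_queue(false, false) leaves a depth of 1 (a stall section that is never left), while run_queue(false, true) and both have_bio == true cases return 0.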
But your original report predates the very recent
commit 9e2f9d34dd12; before that commit, bio could not
be NULL, so I don't think the two are the same issue.
I will submit a formal fix for the recent bug later,
thanks!
Thanks,
Gao Xiang