Message-ID: <CAL3q7H54-NXwzAenDde9djjqm30KkaqGdp6ABCZC57WTYpV_5A@mail.gmail.com>
Date: Sat, 6 Jul 2024 00:09:17 +0100
From: Filipe Manana <fdmanana@...nel.org>
To: Mikhail Gavrilov <mikhail.v.gavrilov@...il.com>
Cc: Andrea Gelmini <andrea.gelmini@...il.com>,
Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
Linux regressions mailing list <regressions@...ts.linux.dev>, Btrfs BTRFS <linux-btrfs@...r.kernel.org>,
dsterba@...e.com, josef@...icpanda.com
Subject: Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased
execution time of the kswapd0 process and symptoms as if there is not enough memory
On Fri, Jul 5, 2024 at 7:36 PM Mikhail Gavrilov
<mikhail.v.gavrilov@...il.com> wrote:
>
> On Thu, Jul 4, 2024 at 10:25 PM Filipe Manana <fdmanana@...nel.org> wrote:
> >
> > So several different things to try here:
> >
> > 1) First let's check that the problem is really a consequence of the shrinker.
> > Try this patch:
> >
> > https://gist.githubusercontent.com/fdmanana/b44abaade0000d28ba0e1e1ae3ac4fee/raw/5c9bf0beb5aa156b893be2837c9244d035962c74/gistfile1.txt
> >
> > This disables the shrinker. This is just to confirm whether I'm looking
> > in the right direction, whether your problem is the same as Mikhail's,
> > and to double-check his bisection.
>
> [1]
> I can't check it because the patch does not apply on top of 661e504db04c.
> > git apply debug-1.patch
> error: patch failed: fs/btrfs/super.c:2410
> error: fs/btrfs/super.c: patch does not apply
> > cat debug-1.patch
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index f05cce7c8b8d..06c0db641d18 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -2410,8 +2410,10 @@ static const struct super_operations btrfs_super_ops = {
> .statfs = btrfs_statfs,
> .freeze_fs = btrfs_freeze,
> .unfreeze_fs = btrfs_unfreeze,
> + /*
> .nr_cached_objects = btrfs_nr_cached_objects,
> .free_cached_objects = btrfs_free_cached_objects,
> + */
> };
>
> static const struct file_operations btrfs_ctl_fops = {
>
>
>
> > 2) Then drop that patch that disables the shrinker.
> > With all the previous 4 patches applied, apply this one on top of them:
> >
> > https://gist.githubusercontent.com/fdmanana/9cea16ca56594f8c7e20b67dc66c6c94/raw/557bd5f6b37b65d210218f8da8987b74bfe5e515/gistfile1.txt
> >
> > The goal here is to see if the extent map eviction done by the
> > shrinker is making reads from other tasks too slow, and check if
> > that's what's making your system unresponsive.
> >
>
> [2]
> 6.10.0-rc6-661e504db04c-test2
> up 1:00
> root 269 15.5 0.0 0 0 ? R 10:23 9:20 [kswapd0]
> up 2:02
> root 269 21.6 0.0 0 0 ? S 10:23 26:27 [kswapd0]
> up 3:10
> root 269 25.2 0.0 0 0 ? R 10:23 48:11 [kswapd0]
> up 4:04
> root 269 29.0 0.0 0 0 ? S 10:23 71:12 [kswapd0]
> up 5:04
> root 269 26.8 0.0 0 0 ? R 10:23 81:47 [kswapd0]
> up 6:07
> root 269 27.9 0.0 0 0 ? R 10:23 102:40 [kswapd0]
> dmesg attached below as 6.10.0-rc6-661e504db04c-test2.zip
>
> > 3) Then drop the patch from step 2), and on top of the previous 4
> > patches from my git tree, apply this one:
> >
> > https://gist.githubusercontent.com/fdmanana/a7c9c2abb69c978cf5b80c2f784243d5/raw/b4cca964904d3ec15c74e36ccf111a3a2f530520/gistfile1.txt
> >
> > This is just to confirm whether we do have concurrent calls to the
> > shrinker, as the tracing seems to suggest, and to see where the negative
> > numbers come from.
> > It also helps to check whether not allowing concurrent calls to it, by
> > skipping if it's already running, makes the problems go away.
>
> [3]
> 6.10.0-rc6-661e504db04c-test3
> up 1:00
> root 269 18.6 0.0 0 0 ? S 17:09 11:12 [kswapd0]
> up 2:00
> root 269 23.7 0.0 0 0 ? R 17:09 28:30 [kswapd0]
> up 3:00
> root 269 27.0 0.0 0 0 ? S 17:09 48:47 [kswapd0]
> up 4:00
> root 269 28.8 0.0 0 0 ? S 17:09 69:10 [kswapd0]
> up 5:00
> root 269 32.0 0.0 0 0 ? S 17:09 96:17 [kswapd0]
> up 6:00
> root 269 29.7 0.0 0 0 ? S 17:09 107:12 [kswapd0]
> dmesg attached below as 6.10.0-rc6-661e504db04c-test3.zip
>
> As we can see, the CPU time of kswapd0 has increased significantly. It used
> to be about 30 min after 6 hours of uptime; now it is over 100 min. That is,
> it became roughly three times worse, even with the proposed patches (1-4).
Can you try the following two branches based on 6.10-rc6?
1) https://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux.git/log/?h=test1_em_shrinker_6.10
2) https://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux.git/log/?h=test2_em_shrinker_6.10
Even if the first one makes things good, please also try the second one.
The first just includes some changes for the next merge window (for
6.11) that might help speed things up.
The second just has a change that would be simple to add to 6.10, and
one we'll probably want in any case, as is or in some variation.
Thanks!
>
> --
> Best Regards,
> Mike Gavrilov.