Message-ID: <CAL3q7H54-NXwzAenDde9djjqm30KkaqGdp6ABCZC57WTYpV_5A@mail.gmail.com>
Date: Sat, 6 Jul 2024 00:09:17 +0100
From: Filipe Manana <fdmanana@...nel.org>
To: Mikhail Gavrilov <mikhail.v.gavrilov@...il.com>
Cc: Andrea Gelmini <andrea.gelmini@...il.com>, 
	Linux List Kernel Mailing <linux-kernel@...r.kernel.org>, 
	Linux regressions mailing list <regressions@...ts.linux.dev>, Btrfs BTRFS <linux-btrfs@...r.kernel.org>, 
	dsterba@...e.com, josef@...icpanda.com
Subject: Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased
 execution time of the kswapd0 process and symptoms as if there is not enough memory

On Fri, Jul 5, 2024 at 7:36 PM Mikhail Gavrilov
<mikhail.v.gavrilov@...il.com> wrote:
>
> On Thu, Jul 4, 2024 at 10:25 PM Filipe Manana <fdmanana@...nel.org> wrote:
> >
> > So several different things to try here:
> >
> > 1) First let's check that the problem is really a consequence of the shrinker.
> >     Try this patch:
> >
> >     https://gist.githubusercontent.com/fdmanana/b44abaade0000d28ba0e1e1ae3ac4fee/raw/5c9bf0beb5aa156b893be2837c9244d035962c74/gistfile1.txt
> >
> >     This disables the shrinker. This is just to confirm that I'm looking
> > in the right direction, that your problem is the same as Mikhail's, and
> > to double-check his bisection.
>
> [1]
> I can't check it because the patch doesn't apply on top of 661e504db04c.
> > git apply debug-1.patch
> error: patch failed: fs/btrfs/super.c:2410
> error: fs/btrfs/super.c: patch does not apply
> > cat debug-1.patch
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index f05cce7c8b8d..06c0db641d18 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -2410,8 +2410,10 @@ static const struct super_operations btrfs_super_ops = {
>         .statfs         = btrfs_statfs,
>         .freeze_fs      = btrfs_freeze,
>         .unfreeze_fs    = btrfs_unfreeze,
> +       /*
>         .nr_cached_objects = btrfs_nr_cached_objects,
>         .free_cached_objects = btrfs_free_cached_objects,
> +       */
>  };
>
>  static const struct file_operations btrfs_ctl_fops = {
>
>
>
> > 2) Then drop that patch that disables the shrinker.
> >      With all the previous 4 patches applied, apply this one on top of them:
> >
> >      https://gist.githubusercontent.com/fdmanana/9cea16ca56594f8c7e20b67dc66c6c94/raw/557bd5f6b37b65d210218f8da8987b74bfe5e515/gistfile1.txt
> >
> >      The goal here is to see if the extent map eviction done by the
> > shrinker is making reads from other tasks too slow, and to check if
> > that's what's making your system unresponsive.
> >
>
> [2]
> 6.10.0-rc6-661e504db04c-test2
> up  1:00
> root         269 15.5  0.0      0     0 ?        R    10:23   9:20 [kswapd0]
> up  2:02
> root         269 21.6  0.0      0     0 ?        S    10:23  26:27 [kswapd0]
> up  3:10
> root         269 25.2  0.0      0     0 ?        R    10:23  48:11 [kswapd0]
> up  4:04
> root         269 29.0  0.0      0     0 ?        S    10:23  71:12 [kswapd0]
> up  5:04
> root         269 26.8  0.0      0     0 ?        R    10:23  81:47 [kswapd0]
> up  6:07
> root         269 27.9  0.0      0     0 ?        R    10:23 102:40 [kswapd0]
> dmesg attached below as 6.10.0-rc6-661e504db04c-test2.zip
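The hourly uptime/CPU-time samples above can be collected with a small script along these lines (a minimal sketch; the exact command used in the report is not shown, so the sampling method here is an assumption):

```shell
#!/bin/sh
# Take one sample: system uptime plus kswapd0's process line.
# The TIME column is the accumulated CPU time that keeps growing
# in the reports above.  (This is an assumed reconstruction; the
# original report may have used a different command.)
uptime | sed 's/.*up/up/;s/,.*//'
ps aux | grep '[k]swapd0'
```

Run once per hour (e.g. from cron or a sleep loop) to reproduce the "up H:MM" plus process-line pairs shown above.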
>
> > 3) Then drop the patch from step 2), and on top of the previous 4
> > patches from my git tree, apply this one:
> >
> >      https://gist.githubusercontent.com/fdmanana/a7c9c2abb69c978cf5b80c2f784243d5/raw/b4cca964904d3ec15c74e36ccf111a3a2f530520/gistfile1.txt
> >
> >      This is just to confirm if we do have concurrent calls to the
> > shrinker, as the tracing seems to suggest, and where the negative
> > numbers come from.
> >      It also helps to check whether not allowing concurrent calls to
> > it, by skipping if it's already running, makes the problems go away.
>
> [3]
> 6.10.0-rc6-661e504db04c-test3
> up  1:00
> root         269 18.6  0.0      0     0 ?        S    17:09  11:12 [kswapd0]
> up  2:00
> root         269 23.7  0.0      0     0 ?        R    17:09  28:30 [kswapd0]
> up  3:00
> root         269 27.0  0.0      0     0 ?        S    17:09  48:47 [kswapd0]
> up  4:00
> root         269 28.8  0.0      0     0 ?        S    17:09  69:10 [kswapd0]
> up  5:00
> root         269 32.0  0.0      0     0 ?        S    17:09  96:17 [kswapd0]
> up  6:00
> root         269 29.7  0.0      0     0 ?        S    17:09 107:12 [kswapd0]
> dmesg attached below as 6.10.0-rc6-661e504db04c-test3.zip
>
> As we can see, the CPU time of kswapd0 has increased significantly: it
> used to be 30 minutes after 6 hours of uptime, and now it is 100
> minutes. That is, it is still three times worse even with the proposed
> patches (1-4).

Can you try the following two branches based on 6.10-rc6?

1)  https://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux.git/log/?h=test1_em_shrinker_6.10

2)  https://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux.git/log/?h=test2_em_shrinker_6.10

Even if the first one fixes things, please also try the second one.

The first one just includes some changes queued for the next merge
window (for 6.11) that might help speed things up.
The second one just has a change that would be simple to add to 6.10,
and we'll probably always want it or some variation of it.

Thanks!

>
> --
> Best Regards,
> Mike Gavrilov.
