Message-ID: <CABXGCsM4tCH1wHtH0awV8J4eXWL57daMEbbuq_a_vSCEgQDqUQ@mail.gmail.com>
Date: Fri, 5 Jul 2024 23:36:15 +0500
From: Mikhail Gavrilov <mikhail.v.gavrilov@...il.com>
To: Filipe Manana <fdmanana@...nel.org>
Cc: Andrea Gelmini <andrea.gelmini@...il.com>, 
	Linux List Kernel Mailing <linux-kernel@...r.kernel.org>, 
	Linux regressions mailing list <regressions@...ts.linux.dev>, Btrfs BTRFS <linux-btrfs@...r.kernel.org>, 
	dsterba@...e.com, josef@...icpanda.com
Subject: Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased
 execution time of the kswapd0 process and symptoms as if there is not enough memory

On Thu, Jul 4, 2024 at 10:25 PM Filipe Manana <fdmanana@...nel.org> wrote:
>
> So several different things to try here:
>
> 1) First let's check that the problem is really a consequence of the shrinker.
>     Try this patch:
>
>     https://gist.githubusercontent.com/fdmanana/b44abaade0000d28ba0e1e1ae3ac4fee/raw/5c9bf0beb5aa156b893be2837c9244d035962c74/gistfile1.txt
>
>     This disables the shrinker. This is just to confirm if I'm looking
> in the right direction, if your problem is the same as Mikhail's and
> double check his bisection.

[1]
I can't check this because the patch does not apply on top of 661e504db04c:
$ git apply debug-1.patch
error: patch failed: fs/btrfs/super.c:2410
error: fs/btrfs/super.c: patch does not apply
$ cat debug-1.patch
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index f05cce7c8b8d..06c0db641d18 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2410,8 +2410,10 @@ static const struct super_operations btrfs_super_ops = {
        .statfs         = btrfs_statfs,
        .freeze_fs      = btrfs_freeze,
        .unfreeze_fs    = btrfs_unfreeze,
+       /*
        .nr_cached_objects = btrfs_nr_cached_objects,
        .free_cached_objects = btrfs_free_cached_objects,
+       */
 };

 static const struct file_operations btrfs_ctl_fops = {



> 2) Then drop that patch that disables the shrinker.
>      With all the previous 4 patches applied, apply this one on top of them:
>
>      https://gist.githubusercontent.com/fdmanana/9cea16ca56594f8c7e20b67dc66c6c94/raw/557bd5f6b37b65d210218f8da8987b74bfe5e515/gistfile1.txt
>
>      The goal here is to see if the extent map eviction done by the
> shrinker is making reads from other tasks too slow, and check if
> that's what's making your system unresponsive.
>

[2]
6.10.0-rc6-661e504db04c-test2
up  1:00
root         269 15.5  0.0      0     0 ?        R    10:23   9:20 [kswapd0]
up  2:02
root         269 21.6  0.0      0     0 ?        S    10:23  26:27 [kswapd0]
up  3:10
root         269 25.2  0.0      0     0 ?        R    10:23  48:11 [kswapd0]
up  4:04
root         269 29.0  0.0      0     0 ?        S    10:23  71:12 [kswapd0]
up  5:04
root         269 26.8  0.0      0     0 ?        R    10:23  81:47 [kswapd0]
up  6:07
root         269 27.9  0.0      0     0 ?        R    10:23 102:40 [kswapd0]
dmesg attached below as 6.10.0-rc6-661e504db04c-test2.zip

> 3) Then drop the patch from step 2), and on top of the previous 4
> patches from my git tree, apply this one:
>
>      https://gist.githubusercontent.com/fdmanana/a7c9c2abb69c978cf5b80c2f784243d5/raw/b4cca964904d3ec15c74e36ccf111a3a2f530520/gistfile1.txt
>
>      This is just to confirm if we do have concurrent calls to the
> shrinker, as the tracing seems to suggest, and where the negative
> numbers come from.
>      It also helps to check if not allowing concurrent calls to it, by
> skipping if it's already running, helps make the problems go away.

[3]
6.10.0-rc6-661e504db04c-test3
up  1:00
root         269 18.6  0.0      0     0 ?        S    17:09  11:12 [kswapd0]
up  2:00
root         269 23.7  0.0      0     0 ?        R    17:09  28:30 [kswapd0]
up  3:00
root         269 27.0  0.0      0     0 ?        S    17:09  48:47 [kswapd0]
up  4:00
root         269 28.8  0.0      0     0 ?        S    17:09  69:10 [kswapd0]
up  5:00
root         269 32.0  0.0      0     0 ?        S    17:09  96:17 [kswapd0]
up  6:00
root         269 29.7  0.0      0     0 ?        S    17:09 107:12 [kswapd0]
dmesg attached below as 6.10.0-rc6-661e504db04c-test3.zip

As we can see, the CPU time consumed by kswapd0 has increased
significantly. It used to be about 30 min over 6 hours of uptime; now it
is more than 100 min. That is, it is still roughly three times worse,
even with the proposed patches (1-4) applied.
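
For anyone who wants to re-check the arithmetic, here is a minimal sketch
that converts the TIME column of the last samples from [2] and [3] into
minutes and compares them with the 30 min figure mentioned above. It
assumes that TIME in this `ps aux` output is cumulative CPU time in
MINUTES:SECONDS form; the helper name cpu_minutes is only illustrative.

```python
# Sketch: convert the TIME column of the `ps aux` samples into minutes.
# Assumption: TIME is cumulative CPU time in MINUTES:SECONDS form.

def cpu_minutes(ps_line: str) -> float:
    """Return cumulative CPU time in minutes from one `ps aux` line."""
    fields = ps_line.split()
    minutes, seconds = fields[9].split(":")  # 10th column of `ps aux` is TIME
    return int(minutes) + int(seconds) / 60

# Final samples (about 6 hours of uptime) copied from runs [2] and [3].
test2_last = "root         269 27.9  0.0      0     0 ?        R    10:23 102:40 [kswapd0]"
test3_last = "root         269 29.7  0.0      0     0 ?        S    17:09 107:12 [kswapd0]"

BASELINE_MINUTES = 30  # the "30 min over 6 hours" figure quoted in this message

for name, line in (("test2", test2_last), ("test3", test3_last)):
    total = cpu_minutes(line)
    ratio = total / BASELINE_MINUTES
    print(f"{name}: {total:.1f} min of CPU time, {ratio:.1f}x the baseline")
```

Both runs come out at a bit more than three times the baseline, which is
where the "three times worse" estimate comes from.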

-- 
Best Regards,
Mike Gavrilov.

Download attachment "6.10.0-rc6-661e504db04c-test2.zip" of type "application/zip" (53393 bytes)

Download attachment "6.10.0-rc6-661e504db04c-test3.zip" of type "application/zip" (54961 bytes)
