lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAK-xaQaesuU-TjDQcXgbjoNbZa0Y2qLHtSu5efy99EUDVnuhUg@mail.gmail.com>
Date: Sat, 6 Jul 2024 02:11:50 +0200
From: Andrea Gelmini <andrea.gelmini@...il.com>
To: Filipe Manana <fdmanana@...nel.org>
Cc: Mikhail Gavrilov <mikhail.v.gavrilov@...il.com>, 
	Linux List Kernel Mailing <linux-kernel@...r.kernel.org>, 
	Linux regressions mailing list <regressions@...ts.linux.dev>, Btrfs BTRFS <linux-btrfs@...r.kernel.org>, 
	dsterba@...e.com, josef@...icpanda.com
Subject: Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased
 execution time of the kswapd0 process and symptoms as if there is not enough memory

Il giorno gio 4 lug 2024 alle ore 19:25 Filipe Manana
<fdmanana@...nel.org> ha scritto:
> 2) Then drop that patch that disables the shrinker.
>      With all the previous 4 patches applied, apply this one on top of them:
>
>      https://gist.githubusercontent.com/fdmanana/9cea16ca56594f8c7e20b67dc66c6c94/raw/557bd5f6b37b65d210218f8da8987b74bfe5e515/gistfile1.txt
>
>      The goal here is to see if the extent map eviction done by the
> shrinker is making reads from other tasks too slow, and check if
> that's what0s making your system unresponsive.
>
> 3) Then drop the patch from step 2), and on top of the previous 4
> patches from my git tree, apply this one:
>
>      https://gist.githubusercontent.com/fdmanana/a7c9c2abb69c978cf5b80c2f784243d5/raw/b4cca964904d3ec15c74e36ccf111a3a2f530520/gistfile1.txt
>
>      This is just to confirm if we do have concurrent calls to the
> shrinker, as the tracing seems to suggest, and where the negative
> numbers come from.
>      It also helps to check if not allowing concurrent calls to it, by
> skipping if it's already running, helps making the problems go away.

Uhm... good news...
To recap, here's this evening tests:

Kernel 6.6.36:
   Fresh BTRFS: (tar cp . | pv -ta > /dev/null): 0:03:53 [ 231MiB/s]
(time and average speed)
   Aged snapshots: (tar cp /.snapshots/|pv -at -s 100G -S >
/dev/null): 0:02:20 [ 726MiB/s]

Kernel rc6+branch+2nd patch:
   Fresh BTRFS: 0:03:14 [ 278MiB/s]
   Aged snapshots: I had to stop. PSI memory > 80%. Processes stucked
for most time. i.e.: mplayer via nfs stops every few seconds for a
while, switching virtual desktop takes >5 seconds. Also "echo 3 >
drop_caches" takes more than 5 minutes to finish (on the other two
kernels, it was quite immediate).

Kernel rc6+branch+3rd patch:
   Fresh BTRFS: 0:03:40 [ 245MiB/s]
   Aged snapshots: 0:02:03 [ 826MiB/s]
   N.b.: no skyrocket PSI memory, no swap pressure, no sluggish results!!!

Now, that was just one run, I'm going to use this patch for a few
days. Next week I can tell you for sure if everything is right!
For the moment it seems we have a winner!

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ