linux-kernel - Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory

Message-ID: <CAL3q7H6onXi5oZT8vJaCnqKZKjMm-gq2FiDx3583nu3mfsPNAg@mail.gmail.com>
Date: Thu, 26 Sep 2024 14:45:15 +0100
From: Filipe Manana <fdmanana@...nel.org>
To: Ivan Shapovalov <intelfx@...elfx.name>
Cc: Jannik Glückert <jannik.glueckert@...il.com>, 
	andrea.gelmini@...il.com, dsterba@...e.com, josef@...icpanda.com, 
	linux-btrfs@...r.kernel.org, linux-kernel@...r.kernel.org, 
	mikhail.v.gavrilov@...il.com, regressions@...ts.linux.dev
Subject: Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased
 execution time of the kswapd0 process and symptoms as if there is not enough memory

On Fri, Aug 16, 2024 at 12:16 PM Ivan Shapovalov <intelfx@...elfx.name> wrote:
>
> On 2024-08-16 at 11:58 +0100, Filipe Manana wrote:
> > On Fri, Aug 16, 2024 at 12:17 AM <intelfx@...elfx.name> wrote:
> > >
> > > On 2024-08-16 at 00:21 +0200, intelfx@...elfx.name wrote:
> > > > On 2024-08-11 at 16:33 +0100, Filipe Manana wrote:
> > > > > <...>
> > > > > This came to my attention a couple of days ago in a bugzilla report here:
> > > > >
> > > > > https://bugzilla.kernel.org/show_bug.cgi?id=219121
> > > > >
> > > > > There are also two other recent threads on the mailing list about it.
> > > > >
> > > > > There's a fix there in the bugzilla, and I've just sent it to the mailing list.
> > > > > In case you want to try it:
> > > > >
> > > > > https://lore.kernel.org/linux-btrfs/d85d72b968a1f7b8538c581eeb8f5baa973dfc95.1723377230.git.fdmanana@suse.com/
> > > > >
> > > > > Thanks.
> > > >
> > > > Hello,
> > > >
> > > > I confirm that excessive "system" CPU usage by kswapd and btrfs-cleaner
> > > > kernel threads is still happening on the latest 6.10 stable with all
> > > > quoted patches applied, making the system close to unusable (not to
> > > > mention excessive power usage which crosses the line well *into*
> > > > "unusable" for low-power systems such as laptops).
> > > >
> > > > With just 5 minutes of uptime on a freshly booted 6.10.5 system, the
> > > > cumulative CPU time of kswapd is already at 2 minutes.
> >
> > Less than 24 hours before your message, there was a patch merged to
> > Linus' tree, which was not (and is not) yet in any stable release
> > (including 6.10.5 of course).
> > Have you tried that patch?
>
> Yes, I did — as I said, I tried 6.10.5 with all combinations of patches
> ever posted in this thread (skipping those that I was not able to
> apply; it seems that there were a few mutually incompatible attempts to
> improve the extent map shrinker, some of which have already gone into
> the stable tree, thus making others inapplicable).
>
> > > As a follow-up, after 1 hour of uptime of this system the total CPU
> > > time of kswapd0 is exactly 30 minutes. So whatever theoretical
> > > OOM issue the extent map shrinker is trying to solve, the solution
> >
> > It's not a theoretical problem.
> > It's a problem that any unprivileged user can trigger provided that
> > the amount of available disk space is much higher than total RAM,
> > which is by far the most common case.
> >
> > The problem is explained in the commit change log, there's a
> > reproducer and it was even reported by a user:
> >
> > https://lore.kernel.org/linux-btrfs/13f94633dcf04d29aaf1f0a43d42c55e@amazon.com/
> >
> > This link was included in the changelog of the patch when submitted to
> > the list [1], but somehow it disappeared when it was merged to the git
> > repository.
> >
> > Any user can effectively trigger a denial of service by creating an
> > unlimited number of extent maps that never get removed, simply by
> > keeping a file descriptor open and doing writes, either with direct IO,
> > which is simpler, or even with buffered IO if the writes create holes
> > in the file (example: keep doing append writes starting after the
> > current EOF, to create a bunch of holes). Even if the task doing that
> > gets killed by the OOM killer, as long as there are idle processes
> > keeping the file open, the problem doesn't go away.
>
> Sorry, I did not intend to sound dismissive — what I wanted to say was
> that we fixed an edge case (and yes, I acknowledge that this edge case
> could be a security problem) by instead pessimizing a common case.
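
The buffered-IO pattern described above (append writes that each start past the current EOF, leaving a hole behind every write) can be sketched in user space as follows. This is a minimal, bounded illustration only, not the reproducer from the commit change log; the helper name `append_with_holes` and the 4 KiB chunk and gap sizes are hypothetical choices, and whether each hole ends up as a separate in-memory extent map depends on btrfs internals not shown here.

```python
import os
import tempfile


def append_with_holes(fd: int, writes: int, chunk: int = 4096, gap: int = 4096) -> int:
    """Hypothetical helper, for illustration only.

    Issues `writes` append-style writes, each starting `gap` bytes past the
    current EOF, so every write leaves a hole behind it. Returns the final
    file size in bytes.
    """
    data = b"x" * chunk
    for _ in range(writes):
        eof = os.fstat(fd).st_size
        # Writing past EOF extends the file; the skipped range [eof, eof + gap)
        # is never written and becomes a hole.
        os.pwrite(fd, data, eof + gap)
    return os.fstat(fd).st_size


with tempfile.NamedTemporaryFile() as f:
    size = append_with_holes(f.fileno(), writes=8)
    # Each iteration adds gap + chunk bytes: 8 * (4096 + 4096) = 65536.
    print(size)  # prints 65536
```

Note that `tempfile` usually places the file under /tmp, which is often tmpfs; the pattern only matters for the btrfs behaviour discussed in this thread when the file lives on a btrfs filesystem.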

So I've recently sent out a patchset to update the shrinker and
re-enable it:

https://lore.kernel.org/linux-btrfs/cover.1727174151.git.fdmanana@suse.com/

It applies against the current for-next branch, and should apply
against a 6.11 release too, except for the last patch, because of a
rename in a function it touches (CONFIG_BTRFS_DEBUG to
CONFIG_BTRFS_EXPERIMENTAL).
I can prepare a git branch based on a 6.11 release (or 6.10) if anyone
prefers that rather than manually picking patches and resolving
conflicts (or testing for-next which has many unrelated changes).

If any of you can test it and report, it would be much appreciated.
Thanks.


>
> --
> Ivan Shapovalov / intelfx /
>
> > [1] https://lore.kernel.org/linux-btrfs/1cb649870b6cad4411da7998735ab1141bb9f2f0.1712837044.git.fdmanana@suse.com/
> >
> > > in its current form is clearly unacceptable.
> > >
> > > Can we please have it reverted on the basis of this severe regression,
> > > until a better solution is found?
> >
> > Disabling the shrinker might be best for now. I'm on vacation and
> > can't write and test code, but I do have plans for making it better
> > and solving any remaining issues.
> > There's already a patch for that from Qu.