linux-kernel - Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <c876143d683d356a1c657455e295525f18e08895.camel@intelfx.name>
Date: Fri, 16 Aug 2024 13:16:19 +0200
From: Ivan Shapovalov <intelfx@...elfx.name>
To: Filipe Manana <fdmanana@...nel.org>
Cc: Jannik Glückert <jannik.glueckert@...il.com>, 
	andrea.gelmini@...il.com, dsterba@...e.com, josef@...icpanda.com, 
	linux-btrfs@...r.kernel.org, linux-kernel@...r.kernel.org, 
	mikhail.v.gavrilov@...il.com, regressions@...ts.linux.dev
Subject: Re: 6.10/regression/bisected - after f1d97e769152 I spotted
 increased execution time of the kswapd0 process and symptoms as if there is
 not enough memory

On 2024-08-16 at 11:58 +0100, Filipe Manana wrote:
> On Fri, Aug 16, 2024 at 12:17 AM <intelfx@...elfx.name> wrote:
> > 
> > On 2024-08-16 at 00:21 +0200, intelfx@...elfx.name wrote:
> > > On 2024-08-11 at 16:33 +0100, Filipe Manana wrote:
> > > > <...>
> > > > This came to my attention a couple days ago in a bugzilla report here:
> > > > 
> > > > https://bugzilla.kernel.org/show_bug.cgi?id=219121
> > > > 
> > > > There's also 2 other recent threads in the mailing about it.
> > > > 
> > > > There's a fix there in the bugzilla, and I've just sent it to the mailing list.
> > > > In case you want to try it:
> > > > 
> > > > https://lore.kernel.org/linux-btrfs/d85d72b968a1f7b8538c581eeb8f5baa973dfc95.1723377230.git.fdmanana@suse.com/
> > > > 
> > > > Thanks.
> > > 
> > > Hello,
> > > 
> > > I confirm that excessive "system" CPU usage by kswapd and btrfs-cleaner
> > > kernel threads is still happening on the latest 6.10 stable with all
> > > quoted patches applied, making the system close to unusable (not to
> > > mention excessive power usage which crosses the line well *into*
> > > "unusable" for low-power systems such as laptops).
> > > 
> > > With just 5 minutes of uptime on a freshly booted 6.10.5 system, the
> > > cumulative CPU time of kswapd is already at 2 minutes.
> 
> Less than 24 hours before your message, there was a patch merged to
> Linus' tree, which was not (and is not) yet in any stable release
> (including 6.10.5 of course).
> Have you tried that patch?

Yes, I did — as I said, I tried 6.10.5 with all combinations of patches
ever posted in this thread (skipping those that I was not able to
apply; it seems that there were a few mutually incompatible attempts to
improve the extent map shrinker, some of which have already gone into
the stable tree, thus making others inapplicable).

> > As a follow-up, after 1 hour of uptime of this system the total CPU
> > time of kswapd0 is exactly 30 minutes. So whatever is the theoretical
> > OOM issue that the extent map shrinker is trying to solve, the solution
> 
> It's not a theoretical problem.
> It's a problem that any unprivileged user can trigger provided that
> the amount of available disk space is much higher than total RAM,
> which is by far the most common case.
> 
> The problem is explained in the commit change log, there's a
> reproducer and it was even reported by a user:
> 
> https://lore.kernel.org/linux-btrfs/13f94633dcf04d29aaf1f0a43d42c55e@amazon.com/
> 
> This link was included in the changelog of the patch when submitted to
> the list [1], but somehow it disappeared when it was merged to the git
> repository.
> 
> Any user can effectively trigger a denial of service by creating an
> unlimited number of extent maps that never get removed while it keeps
> a file descriptor open and doing writes, either with direct IO, which
> is simpler, or even buffered IO in case it creates holes in the files
> (example: keep doing append writes starting after current eof, to
> create a bunch of holes). Even if that task doing that gets killed by
> the OOM, as long as there are idle processes keeping the file open,
> the problem doesn't go away.

Sorry, I did not intend to sound dismissive — what I wanted to say was
that we fixed an edge case (and yes, I acknowledge that this edge case
could be a security problem) by instead pessimizing a common case.

-- 
Ivan Shapovalov / intelfx /

> [1] https://lore.kernel.org/linux-btrfs/1cb649870b6cad4411da7998735ab1141bb9f2f0.1712837044.git.fdmanana@suse.com/
> 
> > in its current form is clearly unacceptable.
> > 
> > Can we please have it reverted on the basis of this severe regression,
> > until a better solution is found?
> 
> Disabling the shrinker might be the best for now. I'm on vacation and
> can't write and test code, but I do have plans for making it better
> and solving any remaining issues.
> There's already a patch for that from Qu.

Download attachment "signature.asc" of type "application/pgp-signature" (863 bytes)