Message-ID: <20251127172323.7913c99f@kf-m2g5>
Date: Thu, 27 Nov 2025 17:24:16 -0600
From: Aaron Rainbolt <arraybolt3@...il.com>
To: Mikulas Patocka <mpatocka@...hat.com>
Cc: Milan Broz <gmazyland@...il.com>, linux-mm@...ck.org,
cryptsetup@...ts.linux.dev, "dm-devel@...ts.linux.dev"
<dm-devel@...ts.linux.dev>, linux-kernel@...r.kernel.org,
adrelanos@...nix.org
Subject: Re: Hard system lock-ups when using encrypted swap and RAM is
exhausted
On Thu, 27 Nov 2025 18:54:04 +0100 (CET)
Mikulas Patocka <mpatocka@...hat.com> wrote:
> On Thu, 27 Nov 2025, Milan Broz wrote:
>
> > Hi,
> >
> > On 11/12/25 6:18 AM, Aaron Rainbolt wrote:
> > > Not sure if this is a memory management issue, a LUKS issue, or
> > > both, so I wrote both mailing lists.
> >
> > It is not a LUKS issue; cryptsetup/LUKS only activates the encrypted
> > device, after which the kernel (dm-crypt) alone handles the I/O.
> >
> > Adding cc to dm-devel, as this would be another combination of
> > device-mapper and encrypted swap that could cause issues...
> >
> > However, could you please specify your exact storage
> > configuration?
> >
> > From the subject, I expected you to have an encrypted swap, but it
> > is not clear if there are other encrypted devices.
> >
> > Please paste at least the output of lsblk and lsblk -f, and also
> > luksDump (or crypttab if it is not LUKS) for the LUKS/dm-crypt
> > configuration.
> >
> > Thanks,
> > Milan
> >
> >
> > >
> > > I'm seeing an issue with both the latest mainline kernel
> > > (6.18-rc5) and Debian 13's 6.12 kernel package. When physical
> > > memory fills up, the entire system locks up hard, as if it hit
> > > rather severe thrashing, despite the fact that there appears to
> > > be disk cache that can still be evicted, and there is ample swap
> > > space remaining (gigabytes of it). This issue did
> > > not occur with the 6.1 kernel in Debian 12. I'm seeing this occur
> > > in very low-memory Debian VMs, with between 512 and 900 MB RAM,
> > > running under VirtualBox and KVM. (I suspect, but have not
> > > verified, that I'm seeing similar behavior under Xen as well.)
> > > These VMs generally use a swappiness of 1, though I have seen a
> > > lockup occur even with a swappiness of 60. The filesystem in use,
> > > in case it matters, is ext4.
> > >
> > > To reproduce on a system running Linux 6.18-rc5, with :
> > >
> > > * Follow the steps from
> > > https://gitlab.com/cryptsetup/cryptsetup/-/wikis/FrequentlyAskedQuestions,
> > > section "2.3 How do I set up encrypted swap?", but creating a
> > > swapfile rather than a swap partition.
>
> Hi
>
> An encrypted swap file is not supposed to work. It uses the loop
> device, which routes the requests to a filesystem, and the filesystem
> needs to allocate memory to process the requests.
>
> So, this is what happened to you - the machine runs out of memory, it
> needs to swap out some pages, dm-crypt encrypts the pages and
> generates write bios, the write bios are directed to the loop device,
> the loop device directs them to the filesystem, the filesystem
> attempts to allocate more memory => deadlock.
>
> I got the deadlock with 6.18-rc4 when I used dm-crypt on a file and I
> didn't get the deadlock when I used dm-crypt on a SCSI block device.
> That is expected behavior.
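The failure chain described above (reclaim needs a page swapped out, the swap-out through dm-crypt and the loop device needs the filesystem to allocate memory, and no memory is free) can be illustrated with a toy Python model. Everything in it is hypothetical: `ToyMemory`, `swap_out_one_page`, `reclaim`, the page counts and the optional reserve are illustrative assumptions, not kernel code or kernel APIs. An `fs_pages` value of 0 stands in for swap on a plain block device; a positive value stands in for a swap file, where the filesystem behind the loop device has to allocate.

```python
from dataclasses import dataclass


@dataclass
class ToyMemory:
    """Hypothetical page pool: general free pages plus an optional reserve.

    A toy model for illustration only; it is not the kernel's page
    allocator or any kernel reservation API.
    """
    free_pages: int
    reserve_pages: int = 0     # pages only the writeback path may dip into
    reserve_capacity: int = 0  # level the reserve is refilled to on release

    def alloc(self, n: int, may_use_reserve: bool = False) -> bool:
        """Take n pages, falling back to the reserve only if allowed."""
        if self.free_pages >= n:
            self.free_pages -= n
            return True
        if may_use_reserve and self.free_pages + self.reserve_pages >= n:
            self.reserve_pages -= n - self.free_pages
            self.free_pages = 0
            return True
        return False

    def release(self, n: int) -> None:
        """Return n pages, refilling the reserve before the general pool."""
        refill = min(n, self.reserve_capacity - self.reserve_pages)
        self.reserve_pages += refill
        self.free_pages += n - refill


def swap_out_one_page(mem: ToyMemory, fs_pages: int,
                      fs_may_use_reserve: bool) -> bool:
    """Write one page to swap; fs_pages models the filesystem's allocations."""
    if not mem.alloc(fs_pages, may_use_reserve=fs_may_use_reserve):
        return False        # the write cannot be issued, so no page is freed
    mem.release(fs_pages)   # write completed: temporary pages come back
    mem.free_pages += 1     # and the swapped-out page is now free
    return True


def reclaim(mem: ToyMemory, target: int, fs_pages: int,
            fs_may_use_reserve: bool = False) -> str:
    """Swap out pages until `target` pages are free, or report a deadlock."""
    while mem.free_pages < target:
        if not swap_out_one_page(mem, fs_pages, fs_may_use_reserve):
            return "deadlock"
    return "ok"


if __name__ == "__main__":
    # Block-device-like path: no general allocation needed, reclaim succeeds.
    print(reclaim(ToyMemory(free_pages=0), target=4, fs_pages=0))  # ok
    # Swap-file-like path with zero free pages: nothing can be written out.
    print(reclaim(ToyMemory(free_pages=0), target=4, fs_pages=2))  # deadlock
    # Same path, but with a small reserve set aside for writeback only.
    mem = ToyMemory(free_pages=0, reserve_pages=2, reserve_capacity=2)
    print(reclaim(mem, target=4, fs_pages=2, fs_may_use_reserve=True))  # ok
```

In this model the only way out of the swap-file case is a pool of pages set aside in advance for the writeback path, which is roughly the kind of reservation asked about in the reply below; whether a real filesystem such as ext4 can be made to work that way is not something the model answers.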
Has this only been expected behavior since some point after kernel 6.1,
or has it always been expected behavior and encrypted swapfiles simply
worked by accident with kernel 6.1? Is there any reasonable way to reserve
some memory for in-kernel filesystem code (at least for some
filesystems like ext4 in the event it's not feasible for all of them)
that will ensure it has enough memory to handle I/O operations even if
the system is completely out of memory from userspace's perspective?
I'd be happy to try to contribute a fix if possible.
With kernel 6.1, this was working reliably, and Kicksecure (a
security-focused Debian derivative with a relatively sizable userbase)
was using encrypted swapfiles by default. We never got any reports of
lockups like we're seeing now with kernel 6.12. Whether this was
intended to work or not before, it seemed to work very well, and this
seems like a regression from our standpoint.
--
Aaron
> Note that the in-kernel OOM killer sometimes doesn't kill the
> application and instead discards read-only program pages (which
> generates heavy I/O churn and a general system slowdown). If you are
> hitting this problem, I recommend installing a userspace OOM killer,
> such as earlyoom.
>
> Mikulas
>
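The earlyoom suggestion can be illustrated with a minimal Python sketch of how a userspace OOM killer might decide when to act, loosely following the rule earlyoom's README describes: act when both available memory and free swap fall below a percentage threshold, then target the process with the highest `oom_score`. The 10% defaults, the function names (`parse_meminfo`, `should_act`, `pick_victim`, `read_oom_scores`) and the exact comparison are assumptions for illustration; this is not earlyoom's code, and the sketch only reports a decision instead of sending any signal.

```python
import os


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: value in kB}."""
    values: dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            values[name.strip()] = int(fields[0])
    return values


def should_act(info: dict[str, int], mem_pct: float = 10.0,
               swap_pct: float = 10.0) -> bool:
    """True when MemAvailable and SwapFree are both at or below thresholds.

    The both-below rule and the 10% defaults are assumptions modeled
    loosely on earlyoom's documented behavior, not taken from its source.
    """
    avail = 100.0 * info["MemAvailable"] / info["MemTotal"]
    swap_total = info.get("SwapTotal", 0)
    # With no swap configured, treat free swap as 0% so memory alone decides.
    swap_free = 100.0 * info.get("SwapFree", 0) / swap_total if swap_total else 0.0
    return avail <= mem_pct and swap_free <= swap_pct


def pick_victim(oom_scores: dict[int, int]) -> int:
    """Return the PID with the highest oom_score."""
    return max(oom_scores, key=lambda pid: oom_scores[pid])


def read_oom_scores() -> dict[int, int]:
    """Collect {pid: oom_score} from /proc/<pid>/oom_score."""
    scores: dict[int, int] = {}
    for entry in os.listdir("/proc"):
        if entry.isdigit():
            try:
                with open(f"/proc/{entry}/oom_score") as f:
                    scores[int(entry)] = int(f.read())
            except (OSError, ValueError):  # process exited or unreadable
                pass
    return scores


if __name__ == "__main__":
    # Report only: this sketch never sends a signal to any process.
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
        if should_act(info):
            print("low memory and swap; candidate PID:",
                  pick_victim(read_oom_scores()))
        else:
            print("memory/swap above thresholds; nothing to do")
```

Under this assumed rule nothing is done while plenty of swap is still free, so the thresholds matter for whether such a tool steps in at all.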