lists.openwall.net
Open Source and information security mailing list archives
 
Message-ID: <20250331-inkrafttreten-lieder-5396ffd0af7a@brauner>
Date: Mon, 31 Mar 2025 11:13:20 +0200
From: Christian Brauner <brauner@...nel.org>
To: James Bottomley <James.Bottomley@...senpartnership.com>
Cc: jack@...e.cz, linux-fsdevel@...r.kernel.org, 
	linux-kernel@...r.kernel.org, mcgrof@...nel.org, hch@...radead.org, david@...morbit.com, 
	rafael@...nel.org, djwong@...nel.org, pavel@...nel.org, peterz@...radead.org, 
	mingo@...hat.com, will@...nel.org, boqun.feng@...il.com
Subject: Re: [PATCH v2 0/6] Extend freeze support to suspend and hibernate

On Sun, Mar 30, 2025 at 10:00:56AM -0400, James Bottomley wrote:
> On Sun, 2025-03-30 at 10:33 +0200, Christian Brauner wrote:
> [...]
> > > I found the systemd bug
> > > 
> > > https://github.com/systemd/systemd/issues/36888
> > 
> > I don't think that's a systemd bug.
> 
> Heh, well I have zero interest in refereeing a turf war between systemd
> and dracut over mismatched expectations.  The point for anyone who
> wants to run hibernate tests is that until they both sort this out the
> bug can be fixed by removing the system identifier check from systemd-
> hibernate-resume-generator.
> 
> > > And hacked around it, so I can confirm a simple hibernate/resume
> > > works provided the sb_start_write() patches are applied (and the
> > > hooks are plumbed in to pm).
> > > 
> > > There is an oddity: the systemd-journald process that would usually
> > > hang hibernate in D wait goes into R but seems to be hung and can't
> > > be killed by the watchdog even with a -9.  Its stack trace says
> > > it's still stuck in sb_start_write:
> > > 
> > > [<0>] percpu_rwsem_wait.constprop.10+0xd1/0x140
> > > [<0>] ext4_page_mkwrite+0x3c1/0x560 [ext4]
> > > [<0>] do_page_mkwrite+0x38/0xa0
> > > [<0>] do_wp_page+0xd5/0xba0
> > > [<0>] __handle_mm_fault+0xa29/0xca0
> > > [<0>] handle_mm_fault+0x16a/0x2d0
> > > [<0>] do_user_addr_fault+0x3ab/0x810
> > > [<0>] exc_page_fault+0x68/0x150
> > > [<0>] asm_exc_page_fault+0x22/0x30
> > > 
> > > So I think there's something funny going on in thaw.
> > 
> > My uneducated guess is that it's probably an issue with ext4 freezing
> > and unfreezing. xfs stops workqueues after all writes and pagefault
> > writers have stopped. This is done in ->sync_fs() when it's called
> > from freeze_super(). They are restarted when ->unfreeze_fs() is called.
> 
> It is possible, but I note that if I do
> 
> fsfreeze --freeze /

Freezing the root filesystem from userspace will sooner or later end in
an odd form of deadlock: the first accidental attempt to open something
for writing may hang, and so may the fsfreeze --unfreeze / call itself.
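The freezer has to be able to thaw without depending on the frozen filesystem. Here is a minimal sketch of one way to do that. It assumes the FIFREEZE and FITHAW ioctls from <linux/fs.h>, which are what fsfreeze(8) issues. The helper opens the mount point before freezing and thaws through the same descriptor, so the thaw does not require starting a new process. The helper name freeze_run_thaw() and its while_frozen callback are hypothetical. This only keeps the freezer itself from blocking. Any other process that writes to the filesystem, journald included, still blocks until the thaw.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>    /* FIFREEZE, FITHAW */
#include <stddef.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: freeze the filesystem mounted at `mountpoint`,
 * run `while_frozen` (which must not write to that filesystem), then
 * thaw through the same descriptor. Needs CAP_SYS_ADMIN.
 * Returns 0 on success, -1 on failure with errno set.
 */
int freeze_run_thaw(const char *mountpoint, void (*while_frozen)(void))
{
    int saved_errno;

    assert(mountpoint != NULL);

    /* Open before freezing so the thaw needs no new lookup or exec. */
    int fd = open(mountpoint, O_RDONLY);
    if (fd < 0)
        return -1;

    if (ioctl(fd, FIFREEZE, 0) < 0) {
        saved_errno = errno;
        close(fd);
        errno = saved_errno;
        return -1;
    }

    if (while_frozen != NULL)
        while_frozen();

    /* Always attempt the thaw: other writers stay blocked until it succeeds. */
    int ret = ioctl(fd, FITHAW, 0);
    saved_errno = errno;
    close(fd);
    errno = saved_errno;
    return ret < 0 ? -1 : 0;
}
```

Snapshot tooling would pass its snapshot step as while_frozen. Running this against / on a live system still stalls every other writer on the root filesystem until FITHAW completes.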

The most likely explanation for this stack trace is that the root
filesystem was never unfrozen. From userspace this is easy to trigger by
leaving the filesystem frozen without also freezing the userspace
processes that access it:

[  243.232205] INFO: task systemd-journal:539 blocked for more than 120 seconds.
[  243.239491]       Not tainted 6.14.0-g9ad3884269ca #131
[  243.243771] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.248517] task:systemd-journal state:D stack:0     pid:539   tgid:539   ppid:1      task_flags:0x400100 flags:0x00000006
[  243.253480] Call Trace:
[  243.254641]  <TASK>
[  243.255663]  __schedule+0x61e/0x1080
[  243.257071]  ? percpu_rwsem_wait+0x149/0x1b0
[  243.258473]  schedule+0x3a/0x120
[  243.259533]  percpu_rwsem_wait+0x155/0x1b0
[  243.260844]  ? __pfx_percpu_rwsem_wake_function+0x10/0x10
[  243.262620]  __percpu_down_read+0x83/0x1c0
[  243.263968]  btrfs_page_mkwrite+0x45b/0x890 [btrfs]
[  243.266828]  ? find_held_lock+0x2b/0x80
[  243.267765]  do_page_mkwrite+0x4a/0xb0
[  243.268698]  do_wp_page+0x331/0xdc0
[  243.269559]  __handle_mm_fault+0xb15/0x11d0
[  243.270566]  handle_mm_fault+0xb8/0x2b0
[  243.271557]  do_user_addr_fault+0x20a/0x700
[  243.272574]  exc_page_fault+0x6a/0x200
[  243.273462]  asm_exc_page_fault+0x26/0x30

This happens because systemd-journald mmaps its journal file. Writing to
that mapping triggers a page fault that needs write access to the file,
but it can't get it because page faults are frozen. So the task blocks,
and because it is not frozen itself, it triggers hung task warnings.
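The access pattern behind the journald trace is an ordinary store into a MAP_SHARED file mapping, not a write(2) call. Below is a minimal C sketch that assumes nothing beyond POSIX mmap(). The function mmap_write() is hypothetical and is not journald's code. On filesystems that implement ->page_mkwrite(), such as ext4 and btrfs in the traces above, a write fault on a page that is not yet mapped writable goes through that handler. That handler is where the task sleeps while page faults are frozen.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `len` bytes from `buf` into the file at
 * `path` through a shared mapping. Returns 0 on success, -1 on error.
 */
int mmap_write(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && buf != NULL && len > 0);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* Size the file first: stores to mapped pages beyond EOF raise SIGBUS. */
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return -1;
    }

    char *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /*
     * Plain memory store. The resulting write fault is serviced by the
     * filesystem's ->page_mkwrite() handler where one exists; that is
     * where a task blocks while page faults are frozen.
     */
    memcpy(map, buf, len);

    int ret = msync(map, len, MS_SYNC);
    munmap(map, len);
    close(fd);
    return ret < 0 ? -1 : 0;
}
```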

IOW, the most likely explanation is that the root filesystem wasn't
unfrozen and systemd-journald wasn't frozen.
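Both conditions can be captured in a small userspace toy model. A page-fault writer stays blocked for as long as the freeze level is at or above SB_FREEZE_PAGEFAULT, so without a thaw it never makes progress. The hung task warning fires only because that writer was not frozen along with the filesystem. The enum values mirror the SB_* freeze levels in include/linux/fs.h. struct toy_sb and the toy_* functions are hypothetical, and they leave out the per-CPU rwsems and wait queues the kernel actually uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Values mirror the kernel's freeze levels; the rest is a toy model. */
enum freeze_level {
    SB_UNFROZEN = 0,
    SB_FREEZE_WRITE = 1,      /* sb_start_write() callers block */
    SB_FREEZE_PAGEFAULT = 2,  /* sb_start_pagefault() callers block */
    SB_FREEZE_FS = 3,         /* sb_start_intwrite() callers block */
    SB_FREEZE_COMPLETE = 4,   /* fully frozen */
};

struct toy_sb {
    enum freeze_level frozen;
};

/* Like sb_start_pagefault(), but reports "would block" instead of sleeping. */
bool toy_pagefault_would_block(const struct toy_sb *sb)
{
    return sb->frozen >= SB_FREEZE_PAGEFAULT;
}

/* Raise the level one step at a time, as freeze_super() does
 * (the real code also waits for existing writers at each level). */
void toy_freeze(struct toy_sb *sb)
{
    assert(sb->frozen == SB_UNFROZEN);
    for (int level = SB_FREEZE_WRITE; level <= SB_FREEZE_COMPLETE; level++)
        sb->frozen = (enum freeze_level)level;
}

/* Thawing drops straight back to SB_UNFROZEN and releases all levels. */
void toy_thaw(struct toy_sb *sb)
{
    assert(sb->frozen == SB_FREEZE_COMPLETE);
    sb->frozen = SB_UNFROZEN;
}
```

Calling toy_pagefault_would_block() after toy_freeze() but before toy_thaw() returns true, which corresponds to the journald trace. After toy_thaw() it returns false.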
