Message-ID: <20250330-akupunktur-weshalb-c90594b6ad01@brauner>
Date: Sun, 30 Mar 2025 13:53:39 +0200
From: Christian Brauner <brauner@...nel.org>
To: James Bottomley <James.Bottomley@...senpartnership.com>, jack@...e.cz
Cc: linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org, 
	mcgrof@...nel.org, hch@...radead.org, david@...morbit.com, rafael@...nel.org, 
	djwong@...nel.org, pavel@...nel.org, peterz@...radead.org, mingo@...hat.com, 
	will@...nel.org, boqun.feng@...il.com
Subject: Re: [PATCH v2 0/6] Extend freeze support to suspend and hibernate

On Sun, Mar 30, 2025 at 10:33:53AM +0200, Christian Brauner wrote:
> On Sat, Mar 29, 2025 at 01:02:32PM -0400, James Bottomley wrote:
> > On Sat, 2025-03-29 at 10:04 -0400, James Bottomley wrote:
> > > On Sat, 2025-03-29 at 09:42 +0100, Christian Brauner wrote:
> > > > Add the necessary infrastructure changes to support freezing for
> > > > suspend and hibernate.
> > > > 
> > > > Just got back from LSFMM, so I'm still jetlagged and the likelihood
> > > > of bugs is increased. This should be all that's needed to wire up power.
> > > > 
> > > > This will be in vfs-6.16.super shortly.
> > > > 
> > > > ---
> > > > Changes in v2:
> > > > - Don't grab a reference in the iterator; make that a requirement for
> > > > the callers that need custom behavior.
> > > > - Link to v1:
> > > > https://lore.kernel.org/r/20250328-work-freeze-v1-0-a2c3a6b0e7a6@kernel.org
> > > 
> > > Given I've been a bit quiet on this, I thought I'd better explain
> > > what's going on: I do have these built, but I made the mistake of
> > > doing a dist-upgrade on my testing VM master image and it pulled in a
> > > version of systemd (257.4-3) that has a broken hibernate.  Since I
> > > upgraded in place I don't have the old image so I'm spending my time
> > > currently debugging systemd ... normal service will hopefully resume
> > > shortly.
> > 
> > I found the systemd bug
> > 
> > https://github.com/systemd/systemd/issues/36888
> 
> I don't think that's a systemd bug.
> 
> > And hacked around it, so I can confirm a simple hibernate/resume works
> > provided the sb_start_write() patches are applied (and the hooks are
> > plumbed in to pm).
> > 
> > There is an oddity: the systemd-journald process that would usually
> > hang hibernate in D wait goes into R but seems to be hung and can't be
> > killed by the watchdog even with a -9.  Its stack trace says it's
> > still stuck in sb_start_write:
> > 
> > [<0>] percpu_rwsem_wait.constprop.10+0xd1/0x140
> > [<0>] ext4_page_mkwrite+0x3c1/0x560 [ext4]
> > [<0>] do_page_mkwrite+0x38/0xa0
> > [<0>] do_wp_page+0xd5/0xba0
> > [<0>] __handle_mm_fault+0xa29/0xca0
> > [<0>] handle_mm_fault+0x16a/0x2d0
> > [<0>] do_user_addr_fault+0x3ab/0x810
> > [<0>] exc_page_fault+0x68/0x150
> > [<0>] asm_exc_page_fault+0x22/0x30
> > 
> > So I think there's something funny going on in thaw.
> 
> My uneducated guess is that it's probably an issue with ext4 freezing
> and unfreezing. xfs stops its workqueues after all regular writers and
> page-fault writers have stopped. This is done in ->sync_fs() when it's
> called from freeze_super(). They are restarted when ->unfreeze_fs() is called.
> 
> But for ext4 in ->sync_fs() the rsv_conversion_wq is flushed. I think
> that should be safe to do but I'm not sure if there can't be other work
> coming in on it before the actual freeze call. Jan will be able to
> explain this a lot better. I don't have time today to figure out what
> this does.
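The freeze sequence and the xfs/ext4 difference described in the quoted paragraphs can be modelled in userspace as below. This is a minimal sketch, not kernel code: only the SB_FREEZE_* level names and their order mirror the kernel; struct model_sb, the toy work queue, and the xfs_like_*/ext4_like_* hooks are hypothetical stand-ins. It shows two things under those assumptions: ->sync_fs() can tell it is being called from freeze_super() because the frozen level is SB_FREEZE_PAGEFAULT at that point, and a queue that is only flushed there still accepts work queued afterwards, while a stopped queue does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Freeze levels, in the same order as the kernel's SB_FREEZE_* enum. */
enum freeze_level {
	SB_UNFROZEN,
	SB_FREEZE_WRITE,     /* regular writers blocked */
	SB_FREEZE_PAGEFAULT, /* page-fault writers blocked as well */
	SB_FREEZE_FS,        /* filesystem-internal writers blocked */
	SB_FREEZE_COMPLETE,
};

/* Hypothetical toy work queue; NOT the kernel workqueue API. */
struct toy_wq {
	int pending;    /* queued, not yet executed */
	bool accepting; /* false once the queue has been stopped */
};

/* Hypothetical userspace stand-in for a superblock. */
struct model_sb {
	enum freeze_level frozen;
	struct toy_wq wq;
	void (*sync_fs)(struct model_sb *sb);
	void (*unfreeze_fs)(struct model_sb *sb);
};

static bool toy_queue_work(struct toy_wq *wq)
{
	if (!wq->accepting)
		return false;
	wq->pending++;
	return true;
}

/* Flush: drain what is queued now, but keep accepting new work. */
static void toy_flush(struct toy_wq *wq)
{
	wq->pending = 0;
}

/* Stop: drain, then refuse new work until restarted. */
static void toy_stop(struct toy_wq *wq)
{
	toy_flush(wq);
	wq->accepting = false;
}

/* xfs-like ->sync_fs(): stop background work only when called from freeze. */
static void xfs_like_sync_fs(struct model_sb *sb)
{
	if (sb->frozen == SB_FREEZE_PAGEFAULT)
		toy_stop(&sb->wq);
}

/* ext4-like ->sync_fs(): flush only; the queue stays open. */
static void ext4_like_sync_fs(struct model_sb *sb)
{
	toy_flush(&sb->wq);
}

/* xfs-like ->unfreeze_fs(): restart what ->sync_fs() stopped. */
static void xfs_like_unfreeze_fs(struct model_sb *sb)
{
	sb->wq.accepting = true;
}

/* Simplified freeze_super(): the real one also waits for writers per level. */
static void model_freeze_super(struct model_sb *sb)
{
	sb->frozen = SB_FREEZE_WRITE;
	sb->frozen = SB_FREEZE_PAGEFAULT;
	sb->sync_fs(sb); /* sync_filesystem() runs at this level */
	sb->frozen = SB_FREEZE_FS;
	/* ->freeze_fs() would be called here */
	sb->frozen = SB_FREEZE_COMPLETE;
}

static void model_thaw_super(struct model_sb *sb)
{
	if (sb->unfreeze_fs)
		sb->unfreeze_fs(sb);
	sb->frozen = SB_UNFROZEN;
}
```

In this model an ordinary sync (level SB_UNFROZEN) leaves the xfs-like queue running; only the sync issued from model_freeze_super() stops it, and model_thaw_super() restarts it through ->unfreeze_fs().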

That said, I just looked at the patch snippet you posted showing how you
hooked up efivarfs in https://lore.kernel.org/r/a7e6dee45ac11519c33a297797990fce6bb32bff.camel@HansenPartnership.com
and it looks pretty broken; it is probably the root cause. You have:

+static int efivarfs_thaw(struct super_block *sb, enum freeze_holder who);
 static const struct super_operations efivarfs_ops = {
        .statfs = efivarfs_statfs,
        .drop_inode = generic_delete_inode,
        .alloc_inode = efivarfs_alloc_inode,
        .free_inode = efivarfs_free_inode,
        .show_options = efivarfs_show_options,
+       .thaw_super = efivarfs_thaw,
 };

This adds ->thaw_super() without ->freeze_super(), which means that
->thaw_super() is never called for efivarfs.
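A minimal userspace model of that dispatch problem follows. It assumes, as a simplification, that the suspend/hibernate iteration only considers superblocks implementing a freeze hook (->freeze_super() or ->freeze_fs()) and applies the same check on the thaw side; struct model_sops, model_resume_thaw() and the custom_* hooks are hypothetical names, not the VFS API. Under that assumption a filesystem that sets only ->thaw_super() is skipped on both paths, so its thaw hook never runs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down super_operations: only the freeze/thaw hooks. */
struct model_sops {
	int (*freeze_super)(void *sb);
	int (*freeze_fs)(void *sb);
	int (*thaw_super)(void *sb);
};

static int thaw_calls; /* counts how often a custom thaw hook ran */

static int custom_thaw(void *sb)
{
	(void)sb;
	thaw_calls++;
	return 0;
}

static int custom_freeze(void *sb)
{
	(void)sb;
	return 0;
}

/* Assumed rule: superblocks without any freeze hook are skipped entirely. */
static int model_can_freeze(const struct model_sops *ops)
{
	return ops->freeze_super != NULL || ops->freeze_fs != NULL;
}

/* Thaw one superblock during resume, under the assumed rule. */
static void model_resume_thaw(const struct model_sops *ops, void *sb)
{
	if (!model_can_freeze(ops))
		return; /* never frozen, so never thawed */
	if (ops->thaw_super)
		ops->thaw_super(sb);
	/* else: the generic thaw path would run here */
}
```

Pairing the hooks, as in the `paired` case of the checks, is what makes the custom thaw reachable in this model.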

It's also broken in other ways: you're not waiting for writers to
finish. That is usually fine because efivarfs shouldn't be written to
that heavily, but it still won't work, and you need to call the generic
VFS helpers.

I'm appending a draft of how to do this for efivarfs. Note that I don't
have the means or time to test it right now. Would you please plumb your
recursive removal into my patch and test it? I'm pushing it to
vfs-6.16.super for now (it will likely fail right now due to unused
helpers, because I gutted the recursive removal).

[Attachment: 0001-DRAFT-efivarfs-support-freeze-thaw-for-suspend-hiber.patch (text/x-diff, 7178 bytes)]
