Message-ID: <3f140c076c3756e84d515b81ee9eeeaf13ca4b42.camel@HansenPartnership.com>
Date: Sun, 30 Mar 2025 10:00:56 -0400
From: James Bottomley <James.Bottomley@...senPartnership.com>
To: Christian Brauner <brauner@...nel.org>, jack@...e.cz
Cc: linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
mcgrof@...nel.org, hch@...radead.org, david@...morbit.com,
rafael@...nel.org, djwong@...nel.org, pavel@...nel.org,
peterz@...radead.org, mingo@...hat.com, will@...nel.org,
boqun.feng@...il.com
Subject: Re: [PATCH v2 0/6] Extend freeze support to suspend and hibernate
On Sun, 2025-03-30 at 10:33 +0200, Christian Brauner wrote:
[...]
> > I found the systemd bug
> >
> > https://github.com/systemd/systemd/issues/36888
>
> I don't think that's a systemd bug.
Heh, well I have zero interest in refereeing a turf war between systemd
and dracut over mismatched expectations. The point for anyone who
wants to run hibernate tests is that, until they both sort this out,
the bug can be worked around by removing the system identifier check
from systemd-hibernate-resume-generator.
> > And hacked around it, so I can confirm a simple hibernate/resume
> > works provided the sb_start_write() patches are applied (and the
> > hooks are plumbed into pm).
> >
> > There is an oddity: the systemd-journald process that would usually
> > hang hibernate in D wait goes into R but seems to be hung and can't
> > be killed by the watchdog even with a -9. Its stack trace says
> > it's still stuck in sb_start_write:
> >
> > [<0>] percpu_rwsem_wait.constprop.10+0xd1/0x140
> > [<0>] ext4_page_mkwrite+0x3c1/0x560 [ext4]
> > [<0>] do_page_mkwrite+0x38/0xa0
> > [<0>] do_wp_page+0xd5/0xba0
> > [<0>] __handle_mm_fault+0xa29/0xca0
> > [<0>] handle_mm_fault+0x16a/0x2d0
> > [<0>] do_user_addr_fault+0x3ab/0x810
> > [<0>] exc_page_fault+0x68/0x150
> > [<0>] asm_exc_page_fault+0x22/0x30
> >
> > So I think there's something funny going on in thaw.
>
> My uneducated guess is that it's probably an issue with ext4 freezing
> and unfreezing. xfs stops workqueues after all writes and pagefault
> writers have stopped. This is done in ->sync_fs() when it's called
> from freeze_super(). They are restarted when ->unfreeze_fs is called.
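For anyone following along, the freeze ordering described above can be modelled in userspace. This is a sketch under stated assumptions, not the kernel implementation: each freeze level is a pthread rwlock standing in for the per-superblock percpu rwsem, writers take the shared side (as sb_start_write() and sb_start_pagefault() do), and freezing takes the exclusive side of each level in turn, with ->sync_fs() notionally running once ordinary writes and page-fault writers are blocked. A page-fault writer such as the ext4_page_mkwrite() path in the trace above presumably parks at the page-fault level. All model_* names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical model of the three freeze levels, in the order
 * freeze_super() blocks them. */
enum model_level {
	MODEL_FREEZE_WRITE,	/* ordinary writers: sb_start_write() */
	MODEL_FREEZE_PAGEFAULT,	/* page-fault writers: sb_start_pagefault() */
	MODEL_FREEZE_FS,	/* filesystem-internal writers */
	MODEL_LEVELS
};

/* One rwlock per level stands in for the per-superblock percpu rwsem. */
struct model_sb {
	pthread_rwlock_t rw[MODEL_LEVELS];
};

void model_sb_init(struct model_sb *sb)
{
	for (int i = 0; i < MODEL_LEVELS; i++)
		pthread_rwlock_init(&sb->rw[i], NULL);
}

/* Writer entry/exit: shared side of one level; blocks while it is frozen. */
void model_start_write(struct model_sb *sb, enum model_level lvl)
{
	pthread_rwlock_rdlock(&sb->rw[lvl]);
}

void model_end_write(struct model_sb *sb, enum model_level lvl)
{
	pthread_rwlock_unlock(&sb->rw[lvl]);
}

/* Freeze: each wrlock waits for in-flight writers at that level to drain
 * and then keeps new ones out. */
void model_freeze(struct model_sb *sb)
{
	pthread_rwlock_wrlock(&sb->rw[MODEL_FREEZE_WRITE]);
	pthread_rwlock_wrlock(&sb->rw[MODEL_FREEZE_PAGEFAULT]);
	/* ->sync_fs() would run here: writes and page faults are blocked. */
	pthread_rwlock_wrlock(&sb->rw[MODEL_FREEZE_FS]);
	/* ->freeze_fs() would run here. */
}

/* Thaw: release in reverse order so parked writers wake up. Unlike the
 * kernel, pthreads requires this to run in the thread that froze. */
void model_thaw(struct model_sb *sb)
{
	/* ->unfreeze_fs() would run here. */
	for (int i = MODEL_LEVELS - 1; i >= 0; i--)
		pthread_rwlock_unlock(&sb->rw[i]);
}

struct model_writer {
	struct model_sb *sb;
	atomic_int done;	/* set once the fault gets past the freeze check */
};

/* Simulated write fault on a shared file mapping. */
void *model_pagefault_writer(void *arg)
{
	struct model_writer *w = arg;

	model_start_write(w->sb, MODEL_FREEZE_PAGEFAULT);	/* parks while frozen */
	atomic_store(&w->done, 1);
	model_end_write(w->sb, MODEL_FREEZE_PAGEFAULT);
	return NULL;
}

/* Returns 0 if the writer stayed parked while frozen and finished after thaw. */
int model_demo(void)
{
	struct model_sb sb;
	struct model_writer w;
	struct timespec delay = { .tv_sec = 0, .tv_nsec = 100 * 1000 * 1000 };
	pthread_t t;

	model_sb_init(&sb);
	w.sb = &sb;
	atomic_init(&w.done, 0);

	model_freeze(&sb);
	if (pthread_create(&t, NULL, model_pagefault_writer, &w) != 0) {
		model_thaw(&sb);
		return -1;
	}
	nanosleep(&delay, NULL);	/* give the writer time to reach the lock */
	int parked = (atomic_load(&w.done) == 0);
	model_thaw(&sb);		/* parked writer wakes and completes */
	pthread_join(t, NULL);
	return (parked && atomic_load(&w.done) == 1) ? 0 : 1;
}
```

One limitation of the model: fsfreeze --freeze and --unfreeze run as separate processes, so the kernel must allow the thaw to come from a different task than the freeze, which a pthread rwlock does not permit.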
It is possible, but I note that if I do

  fsfreeze --freeze /

I can produce exactly the above stack trace in systemd-journald, but if
I unfreeze root it continues on normally. Thus I think this is some
type of bad interaction with the process freezing that goes on in
hibernate. I'm going to see if I can replicate using the cgroup
freezer.
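To script the same reproduction without the fsfreeze binary, the freeze and thaw requests are the FIFREEZE and FITHAW ioctls issued on a file descriptor for the mountpoint. This is a minimal sketch, assuming a Linux build with <linux/fs.h> available and CAP_SYS_ADMIN at run time; set_fs_frozen() is a hypothetical helper name, not an existing API.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>	/* FIFREEZE, FITHAW */

/*
 * Hypothetical helper: freeze (freeze != 0) or thaw the filesystem mounted
 * at `mountpoint` using the same ioctls as fsfreeze(8). Returns 0 on
 * success or -1 with errno set. Needs CAP_SYS_ADMIN.
 */
int set_fs_frozen(const char *mountpoint, int freeze)
{
	int fd = open(mountpoint, O_RDONLY);
	if (fd < 0)
		return -1;

	/* Pass 0 as the argument; freeze/thaw take no input. */
	int ret = ioctl(fd, freeze ? FIFREEZE : FITHAW, 0);
	int saved_errno = errno;	/* keep close() from clobbering the ioctl error */

	close(fd);
	errno = saved_errno;
	return ret < 0 ? -1 : 0;
}
```

Calling set_fs_frozen() with 1 and later with 0 on a scratch mount mirrors fsfreeze --freeze and --unfreeze; freezing / this way can wedge anything in the calling session that needs to write, so the thaw is best issued from a context that does not touch the frozen filesystem.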
> But for ext4 in ->sync_fs() the rsv_conversion_wq is flushed. I think
> that should be safe to do but I'm not sure if there can't be other
> work coming in on it before the actual freeze call. Jan will be able
> to explain this a lot better. I don't have time today to figure out
> what this does.
Understood. The above is for Jan if he'd like to think about it.
Regards,
James