Message-ID: <20250401-entkernen-revitalisieren-fac4b67109e5@brauner>
Date: Tue, 1 Apr 2025 14:50:00 +0200
From: Christian Brauner <brauner@...nel.org>
To: Jan Kara <jack@...e.cz>
Cc: James Bottomley <James.Bottomley@...senpartnership.com>,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org, mcgrof@...nel.org,
hch@...radead.org, david@...morbit.com, rafael@...nel.org, djwong@...nel.org,
pavel@...nel.org, peterz@...radead.org, mingo@...hat.com, will@...nel.org,
boqun.feng@...il.com
Subject: Re: [RFC PATCH 1/4] locking/percpu-rwsem: add freezable alternative to down_read

On Tue, Apr 01, 2025 at 01:20:37PM +0200, Jan Kara wrote:
> On Mon 31-03-25 21:13:20, James Bottomley wrote:
> > On Tue, 2025-04-01 at 01:32 +0200, Christian Brauner wrote:
> > > On Mon, Mar 31, 2025 at 03:51:43PM -0400, James Bottomley wrote:
> > > > On Thu, 2025-03-27 at 10:06 -0400, James Bottomley wrote:
> > > > [...]
> > > > > > -static void percpu_rwsem_wait(struct percpu_rw_semaphore *sem, bool reader)
> > > > > > +static void percpu_rwsem_wait(struct percpu_rw_semaphore *sem, bool reader,
> > > > > > +                              bool freeze)
> > > > > >  {
> > > > > >          DEFINE_WAIT_FUNC(wq_entry, percpu_rwsem_wake_function);
> > > > > >          bool wait;
> > > > > > @@ -156,7 +157,8 @@ static void percpu_rwsem_wait(struct percpu_rw_semaphore *sem, bool reader)
> > > > > >          spin_unlock_irq(&sem->waiters.lock);
> > > > > >
> > > > > >          while (wait) {
> > > > > > -                set_current_state(TASK_UNINTERRUPTIBLE);
> > > > > > +                set_current_state(TASK_UNINTERRUPTIBLE |
> > > > > > +                                  freeze ? TASK_FREEZABLE : 0);
> > > >
> > > > > This is a bit embarrassing, the bug I've been chasing is here: the ?
> > > > > operator is lower in precedence than | meaning this expression always
> > > > > evaluates to TASK_FREEZABLE and nothing else (which is why the process
> > > > > goes into R state and never wakes up).
> > > >
> > > > Let me fix that and redo all the testing.
> > >
> > > > I don't think that's it. I think you're missing making pagefault
> > > > writers such as systemd-journald freezable:
> > >
> > > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > > index b379a46b5576..528e73f192ac 100644
> > > --- a/include/linux/fs.h
> > > +++ b/include/linux/fs.h
> > > @@ -1782,7 +1782,8 @@ static inline void __sb_end_write(struct
> > > super_block *sb, int level)
> > > static inline void __sb_start_write(struct super_block *sb, int
> > > level)
> > > {
> > > percpu_down_read_freezable(sb->s_writers.rw_sem + level - 1,
> > > - level == SB_FREEZE_WRITE);
> > > + (level == SB_FREEZE_WRITE ||
> > > + level == SB_FREEZE_PAGEFAULT));
> > > }
> >
> > Yes, I was about to tell Jan that the condition here simply needs to be
> > true. All our rwsem levels need to be freezable to avoid a hibernation
> > failure.
>
> So there is one snag with this. SB_FREEZE_PAGEFAULT level is acquired under
> mmap_sem, SB_FREEZE_INTERNAL level is possibly acquired under some other
> filesystem locks. So if you freeze the filesystem, a task can block on
> frozen filesystem with e.g. mmap_sem held and if some other task then

Yeah, I wondered about that yesterday.

> blocks on grabbing that mmap_sem, hibernation fails because we'll be unable
> to hibernate the task waiting for mmap_sem. So if you'd like to completely
> avoid these hibernation failures, you'd have to make a slew of filesystem
> related locks use freezable sleeping. I don't think that's feasible.
>
> I was hoping that failures due to SB_FREEZE_PAGEFAULT level not being
> freezable would be rare enough but you've proven they are quite frequent.
> We can try making SB_FREEZE_PAGEFAULT level (or even SB_FREEZE_INTERNAL)
> freezable and see whether that works good enough...

I think that's fine and we'll see whether this causes a lot of issues.
I've got the patchset written in a way now that userspace can just
enable or disable freeze during migration.
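
For reference, the precedence problem James describes in the quoted hunk
can be reproduced outside the kernel. The following is only a userspace
sketch: the DEMO_* flag values and the helper names are made up for
illustration and are not taken from kernel headers. It shows that without
parentheses the expression is parsed as (flag | freeze) ? FREEZABLE : 0,
so the uninterruptible bit is dropped regardless of the freeze argument.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in flag values, chosen for illustration only (assumption). */
#define DEMO_UNINTERRUPTIBLE 0x0002u
#define DEMO_FREEZABLE       0x2000u

/*
 * Same shape as the expression in the quoted hunk. Because ?: binds
 * more loosely than |, this parses as
 *   (DEMO_UNINTERRUPTIBLE | freeze) ? DEMO_FREEZABLE : 0
 * and the condition is always non-zero.
 */
static unsigned int state_as_posted(bool freeze)
{
	return DEMO_UNINTERRUPTIBLE | freeze ? DEMO_FREEZABLE : 0;
}

/* Parenthesized form: the conditional only selects the optional bit. */
static unsigned int state_parenthesized(bool freeze)
{
	return DEMO_UNINTERRUPTIBLE | (freeze ? DEMO_FREEZABLE : 0);
}

/* Hypothetical self-check that exercises both helpers. */
static void demo_self_check(void)
{
	/* As posted: only the freezable bit survives, for either argument. */
	assert(state_as_posted(false) == DEMO_FREEZABLE);
	assert(state_as_posted(true) == DEMO_FREEZABLE);

	/* Parenthesized: the base state is kept, freezable is optional. */
	assert(state_parenthesized(false) == DEMO_UNINTERRUPTIBLE);
	assert(state_parenthesized(true) ==
	       (DEMO_UNINTERRUPTIBLE | DEMO_FREEZABLE));
}
```

Calling demo_self_check() from a small main() should pass silently. Some
compilers may warn about the unparenthesized form, which is the point of
the comparison.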