

Message-ID: <1d913e99368039b77945d1be89e6626b4238f665.camel@HansenPartnership.com>
Date: Tue, 01 Apr 2025 08:52:02 -0400
From: James Bottomley <James.Bottomley@...senPartnership.com>
To: Jan Kara <jack@...e.cz>
Cc: Christian Brauner <brauner@...nel.org>, linux-fsdevel@...r.kernel.org, 
 linux-kernel@...r.kernel.org, mcgrof@...nel.org, hch@...radead.org, 
 david@...morbit.com, rafael@...nel.org, djwong@...nel.org,
 pavel@...nel.org,  peterz@...radead.org, mingo@...hat.com, will@...nel.org,
 boqun.feng@...il.com
Subject: Re: [RFC PATCH 1/4] locking/percpu-rwsem: add freezable alternative
 to down_read

On Tue, 2025-04-01 at 13:20 +0200, Jan Kara wrote:
> On Mon 31-03-25 21:13:20, James Bottomley wrote:
> > On Tue, 2025-04-01 at 01:32 +0200, Christian Brauner wrote:
[...]
> > > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > > index b379a46b5576..528e73f192ac 100644
> > > --- a/include/linux/fs.h
> > > +++ b/include/linux/fs.h
> > > @@ -1782,7 +1782,8 @@ static inline void __sb_end_write(struct super_block *sb, int level)
> > >  static inline void __sb_start_write(struct super_block *sb, int level)
> > >  {
> > >         percpu_down_read_freezable(sb->s_writers.rw_sem + level - 1,
> > > -                                  level == SB_FREEZE_WRITE);
> > > +                                  (level == SB_FREEZE_WRITE ||
> > > +                                   level == SB_FREEZE_PAGEFAULT));
> > >  }
> > 
> > Yes, I was about to tell Jan that the condition here simply needs
> > to be true.  All our rwsem levels need to be freezable to avoid a
> > hibernation failure.
> 
> So there is one snag with this. SB_FREEZE_PAGEFAULT level is acquired
> under mmap_sem, SB_FREEZE_INTERNAL level is possibly acquired under
> some other filesystem locks.

Just for SB_FREEZE_INTERNAL, I think there's no case of
sb_start_intwrite() that can ever block in D (uninterruptible) wait,
because by the time we acquire the semaphore for write, the internal
freeze_fs should have been called and the filesystem should have
quiesced itself.  On the other hand, if that theory is true, there's no
real need for sb_start_intwrite() at all because it can never conflict.
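
To pin down the three variants of the freezable condition discussed in
this thread (the original code, the patch quoted above, and "simply
true"), here is a minimal userspace sketch.  It is not kernel code: the
LEVEL_* constants and the freezable_*() helpers are hypothetical
stand-ins for the freeze levels and for the second argument passed to
percpu_down_read_freezable(), and the numbering is an assumption made
only for illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the superblock freeze levels. */
enum model_level {
	LEVEL_WRITE = 1,	/* models SB_FREEZE_WRITE */
	LEVEL_PAGEFAULT = 2,	/* models SB_FREEZE_PAGEFAULT */
	LEVEL_INTERNAL = 3,	/* models SB_FREEZE_INTERNAL */
};

/* Original condition: only the write level sleeps freezably. */
bool freezable_original(int level)
{
	return level == LEVEL_WRITE;
}

/* Condition from the quoted patch: write and page fault levels. */
bool freezable_patched(int level)
{
	return level == LEVEL_WRITE || level == LEVEL_PAGEFAULT;
}

/* "Simply true": every level sleeps freezably. */
bool freezable_all(int level)
{
	(void)level;	/* the level no longer matters */
	return true;
}
```

The only level on which the last two variants differ is the internal
one, which is why the question of whether sb_start_intwrite() can ever
actually sleep matters.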

>  So if you freeze the filesystem, a task can block on frozen
> filesystem with e.g. mmap_sem held and if some other task then blocks
> on grabbing that mmap_sem, hibernation fails because we'll be unable
> to hibernate the task waiting for mmap_sem. So if you'd like to
> completely avoid these hibernation failures, you'd have to make a
> slew of filesystem related locks use freezable sleeping. I don't
> think that's feasible.

I wouldn't see that because I'm on x86_64 and that takes the vma_lock
in page faults not the mmap_lock.  The granularity of all these locks
is process level, so it's hard to see what they'd be racing with ...
even if I conjecture two threads trying to write to something, they'd
have to have some internal co-ordination which would likely prevent the
second one from writing if the first got stuck on the page fault. 
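
For reference, the failure mode Jan describes can be modelled in a few
lines of userspace C.  This is a deliberately crude sketch under stated
assumptions, not how the freezer is implemented: a task is reduced to
the kind of sleep it is in, and the freeze is assumed to fail as soon
as any task is in an uninterruptible, non-freezable sleep.  All names
(model_task, model_freeze_tasks, scenario_freezes) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* The only property of a sleeping task this model cares about. */
enum model_sleep {
	MODEL_SLEEP_FREEZABLE,
	MODEL_SLEEP_UNINTERRUPTIBLE,
};

struct model_task {
	const char *what;	/* description, for readability only */
	enum model_sleep sleep;
};

/* Assumption: one uninterruptible sleeper is enough to fail the freeze. */
static bool model_freeze_tasks(const struct model_task *tasks, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (tasks[i].sleep == MODEL_SLEEP_UNINTERRUPTIBLE)
			return false;
	return true;
}

/*
 * Task A page-faults on a frozen filesystem while holding the mmap
 * lock; its sleep is freezable only if the page fault level is.
 * Optionally, task B waits for that mmap lock in an ordinary
 * (non-freezable) sleep.
 */
bool scenario_freezes(bool pagefault_freezable, bool second_waiter)
{
	struct model_task tasks[2];
	size_t n = 0;

	tasks[n].what = "page fault blocked on frozen fs, mmap lock held";
	tasks[n].sleep = pagefault_freezable ? MODEL_SLEEP_FREEZABLE
					     : MODEL_SLEEP_UNINTERRUPTIBLE;
	n++;

	if (second_waiter) {
		tasks[n].what = "waiting for the mmap lock";
		tasks[n].sleep = MODEL_SLEEP_UNINTERRUPTIBLE;
		n++;
	}

	return model_freeze_tasks(tasks, n);
}
```

In this model, making the page fault level freezable fixes the
single-waiter case, while a second task queued behind the mmap lock
still fails the freeze regardless.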

> I was hoping that failures due to SB_FREEZE_PAGEFAULT level not being
> freezable would be rare enough but you've proven they are quite
> frequent. We can try making SB_FREEZE_PAGEFAULT level (or even
> SB_FREEZE_INTERNAL) freezable and see whether that works well
> enough...

I'll try to construct a more severe test than systemd-journald ... it
looks to be single threaded in its operation.

Regards,

James