lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 4 Oct 2017 18:05:23 +1100
From:   Dave Chinner <david@...morbit.com>
To:     Theodore Ts'o <tytso@....edu>,
        "Luis R. Rodriguez" <mcgrof@...nel.org>, viro@...iv.linux.org.uk,
        bart.vanassche@....com, ming.lei@...hat.com,
        darrick.wong@...cle.com, jikos@...nel.org, rjw@...ysocki.net,
        pavel@....cz, len.brown@...el.com, linux-fsdevel@...r.kernel.org,
        boris.ostrovsky@...cle.com, jgross@...e.com,
        todd.e.brandt@...ux.intel.com, nborisov@...e.com, jack@...e.cz,
        martin.petersen@...cle.com, ONeukum@...e.com,
        oleksandr@...alenko.name, oleg.b.antonyan@...il.com,
        linux-pm@...r.kernel.org, linux-block@...r.kernel.org,
        linux-xfs@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC 4/5] ext4: add fs freezing support on suspend/hibernation

On Tue, Oct 03, 2017 at 09:42:04PM -0400, Theodore Ts'o wrote:
> On Tue, Oct 03, 2017 at 10:13:21PM +0200, Luis R. Rodriguez wrote:
> > > After we remove add the NEEDS_RECOVERY flag, we need to make sure
> > > recovery flag is pushed out to disk before any other changes are
> > > allowed to be pushed out to disk.  That's why we originally did the
> > > update synchronously.
> > 
> > I see. I had to try to dig through linux-history to see why this was done, and
> > I actually could not find an exact commit which explained what you just did.
> > Thanks!
> > 
> > Hrm, so on freeze we issue the same commit as well, so is a sync *really* needed
> > on thaw?
> 
> So let me explain what's going on.  When we do a freeze, we make sure
> all of the blocks that are written to the journal are writen to the
> final location on disk, and the journal is is truncated.  (This is
> called a "journal checkpoint".)  Then we clear the NEEDS RECOVERY
> feature flag and set the EXT4_VALID_FS flags in the superblock.  In
> the no journal case, we flush out all metadata writes, and we set the
> EXT4_VALID_FS flag.  In both cases, we call ext4_commit_super(sb, 1)
> to make sure the flags update is pushed out to disk.  This puts the
> file system into a "quiscent" state; in fact, it looks like the file
> system has been unmounted, so that it becomes safe to run
> dump/restore, run fsck -n on the file system, etc.
> 
> The problem on the thaw side is that we need to make sure that
> EXT4_VALID_FS flag is removed (and if journaling is enabled, the NEEDS
> RECOVERY feature flag is set) and the superblock is flushed out to the
> storage device before any other writes are persisted on the disk.  In
> the case where we have journalling enabled, we could wait until the
> first journal commit to write out the superblock, since in journal
> mode all metadata updates to the disk are suppressed until the journal
> commit.  We don't do that today, but it is a change we could make.
> 
> However, in the no journal node we need to make sure the EXT4_VALID_FS
> flag is cleared on disk before any other metadata operations can take
> place, and calling ext4_commit_super(sb, 1) is the only real way to do
> that.
> 
> > No, it was am empirical evaluation done at testing, I observed bio_submit()
> > stalls, we never get completion from the device. Even if we call thaw at the
> > very end of resume, after the devices should be up, we still end up in the same
> > situation. Given how I order freezing filesystems after freezing userspace I do
> > believe we should thaw filesystems prior unfreezing userspace, its why I placed
> > the call where it is now.
> 
> So when we call ext4_commit_super(sb, 1), we issue the bio, and then
> we block waiting for the I/O to complete.  That's a blocking call.  Is
> the thaw context one which allows blocking?

thaw_super() does down_write(), so it *must* be able to sleep.

> If userspace is still
> frozen, does that imply that the scheduler isn't allow to run?  If
> that's the case, then that's probably the problem.
> 
> More generally, if the thawing code needs to allocate memory, or do

Which is does. xfs_fs_unfreeze() queues work to a workqueue, so
there's memory allocation there.

> any number of things that could potentially block, this could
> potentially be an issue.

yup, gfs2_unfreeze does even more stuff - it releases glocks which
may end up queuing work, waking other threads and freeing stuff via
call_rcu()...

Basically, before thawing filesystems the rest of the kernel
infrastructure needs to have been restarted. i.e. the order
needs to be:

freeze userspace
freeze filesystems
freeze kernel threads
freeze workqueues

thaw workqueues
thaw kernel threads
thaw filesystems
thaw userspace

and it should end up that way.

> Or if we have a network block device, or
> something else in the storage stack that needs to run a kernel thread
> context (or a workqueue, etc.) --- is the fact that userspace is
> frozen mean the scheduler is going to refuse to schedule()?

No.

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ