linux-kernel - Re: [RFC PATCH] block: xfs: dm thin: train XFS to give up on retrying IO if thinp is out of space


Message-ID: <20150723143352.GA23921@redhat.com>
Date:	Thu, 23 Jul 2015 10:33:52 -0400
From:	Mike Snitzer <snitzer@...hat.com>
To:	Dave Chinner <david@...morbit.com>
Cc:	Eric Sandeen <sandeen@...hat.com>, axboe@...nel.dk,
	linux-kernel@...r.kernel.org, xfs@....sgi.com, dm-devel@...hat.com,
	linux-fsdevel@...r.kernel.org, hch@....de,
	Vivek Goyal <vgoyal@...hat.com>
Subject: Re: [RFC PATCH] block: xfs: dm thin: train XFS to give up on
 retrying IO if thinp is out of space

On Thu, Jul 23 2015 at  1:10am -0400,
Dave Chinner <david@...morbit.com> wrote:

> On Wed, Jul 22, 2015 at 11:28:06AM -0500, Eric Sandeen wrote:
> > On 7/22/15 8:34 AM, Mike Snitzer wrote:
> > > On Tue, Jul 21 2015 at 10:37pm -0400,
> > > Dave Chinner <david@...morbit.com> wrote:
> > >> On Tue, Jul 21, 2015 at 09:40:29PM -0400, Mike Snitzer wrote:
> > >>> I'm open to considering alternative interfaces for getting you the info
> > >>> you need.  I just don't have a great sense for what mechanism you'd like
> > >>> to use.  Do we invent a new block device operations table method that
> > >>> sets values in a 'struct no_space_strategy' passed in to the
> > >>> blockdevice?
> > >>
> > >> It's long been frowned on having the filesystems dig into block
> > >> device structures. We have lots of wrapper functions for getting
> > >> information from or performing operations on block devices. (e.g.
> > >> bdev_read_only(), bdev_get_queue(), blkdev_issue_flush(),
> > >> blkdev_issue_zeroout(), etc) and so I think this is the pattern we'd
> > >> need to follow. If we do that - bdev_get_nospace_strategy() - then
> > >> how that information gets to the filesystem is completely opaque
> > >> at the fs level, and the block layer can implement it in whatever
> > >> way is considered sane...
> > >>
> > >> And, realistically, all we really need returned is an enum to tell us
> > >> how the bdev behaves on enospc:
> > >> 	- bdev fails fast, (i.e. immediate ENOSPC)
> > >> 	- bdev fails slow, (i.e. queue for some time, then ENOSPC)
> > >> 	- bdev never fails (i.e. queue forever)
> > >> 	- bdev doesn't support this (i.e. EOPNOTSUPP)
> > 
> > I'm not sure how this is more useful than the bdev simply responding to
> > a query of "should we keep trying IOs?"
> 
> 	- bdev fails fast, (i.e. immediate ENOSPC)
> 
> XFS should use a bounded retry behaviour to allow for the possibility of
> the admin adding more space before we shut down the fs. i.e.
> XFS fails slow.
> 
> 	- bdev fails slow, (i.e. queue for some time, then ENOSPC)
> 
> We know that IOs are going to be delayed before they are failed, so
> there's no point in retrying as the admin has already had a chance
> to resolve the ENOSPC condition before failure was reported. i.e.
> XFS fails fast.
> 
> 	- bdev never fails (i.e. queue forever)
> 
> Block device will appear to hang when it runs out of space. Nothing
> XFS can do here because IOs never fail, but we need to note this in
> the log at mount time so that filesystem hangs are easily explained
> when reported to us.
> 
> 	- bdev doesn't support this (i.e. EOPNOTSUPP)
> 
> XFS uses default "retry forever" behaviour.
> 
> > > This 'struct no_space_strategy' would be invented purely for
> > > informational purposes for upper layers' benefit -- I don't consider it
> > > a "block device structure" in the traditional sense.
> > > 
> > > I was thinking upper layers would like to know the actual timeout value
> > > for the "fails slow" case.  As such the 'struct no_space_strategy' would
> > > have the enum and the timeout.  And would be returned with a call:
> > >      bdev_get_nospace_strategy(bdev, &no_space_strategy)
> > 
> > Asking for the timeout value seems to add complexity.  It could change after
> > we ask, and knowing it now requires another layer to be handling timeouts...
> 
> I don't think knowing the bdev timeout is necessary because the
> default is most likely to be "fail fast" in this case. i.e. no
> retries, just shut down.  IOWs, if we describe the configs and
> actions in neutral terms, then the default configurations are easy for
> users to understand. i.e:
> 
> bdev enospc		XFS default
> -----------		-----------
> Fail slow		Fail fast
> Fail fast		Fail slow
> Fail never		Fail never, Record in log
> EOPNOTSUPP		Fail never
> 
> With that in mind, I'm thinking I should drop the
> "permanent/transient" error classifications, and change it to "failure
> behaviour" with the options "fast slow [never]" and only the slow
> option has retry/timeout configuration options.  I think the "never"
> option still needs a "fail at unmount" config variable, but we
> enable it by default rather than hanging unmount and requiring a
> manual shutdown like we do now....

This all sounds good to me.  The simpler XFS configuration looks like a
nice improvement.

If you just want to stub out the call to bdev_get_nospace_strategy() I
can crank through implementing it once I get a few minutes.
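
To make the shape of that stub concrete, here is a rough userspace sketch,
not kernel code.  Only the name bdev_get_nospace_strategy() and the four
behaviours come from this thread; the enum names, the 'struct fake_bdev'
stand-in for a block device, and fs_default_behaviour() are hypothetical
and exist only to illustrate the bdev-to-XFS default mapping in the table
above.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical: how a block device behaves when it runs out of space. */
enum nospace_strategy {
	NOSPACE_FAIL_FAST,	/* immediate ENOSPC */
	NOSPACE_FAIL_SLOW,	/* queue for some time, then ENOSPC */
	NOSPACE_FAIL_NEVER,	/* queue forever */
};

/* Hypothetical: the filesystem's own default failure behaviour. */
enum fs_fail_behaviour {
	FS_FAIL_FAST,		/* no retries, just shut down */
	FS_FAIL_SLOW,		/* bounded retries, then shut down */
	FS_FAIL_NEVER,		/* retry forever */
};

/*
 * Stand-in for a real block device (assumption for this sketch).
 * A NULL strategy pointer models a device that cannot answer the query.
 */
struct fake_bdev {
	const enum nospace_strategy *strategy;
};

/* Wrapper the filesystem calls; how the answer is obtained stays opaque. */
static int bdev_get_nospace_strategy(const struct fake_bdev *bdev,
				     enum nospace_strategy *out)
{
	if (!bdev || !bdev->strategy)
		return -EOPNOTSUPP;
	*out = *bdev->strategy;
	return 0;
}

/*
 * Pick the filesystem default from the device's behaviour, following the
 * table in the quoted text.  *log_at_mount is set when the choice should
 * be noted in the log so that later hangs are easy to explain.
 */
static enum fs_fail_behaviour fs_default_behaviour(const struct fake_bdev *bdev,
						   int *log_at_mount)
{
	enum nospace_strategy s;

	*log_at_mount = 0;
	if (bdev_get_nospace_strategy(bdev, &s) != 0)
		return FS_FAIL_NEVER;	/* EOPNOTSUPP: retry forever */

	switch (s) {
	case NOSPACE_FAIL_SLOW:
		return FS_FAIL_FAST;	/* admin already had time to react */
	case NOSPACE_FAIL_FAST:
		return FS_FAIL_SLOW;	/* give the admin time to add space */
	case NOSPACE_FAIL_NEVER:
	default:
		*log_at_mount = 1;	/* IOs never fail; record it */
		return FS_FAIL_NEVER;
	}
}
```

A real version would take a struct block_device and get its answer from
the underlying driver (dm-thin here); the point of the wrapper is only
that the filesystem never needs to see how.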

Btw, not sure what I was thinking when suggesting XFS would benefit from
knowing the duration of the thinp no_space_timeout.