Message-ID: <20150723143352.GA23921@redhat.com>
Date: Thu, 23 Jul 2015 10:33:52 -0400
From: Mike Snitzer <snitzer@...hat.com>
To: Dave Chinner <david@...morbit.com>
Cc: Eric Sandeen <sandeen@...hat.com>, axboe@...nel.dk,
linux-kernel@...r.kernel.org, xfs@....sgi.com, dm-devel@...hat.com,
linux-fsdevel@...r.kernel.org, hch@....de,
Vivek Goyal <vgoyal@...hat.com>
Subject: Re: [RFC PATCH] block: xfs: dm thin: train XFS to give up on
retrying IO if thinp is out of space
On Thu, Jul 23 2015 at 1:10am -0400,
Dave Chinner <david@...morbit.com> wrote:
> On Wed, Jul 22, 2015 at 11:28:06AM -0500, Eric Sandeen wrote:
> > On 7/22/15 8:34 AM, Mike Snitzer wrote:
> > > On Tue, Jul 21 2015 at 10:37pm -0400,
> > > Dave Chinner <david@...morbit.com> wrote:
> > >> On Tue, Jul 21, 2015 at 09:40:29PM -0400, Mike Snitzer wrote:
> > >>> I'm open to considering alternative interfaces for getting you the info
> > >>> you need. I just don't have a great sense for what mechanism you'd like
> > >>> to use. Do we invent a new block device operations table method that
> > >>> sets values in a 'struct no_space_strategy' passed in to the
> > >>> blockdevice?
> > >>
> > >> It's long been frowned on having the filesystems dig into block
> > >> device structures. We have lots of wrapper functions for getting
> > >> information from or performing operations on block devices. (e.g.
> > >> bdev_read_only(), bdev_get_queue(), blkdev_issue_flush(),
> > >> blkdev_issue_zeroout(), etc) and so I think this is the pattern we'd
> > >> need to follow. If we do that - bdev_get_nospace_strategy() - then
> > >> how that information gets to the filesystem is completely opaque
> > >> at the fs level, and the block layer can implement it in whatever
> > >> way is considered sane...
> > >>
> > >> And, realistically, all we really need returned is an enum to tell us
> > >> how the bdev behaves on enospc:
> > >> - bdev fails fast, (i.e. immediate ENOSPC)
> > >> - bdev fails slow, (i.e. queue for some time, then ENOSPC)
> > >> - bdev never fails (i.e. queue forever)
> > >> - bdev doesn't support this (i.e. EOPNOTSUPP)
> >
> > I'm not sure how this is more useful than the bdev simply responding to
> > a query of "should we keep trying IOs?"
>
> - bdev fails fast, (i.e. immediate ENOSPC)
>
> XFS should use a bounded retry behaviour to allow for the possibility of
> the admin adding more space before we shut down the fs. i.e.
> XFS fails slow.
>
> - bdev fails slow, (i.e. queue for some time, then ENOSPC)
>
> We know that IOs are going to be delayed before they are failed, so
> there's no point in retrying as the admin has already had a chance
> to resolve the ENOSPC condition before failure was reported. i.e.
> XFS fails fast.
>
> - bdev never fails (i.e. queue forever)
>
> Block device will appear to hang when it runs out of space. Nothing
> XFS can do here because IOs never fail, but we need to note this in
> the log at mount time so that filesystem hangs are easily explained
> when reported to us.
>
> - bdev doesn't support this (i.e. EOPNOTSUPP)
>
> XFS uses default "retry forever" behaviour.
>
> > > This 'struct no_space_strategy' would be invented purely for
> > > informational purposes for upper layers' benefit -- I don't consider it
> > > a "block device structure" in the traditional sense.
> > >
> > > I was thinking upper layers would like to know the actual timeout value
> > > for the "fails slow" case. As such the 'struct no_space_strategy' would
> > > have the enum and the timeout. And would be returned with a call:
> > > bdev_get_nospace_strategy(bdev, &no_space_strategy)
> >
> > Asking for the timeout value seems to add complexity. It could change after
> > we ask, and knowing it now requires another layer to be handling timeouts...
>
> I don't think knowing the bdev timeout is necessary because the
> default is most likely to be "fail fast" in this case. i.e. no
> retries, just shut down. IOWs, if we describe the configs and
> actions in neutral terms, then the default configurations are easy for
> users to understand. i.e:
>
> bdev enospc     XFS default
> -----------     -----------
> Fail slow       Fail fast
> Fail fast       Fail slow
> Fail never      Fail never, Record in log
> EOPNOTSUPP      Fail never
>
> With that in mind, I'm thinking I should drop the
> "permanent/transient" error classifications, and change it to "failure
> behaviour" with the options "fast slow [never]" and only the slow
> option has retry/timeout configuration options. I think the "never"
> option still needs the "fail at unmount" config variable, but we
> enable it by default rather than hanging unmount and requiring a
> manual shutdown like we do now....

This all sounds good to me. The simpler XFS configuration looks like a
nice improvement.

If you just want to stub out the call to bdev_get_nospace_strategy() I
can crank through implementing it once I get a few minutes.

Btw, not sure what I was thinking when suggesting XFS would benefit from
knowing the duration of the thinp no_space_timeout.
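
For concreteness, here is a rough userspace sketch of what the stub and
the default mapping from the table above might look like. Only the name
bdev_get_nospace_strategy() comes from this thread. The enum names, the
fake_bdev stand-in for struct block_device, the return convention and
the fs-side helper are all assumptions made up for illustration, not
existing kernel interfaces.

```c
#include <errno.h>

/* Hypothetical: how a block device behaves when it runs out of space. */
enum bdev_nospace_strategy {
	BDEV_NOSPACE_FAIL_FAST,		/* immediate ENOSPC */
	BDEV_NOSPACE_FAIL_SLOW,		/* queue for some time, then ENOSPC */
	BDEV_NOSPACE_FAIL_NEVER,	/* queue forever */
};

/* Hypothetical: the filesystem's own "failure behaviour" setting. */
enum fs_fail_behaviour {
	FS_FAIL_FAST,			/* no retries, shut down */
	FS_FAIL_SLOW,			/* bounded retries, then shut down */
	FS_FAIL_NEVER,			/* retry forever */
};

/* Stand-in for struct block_device; holds only what the sketch needs. */
struct fake_bdev {
	int has_strategy;		/* 0: device cannot report a strategy */
	enum bdev_nospace_strategy strategy;
};

/*
 * Assumed convention: return 0 and fill *out, or -EOPNOTSUPP when the
 * device does not support the query.
 */
static int bdev_get_nospace_strategy(const struct fake_bdev *bdev,
				     enum bdev_nospace_strategy *out)
{
	if (!bdev->has_strategy)
		return -EOPNOTSUPP;
	*out = bdev->strategy;
	return 0;
}

/*
 * Pick the filesystem default from the device's behaviour, following the
 * table above. *log_at_mount is set when the choice should be recorded in
 * the log at mount time.
 */
static enum fs_fail_behaviour
fs_default_fail_behaviour(const struct fake_bdev *bdev, int *log_at_mount)
{
	enum bdev_nospace_strategy s;

	*log_at_mount = 0;

	/* EOPNOTSUPP: keep the "retry forever" default. */
	if (bdev_get_nospace_strategy(bdev, &s) != 0)
		return FS_FAIL_NEVER;

	switch (s) {
	case BDEV_NOSPACE_FAIL_SLOW:
		/* The admin already had time to add space: fail fast. */
		return FS_FAIL_FAST;
	case BDEV_NOSPACE_FAIL_FAST:
		/* Give the admin a window to add space: fail slow. */
		return FS_FAIL_SLOW;
	case BDEV_NOSPACE_FAIL_NEVER:
		/* IOs never fail, so note it to explain later hangs. */
		*log_at_mount = 1;
		return FS_FAIL_NEVER;
	}
	return FS_FAIL_NEVER;
}
```

The point of the wrapper is that the filesystem only ever sees the enum,
so how dm-thin (or anything else) supplies the value stays opaque at the
fs level.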