linux-kernel - Re: [RFC PATCH] block: xfs: dm thin: train XFS to give up on retrying IO if thinp is out of space

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20150723230054.GC3902@dastard>
Date:	Fri, 24 Jul 2015 09:00:54 +1000
From:	Dave Chinner <david@...morbit.com>
To:	Vivek Goyal <vgoyal@...hat.com>
Cc:	Eric Sandeen <sandeen@...hat.com>,
	Mike Snitzer <snitzer@...hat.com>, axboe@...nel.dk,
	linux-kernel@...r.kernel.org, xfs@....sgi.com, dm-devel@...hat.com,
	linux-fsdevel@...r.kernel.org, hch@....de
Subject: Re: [RFC PATCH] block: xfs: dm thin: train XFS to give up on
 retrying IO if thinp is out of space

On Thu, Jul 23, 2015 at 12:43:58PM -0400, Vivek Goyal wrote:
> On Thu, Jul 23, 2015 at 03:10:43PM +1000, Dave Chinner wrote:
> 
> [..]
> > I don't think knowing the bdev timeout is necessary because the
> > default is most likely to be "fail fast" in this case. i.e. no
> > retries, just shut down.  IOWs, if we describe the configs and
> > actions in neutral terms, then the default configurations easy for
> > users to understand. i.e:
> > 
> > bdev enospc		XFS default
> > -----------		-----------
> > Fail slow		Fail fast
> > Fail fast		Fail slow
> > Fail never		Fail never, Record in log
> > EOPNOTSUPP		Fail never
> > 
> > With that in mind, I'm thinking I should drop the
> > "permanent/transient" error classifications, and change it "failure
> > behaviour" with the options "fast slow [never]" and only the slow
> > option has retry/timeout configuration options.  I think the "never"
> > option still needs to "fail at unmount" config variable, but we
> > enable it by default rather than hanging unmount and requiring a
> > manual shutdown like we do now....
> 
> I am wondering instead of 4 knobs (fast,slow,never,retry-timeout) can
> we just do with one knob per error type and that is retry-timout.

"retry-timeout" == "fail slow". i.e. a 5 minute retry timeout is
configured as:

# echo slow > fail_method
# echo 0 > max_retries
# echo 300 > retry_timeout

> retry-timeout=0 (Fail fast)
> retry-timeout=X (Fail slow)
> retry-timeout=-1 (Never Give up).

What do we do when we want to add a different failure type
with different configuration requirements?

> Also do we really need this timeout per error type.

I don't follow your logic here.  What do need a timeout for with
either the "never" or "fast" failure configurations?

> Also would be nice if this timeout was configurable using a mount
> option. Then we can just specify it during mount time and be done
> with it.

That way lies madness.  The error configuration iinfrastructure we
need is not just for ENOSPC errors on metadata buffers.  We need
configurable error behaviour for multiple different errors in
multiple different subsystems (e.g. data IO failure vs metadata
buffer IO failure vs memory allocation failure vs inode corruption
vs freespace corruption vs ....).

And we still would need the sysfs interface for querying and
configuring at runtime, so mount options are just a bad idea.  And
with sysfs, the potential future route for automatic configuration
at mount time is via udev events and configuration files, similar to
block devices.

> Idea of auto tuning based on what block device is doing sounds reasonable
> but that should not be a requirement for this patch and can go in even
> later. It is one of those nice to have features.

"this patch"? Just the core infrastructure so far:

11 files changed, 290 insertions(+), 60 deletions(-)

and that will need to be split into 4-5 patches for review. There's
a bunch of cleanup that preceeds this, and then there's a patch per
error type we are going to handle in metadata buffer IO completion.
IOWs, the dm-thinp autotuning is just a simple, small patch at the
end of a much larger series - it's maybe 10 lines of code in XFS...

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/