Message-ID: <20060823182632.GD5893@suse.de>
Date: Wed, 23 Aug 2006 20:26:32 +0200
From: Jens Axboe <axboe@...e.de>
To: Ric Wheeler <ric@....com>
Cc: Andrew Morton <akpm@...l.org>,
Akinobu Mita <mita@...aclelinux.com>,
linux-kernel@...r.kernel.org, okuji@...ug.org
Subject: Re: [patch 4/5] fail-injection capability for disk IO
On Wed, Aug 23 2006, Ric Wheeler wrote:
> Jens Axboe wrote:
> >On Wed, Aug 23 2006, Andrew Morton wrote:
> >
> >>On Wed, 23 Aug 2006 14:03:55 +0200
> >>Jens Axboe <axboe@...e.de> wrote:
> >>
> >>
> >>>On Wed, Aug 23 2006, Akinobu Mita wrote:
> >>>
> >>>>This patch provides fail-injection capability for disk IO.
> >>>>
> >>>>Boot option:
> >>>>
> >>>> fail_make_request=<probability>,<interval>,<times>,<space>
> >>>>
> >>>> <probability>
> >>>>
> >>>> specifies how often it should fail in percent.
> >>>>
> >>>> <interval>
> >>>>
> >>>> specifies the interval of failures.
> >>>>
> >>>> <times>
> >>>>
> >>>> specifies how many times failures may happen at most.
> >>>>
> >>>> <space>
> >>>>
> >>>> specifies the size of free space where disk IO can be issued
> >>>> safely in bytes.
> >>>>
> >>>>Example:
> >>>>
> >>>> fail_make_request=100,10,-1,0
> >>>>
> >>>>generic_make_request() fails once per 10 times.
> >>>
> >>>Hmm dunno, seems a pretty useless feature to me.
> >>
> >>We need it. What is the FS/VFS/VM behaviour in the presence of IO
> >>errors? Nobody knows, because we rarely test it. Those few times where
> >>people _do_ test it (the hard way), bad things tend to happen. reiserfs
> >>(for example) likes to go wobble, wobble, wobble, BUG.
> >
> >
> >You misunderstood me - a global parameter is useless, as it makes it
> >pretty impossible for people to use this for any sort of testing (unless
> >it's very specialized). I didn't say a feature to test io errors was
> >useless!
> >
> >
> >>>Wouldn't it make a lot
> >>>more sense to do this per-queue instead of a global entity?
> >>
> >>Yes, I think so. /sys/block/sda/sda2/make-it-fail.
> >
> >
> >Precisely.
> >
>
> I think that this is very useful for testing file systems.
>
> What this will miss is the error path through the lower levels of the IO
> path (i.e., the libata/SCSI error handling confusion that Mark Lord has
> been working on patches for would need some error injection at or below
> the libata level).
>
> We currently test this whole path with either weird fault injection gear
> to hit the s-ata bus or the old fashion pile of moderately flaky disks
> that we try hard not to fix or totally kill.
>
> It would be really useful to get something (target mode SW disk? libata
> or other low level error injection?) to test this whole path in software...
Yes, this approach only tests the layer(s) above the device. To simulate
hardware failure or timeouts, I _think_ scsi_debug can already help you
quite a bit. If not, it should be easy enough to extend it to add these
sorts of things.
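
For reference, here is a rough userspace model of how the quoted
<probability>,<interval>,<times> knobs could interact. It is only a sketch,
not the patch's code: the class and method names are made up for
illustration, treating a <times> of -1 as "no limit" is an assumption read
off the quoted example, and <space> is left out entirely.

```python
import random


class FailInjector:
    """Toy model of the quoted fail_make_request knobs (hypothetical names)."""

    def __init__(self, probability, interval, times, rng=None):
        self.probability = probability  # chance of failing, in percent (0..100)
        self.interval = interval        # only every interval-th call may fail
        self.times = times              # failure budget; -1 assumed to mean unlimited
        self.calls = 0
        self.rng = rng or random.Random()

    def should_fail(self):
        """Return True if this call should be failed."""
        self.calls += 1
        if self.times == 0:
            # Failure budget is used up.
            return False
        if self.interval > 1 and self.calls % self.interval != 0:
            # Not a candidate call for this interval.
            return False
        if self.rng.randrange(100) >= self.probability:
            # The percent roll says "succeed".
            return False
        if self.times > 0:
            self.times -= 1
        return True


# Model of the quoted example: fail_make_request=100,10,-1,0
injector = FailInjector(probability=100, interval=10, times=-1)
results = [injector.should_fail() for _ in range(30)]
failed_calls = [n + 1 for n, failed in enumerate(results) if failed]
```

With those settings the model fails calls 10, 20 and 30, which matches the
"fails once per 10 times" description. A per-queue variant would simply keep
one such object per queue instead of a single global one.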
--
Jens Axboe