[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20100802104227.79340b49@notabene>
Date: Mon, 2 Aug 2010 10:42:27 +1000
From: Neil Brown <neilb@...e.de>
To: Tejun Heo <tj@...nel.org>
Cc: Vladislav Bolkhovitin <vst@...b.net>,
Bryan Mesich <bryan.mesich@...u.edu>,
scst-devel@...ts.sourceforge.net,
Jens Axboe <jens.axboe@...cle.com>,
linux-kernel@...r.kernel.org, linux-raid@...r.kernel.org,
dm-devel@...hat.com
Subject: Re: RAID/block regression starting from 2.6.32, bisected
On Fri, 30 Jul 2010 12:29:30 +0200
Tejun Heo <tj@...nel.org> wrote:
> Hello,
>
> On 07/28/2010 08:16 PM, Vladislav Bolkhovitin wrote:
> > In recent kernels we are experiencing a problem that in our setup
> > using SCST BLOCKIO backend some BIOs are finished, i.e. the finish
> > callback called for them, with error -EIO. It happens quite often,
> > much more often than one would expect to have an actual IO
> > error. (BLOCKIO backend just converts all incoming SCSI commands to
> > the corresponding block requests.)
> >
> > After some investigation, we figured out, that, most likely,
> > raid5.c::make_request() for some reason sometimes calls bio_endio()
> > with not BIO_UPTODATE bios.
> >
> > We bisected it to commit:
> >
> > commit a82afdfcb8c0df09776b6458af6b68fc58b2e87b
> > Author: Tejun Heo <tj@...nel.org>
> > Date: Fri Jul 3 17:48:16 2009 +0900
> >
> > block: use the same failfast bits for bio and request
>
> That commit doesn't (or at least isn't supposed to) make any behavior
> difference. It's just repositioning flag bits. If the commit is
> actually causing the problem, I think one possibility is that whatever
> code could be using hard coded constants which now are mapped to
> different flags. The mixed merge changes have been in mainline for
> quite some time and shipping in all major distros too and this is the
> first time this is reported, so I don't think it could be a widespread
> problem.
>
> Thanks.
>
The problem is that md/raid5 tests bio->bi_rw against RWA_MASK, which used to
align with BIO_RW_AHEAD, and now doesn't.
However the definition of bio_rw() in fs.h seems to justify that RWA_MASK
should align with BIO_RW_AHEAD, as does the definition of READA.
Given the current definitions, any WRITE request with BIO_RW_FAILFAST_DEV
set is going to confused a number of drives which test
bio_rw(bio) == WRITE
I guess RWA_MASK needs to be changed to (1<<BIO_RW_AHEAD), and READA need to
be change to that value too.
Can I leave that to you Tejun?
Thanks,
NeilBrown
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists