[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4C74CD95.1000208@ct.jp.nec.com>
Date: Wed, 25 Aug 2010 17:00:21 +0900
From: Kiyoshi Ueda <k-ueda@...jp.nec.com>
To: Tejun Heo <tj@...nel.org>
CC: Mike Snitzer <snitzer@...hat.com>, Hannes Reinecke <hare@...e.de>,
tytso@....edu, linux-scsi@...r.kernel.org, jaxboe@...ionio.com,
jack@...e.cz, linux-kernel@...r.kernel.org, swhiteho@...hat.com,
linux-raid@...r.kernel.org, linux-ide@...r.kernel.org,
James.Bottomley@...e.de, konishi.ryusuke@....ntt.co.jp,
linux-fsdevel@...r.kernel.org, vst@...b.net, rwheeler@...hat.com,
Christoph Hellwig <hch@....de>, chris.mason@...cle.com,
dm-devel@...hat.com
Subject: Re: [PATCHSET block#for-2.6.36-post] block: replace barrier with
sequenced flush
Hi Tejun,
On 08/25/2010 01:59 AM +0900, Tejun Heo wrote:
> On 08/24/2010 12:24 PM, Kiyoshi Ueda wrote:
>> Anyway, only reporting errors for REQ_FLUSH to upper layer without
>> such a solution would make dm-multipath almost unusable in real world,
>> although it's better than implicit data loss.
>
> I see.
>
>>> Maybe just turn off barrier support in mpath for now?
>>
>> If it's possible, it could be a workaround for a short term.
>> But how can you do that?
>>
>> I think it's not enough to just drop REQ_FLUSH flag from q->flush_flags.
>> Underlying devices of a mpath device may have write-back cache and
>> it may be enabled.
>> So if a mpath device doesn't set REQ_FLUSH flag in q->flush_flags, it
>> becomes a device which has write-back cache but doesn't support flush.
>> Then, upper layer can do nothing to ensure cache flush?
>
> Yeah, I was basically suggesting to forget about cache flush w/ mpath
> until it can be fixed. You're saying that if mpath just passes
> REQ_FLUSH upwards without retrying, it will be almost unuseable,
> right?
Right.
If the error is safe/needed to retry using other paths, mpath should
retry even if REQ_FLUSH. Otherwise, only one path failure may result
in system down.
Just passing any REQ_FLUSH error upwards regardless the error type
will make such situations, and users will feel the behavior as
unstable/unusable.
> I'm not sure how to proceed here. How much work would
> discerning between transport and IO errors take? If it can't be done
> quickly enough the retry logic can be kept around to keep the old
> behavior but that already was a broken behavior, so... :-(
I'm not sure how long will it take.
Anyway, as you said, the flush error handling of dm-mpath is already
broken if data loss really happens on any storage used by dm-mpath.
Although it's a serious issue and quick fix is required, I think
you may leave the old behavior in your patch-set, since it's
a separate issue.
Thanks,
Kiyoshi Ueda
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists