Message-ID: <20100825152831.GA8509@redhat.com>
Date: Wed, 25 Aug 2010 11:28:32 -0400
From: Mike Snitzer <snitzer@...hat.com>
To: Kiyoshi Ueda <k-ueda@...jp.nec.com>
Cc: Tejun Heo <tj@...nel.org>, Hannes Reinecke <hare@...e.de>,
tytso@....edu, linux-scsi@...r.kernel.org, jaxboe@...ionio.com,
jack@...e.cz, linux-kernel@...r.kernel.org, swhiteho@...hat.com,
linux-raid@...r.kernel.org, linux-ide@...r.kernel.org,
James.Bottomley@...e.de, konishi.ryusuke@....ntt.co.jp,
linux-fsdevel@...r.kernel.org, vst@...b.net, rwheeler@...hat.com,
Christoph Hellwig <hch@....de>, chris.mason@...cle.com,
dm-devel@...hat.com
Subject: Re: [PATCHSET block#for-2.6.36-post] block: replace barrier with
sequenced flush
On Wed, Aug 25 2010 at 4:00am -0400,
Kiyoshi Ueda <k-ueda@...jp.nec.com> wrote:
> Hi Tejun,
>
> On 08/25/2010 01:59 AM +0900, Tejun Heo wrote:
> > On 08/24/2010 12:24 PM, Kiyoshi Ueda wrote:
> >> Anyway, only reporting errors for REQ_FLUSH to the upper layer without
> >> such a solution would make dm-multipath almost unusable in the real
> >> world, although it's better than implicit data loss.
> >
> > I see.
> >
> >>> Maybe just turn off barrier support in mpath for now?
> >>
> >> If it's possible, it could be a workaround for a short term.
> >> But how can you do that?
> >>
> >> I think it's not enough to just drop the REQ_FLUSH flag from
> >> q->flush_flags. The underlying devices of an mpath device may have a
> >> write-back cache, and it may be enabled.
> >> So if an mpath device doesn't set the REQ_FLUSH flag in q->flush_flags,
> >> it becomes a device which has a write-back cache but doesn't support
> >> flush. Then the upper layer can do nothing to ensure a cache flush?
> >
> > Yeah, I was basically suggesting to forget about cache flush w/ mpath
> > until it can be fixed. You're saying that if mpath just passes
> > REQ_FLUSH upwards without retrying, it will be almost unusable,
> > right?
>
> Right.
> If the error is one that is safe and necessary to retry on other paths,
> mpath should retry it even for REQ_FLUSH. Otherwise a single path
> failure may take the system down.
> Just passing every REQ_FLUSH error upwards regardless of the error type
> creates exactly those situations, and users will perceive the behavior
> as unstable/unusable.
Right, there are hardware configurations where FLUSH retries matter,
namely:
1) a SAS drive with two ports and a write-back cache
2) theoretically possible: a SCSI array that is mpath-capable but
advertises its cache as write-back (WCE=1)
The SAS case is the more concrete example of why FLUSH retries are
worthwhile in mpath.
But I understand (and agree) that we'd be better off if mpath could
differentiate between failure types rather than blindly retrying on any
failure, as it does today (fail the path, then retry if additional paths
are available).
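The failure differentiation described above can be sketched in plain C.
This is a user-space model only; the enum values and helper name are
hypothetical illustrations, not kernel API (at the time, the block layer
reported a single undifferentiated error, which is exactly the gap being
discussed):

```c
#include <stdbool.h>

/* Hypothetical error classes for this sketch. The real kernel of the
 * era surfaced a single failure code, so mpath could not tell these
 * cases apart. */
enum flush_err {
	FLUSH_OK = 0,
	FLUSH_ERR_TRANSPORT,	/* path/transport failure: another path may work */
	FLUSH_ERR_TARGET,	/* the target itself failed the flush: retry is futile */
};

/* Decide whether mpath should fail the path and retry the FLUSH on
 * another path, or pass the error straight up to the filesystem. */
static bool flush_should_retry(enum flush_err err)
{
	return err == FLUSH_ERR_TRANSPORT;
}
```

With a classification like this, mpath could fail the path and retry only
on transport errors, while passing target errors up immediately instead
of burning through every remaining path.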
> Anyway, as you said, the flush error handling of dm-mpath is already
> broken if data loss really happens on any storage used by dm-mpath.
> Although it's a serious issue and a quick fix is required, I think
> you may leave the old behavior in your patch-set, since it's
> a separate issue.
I'm not seeing where anything is broken in current mpath. If a
multipathed LUN is WCE=1 then it should be fair to assume the cache is
mirrored or shared across ports, so retrying the SYNCHRONIZE CACHE is
both safe and needed.
Do we still fear that SYNCHRONIZE CACHE can silently drop data? That
seems unlikely, especially given what Tejun shared from SBC.
It seems that, at worst, current mpath retries when it doesn't make
sense (e.g. on a target failure).
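The current retry behaviour can be modelled as a small user-space
sketch; the path table and stub names below are hypothetical, not
dm-mpath internals:

```c
#include <stddef.h>

#define NPATHS 2

/* Per-path outcome of the (modelled) SYNCHRONIZE CACHE:
 * 0 = flush succeeded on that path, nonzero = that path failed. */
static int path_flush_result[NPATHS];

/* Issue the modelled flush down one path. */
static int issue_flush(size_t path)
{
	return path_flush_result[path];
}

/* Current mpath behaviour: on failure, fail the path and retry the
 * flush on the next available path; report an error upwards only when
 * no paths remain. */
static int mpath_flush(void)
{
	for (size_t p = 0; p < NPATHS; p++)
		if (issue_flush(p) == 0)
			return 0;
	return -1;	/* all paths failed */
}
```

This is safe whenever the write-back cache really is mirrored or shared
across ports; the downside, as noted above, is that the loop also
retries on target failures, where a retry cannot help.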
Mike