Message-Id: <1238505376.8363.26.camel@think.oraclecorp.com>
Date: Tue, 31 Mar 2009 09:16:16 -0400
From: Chris Mason <chris.mason@...cle.com>
To: Mark Lord <lkml@....ca>
Cc: Jens Axboe <jens.axboe@...cle.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Fernando Luis Vázquez Cao
<fernando@....ntt.co.jp>, Jeff Garzik <jeff@...zik.org>,
Christoph Hellwig <hch@...radead.org>,
Theodore Tso <tytso@....edu>, Ingo Molnar <mingo@...e.hu>,
Alan Cox <alan@...rguk.ukuu.org.uk>,
Arjan van de Ven <arjan@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Nick Piggin <npiggin@...e.de>, David Rees <drees76@...il.com>,
Jesper Krogh <jesper@...gh.cc>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
david@...morbit.com, tj@...nel.org
Subject: Re: [PATCH 1/7] block: Add block_flush_device()
On Mon, 2009-03-30 at 16:52 -0400, Mark Lord wrote:
> Jens Axboe wrote:
> > On Mon, Mar 30 2009, Linus Torvalds wrote:
> >>
> >> On Mon, 30 Mar 2009, Jens Axboe wrote:
> >>> Sorry, I just don't see much point to doing it this way instead. So now
> >>> the fs will have to check a queue bit after it has issued the flush, how
> >>> is that any better than having the 'error' returned directly?
> >> No.
> >>
> >> Now the fs SHOULD NEVER CHECK AT ALL.
> >>
> >> Either it did the ordering, or the FS cannot do anything about it.
> >>
> >> That's the point. EOPNOTSUPP is not a useful error message. You can't
> >> _do_ anything about it.
> >
> > My point is that some file systems may or may not have different paths
> > or optimizations depending on whether barriers are enabled and working
> > or not. Apparently that's just reiserfs and Chris says we can remove it,
> > so it is probably a moot point.
> ..
>
> XFS appears to have something along those lines.
> I believe it tries to disable the drive write caches
> if it discovers that it cannot do cache flushes.
>
If we get EOPNOTSUPP back from a submit_bh/submit_bio, the IO didn't
happen. So, all the filesystems have code to try again without the
barrier flag, and then stop doing barriers from then on.

I'm not saying this is a good or bad API, just explaining for this one
example how it is being used today ;)
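
The retry pattern described above can be sketched in plain userspace C.
This is only an illustration, not code from any filesystem: struct
fs_state, submit_fn, fs_write_block() and the fake device are all
hypothetical names standing in for the real submit_bh/submit_bio paths.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-filesystem state, not a real kernel structure. */
struct fs_state {
    bool use_barriers;   /* cleared after the first EOPNOTSUPP */
};

/* Hypothetical submit hook: returns 0 on success or a negative errno. */
typedef int (*submit_fn)(void *buf, bool barrier);

/*
 * Submit one block. If the device rejects the barrier with -EOPNOTSUPP,
 * the IO did not happen, so resubmit without the barrier flag and stop
 * using barriers from then on.
 */
int fs_write_block(struct fs_state *fs, submit_fn submit, void *buf)
{
    int ret = submit(buf, fs->use_barriers);

    if (ret == -EOPNOTSUPP && fs->use_barriers) {
        fs->use_barriers = false;
        ret = submit(buf, false);
    }
    return ret;
}

/* Stand-in device that cannot do barriers; counts submissions. */
static int fake_submit_calls;

int fake_submit_no_barriers(void *buf, bool barrier)
{
    (void)buf;
    fake_submit_calls++;
    return barrier ? -EOPNOTSUPP : 0;
}
```

With the fake device, the first fs_write_block() call submits twice
(barrier rejected, then retried without it) and every later call
submits once, because use_barriers stays off.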
> I'll check next time my MythTV box boots up.
> It has a RAID0 under XFS, and the md raid0 code doesn't
> appear to pass the cache flushes to libata for raid0,
> so XFS complains and tries to turn off the write caches.
>
>
> And I have a script to damn well turn them back ON again
> after it does so. Stupid thing tries to override user policy again.
>
XFS does print a warning about not doing barriers any more, but the
write cache should still be on. Especially with MD in front of it, the
storage stack is pretty complex; a mounted filesystem would have a hard
time knowing where to start to turn off write caches on each drive in
the stack.

You can test this pretty easily:

    dd if=/dev/zero of=foo bs=4k count=10000 oflag=direct

If that runs faster than 1MB/s the write cache is still on.

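
The arithmetic behind that check can be sketched as below. The helper
names are hypothetical, and MB is assumed to mean 10^6 bytes (the unit
dd reports). The 1MB/s threshold is the rule of thumb from this
message, not a measured constant.

```c
#include <stdbool.h>

/* bs=4k count=10000 in the dd command: 4096 * 10000 bytes written. */
#define TEST_BYTES (4096ULL * 10000ULL)

/* Throughput in MB/s (assuming 1 MB = 10^6 bytes); 0 if no valid time. */
double throughput_mb_per_s(unsigned long long bytes, double seconds)
{
    if (seconds <= 0.0)
        return 0.0;
    return (double)bytes / 1e6 / seconds;
}

/* Rule of thumb from the message: faster than 1MB/s means cache is on. */
bool write_cache_looks_enabled(unsigned long long bytes, double seconds)
{
    return throughput_mb_per_s(bytes, seconds) > 1.0;
}
```

For example, if dd finishes the 40,960,000 bytes in 2 seconds, that is
about 20 MB/s and the cache looks enabled; at 120 seconds it is about
0.34 MB/s and the cache looks disabled.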
-chris