linux-kernel - Re: [PATCH 1/7] block: Add block_flush

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <1238505376.8363.26.camel@think.oraclecorp.com>
Date:	Tue, 31 Mar 2009 09:16:16 -0400
From:	Chris Mason <chris.mason@...cle.com>
To:	Mark Lord <lkml@....ca>
Cc:	Jens Axboe <jens.axboe@...cle.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Fernando Luis Vázquez Cao 
	<fernando@....ntt.co.jp>, Jeff Garzik <jeff@...zik.org>,
	Christoph Hellwig <hch@...radead.org>,
	Theodore Tso <tytso@....edu>, Ingo Molnar <mingo@...e.hu>,
	Alan Cox <alan@...rguk.ukuu.org.uk>,
	Arjan van de Ven <arjan@...radead.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Nick Piggin <npiggin@...e.de>, David Rees <drees76@...il.com>,
	Jesper Krogh <jesper@...gh.cc>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	david@...morbit.com, tj@...nel.org
Subject: Re: [PATCH 1/7] block: Add block_flush_device()

On Mon, 2009-03-30 at 16:52 -0400, Mark Lord wrote:
> Jens Axboe wrote:
> > On Mon, Mar 30 2009, Linus Torvalds wrote:
> >>
> >> On Mon, 30 Mar 2009, Jens Axboe wrote:
> >>> Sorry, I just don't see much point to doing it this way instead. So now
> >>> the fs will have to check a queue bit after it has issued the flush, how
> >>> is that any better than having the 'error' returned directly?
> >> No.
> >>
> >> Now the fs SHOULD NEVER CHECK AT ALL.
> >>
> >> Either it did the ordering, or the FS cannot do anything about it. 
> >>
> >> That's the point. EOPNOTSUPP is n ot a useful error message. You can't 
> >> _do_ anything about it.
> > 
> > My point is that some file systems may or may not have different paths
> > or optimizations depending on whether barriers are enabled and working
> > or not. Apparently that's just reiserfs and Chris says we can remove it,
> > so it is probably a moot point.
> ..
> 
> XFS appears to have something along those lines.
> I believe it tries to disable the drive write caches
> if it discovers that it cannot do cache flushes.
> 

If we get EOPNOTSUPP back from a submit_bh/submit_bio, the IO didn't
happen.  So, all the filesystems have code to try again without the
barrier flag, and then stop doing barriers from then on.

I'm not saying this is a good or bad API, just explaining for this one
example how it is being used today ;)

> I'll check next time my MythTV box boots up.
> It has a RAID0 under XFS, and the md raid0 code doesn't
> appear to pass the cache flushes to libata for raid0,
> so XFS complains and tries to turn off the write caches.
> 
>
> And I have a script to damn well turn them back ON again
> after it does so.  Stupid thing tries to override user policy again.
> 

XFS does print a warning about not doing barriers any more, but the
write cache should still be on.  Especially with MD in front of it, the
storage stack is pretty complex, a mounted filesystem would have a hard
time knowing where to start to turn off write caches on each drive in
the stack.

You can test this pretty easily:

dd if=/dev/zero of=foo bs=4k count=10000 oflag=direct

If that runs faster than 1MB/s the write cache is still on.

-chris


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/