[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <49D11A8D.1090600@garzik.org>
Date: Mon, 30 Mar 2009 15:16:29 -0400
From: Jeff Garzik <jeff@...zik.org>
To: Jens Axboe <jens.axboe@...cle.com>
CC: Linus Torvalds <torvalds@...ux-foundation.org>,
Fernando Luis Vázquez Cao
<fernando@....ntt.co.jp>, Christoph Hellwig <hch@...radead.org>,
Theodore Tso <tytso@....edu>, Ingo Molnar <mingo@...e.hu>,
Alan Cox <alan@...rguk.ukuu.org.uk>,
Arjan van de Ven <arjan@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Nick Piggin <npiggin@...e.de>, David Rees <drees76@...il.com>,
Jesper Krogh <jesper@...gh.cc>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
chris.mason@...cle.com, david@...morbit.com, tj@...nel.org
Subject: Re: [PATCH 1/7] block: Add block_flush_device()
Jens Axboe wrote:
> On Mon, Mar 30 2009, Linus Torvalds wrote:
>>
>> On Mon, 30 Mar 2009, Jens Axboe wrote:
>>> The problem is that we may not know upfront, so it sort-of has to be
>>> this trial approach where the first barrier issued will notice and fail
>>> with -EOPNOTSUPP.
>> Well, absolutely. Except I don't think you shoul use ENOTSUPP, you should
>> just set a bit in the "struct request_queue", and then return 0.
>>
>> IOW, something like this
>>
>> --- a/block/blk-barrier.c
>> +++ b/block/blk-barrier.c
>> @@ -318,6 +318,9 @@ int blkdev_issue_flush(struct block_device *bdev, sector_t *error_sector)
>> if (!q)
>> return -ENXIO;
>>
>> + if (is_queue_noflush(q))
>> + return 0;
>> +
>> bio = bio_alloc(GFP_KERNEL, 0);
>> if (!bio)
>> return -ENOMEM;
>> @@ -339,7 +342,7 @@ int blkdev_issue_flush(struct block_device *bdev, sector_t *error_sector)
>>
>> ret = 0;
>> if (bio_flagged(bio, BIO_EOPNOTSUPP))
>> - ret = -EOPNOTSUPP;
>> + set_queue_noflush(q);
>> else if (!bio_flagged(bio, BIO_UPTODATE))
>> ret = -EIO;
>>
>>
>> which just returns 0 if we don't support flushing on that queue.
>>
>> (Obviously incomplete patch, which is why I also intentionally
>> whitespace-broke it).
>>
>>> Sure, we could cache this value, but it's pretty
>>> pointless since the filesystem will stop sending barriers in this case.
>> Well no, it won't. Or rather, it will have to have such a stupid
>> per-filesystem flag, for no good reason.
>
> Sorry, I just don't see much point to doing it this way instead. So now
> the fs will have to check a queue bit after it has issued the flush, how
> is that any better than having the 'error' returned directly?
AFAICS, the aim is simply to return zero rather than EOPNOTSUPP, for the
not-supported case, rather than burdening all callers with such checks.
Which is quite reasonable for Fernando's patch -- the direct call fsync
case.
But that leaves open the possibility that some people really do want the
EOPNOTSUPP return value, I guess? Do existing callers need that?
>>> For blkdev_issue_flush() it may not be very interesting, since there's
>>> not much we can do about that. Just seems like very bad style to NOT
>>> return an error in such a case. You can assume that ordering is fine,
>>> but it definitely wont be in all case (eg devices that have write back
>>> caching on by default and don't support flush).
>> So?
>>
>> The thing is, you can't _do_ anything about it. So what's the point in
>> returning an error? The caller cannot possibly care - because there is
>> nothing the caller can really do.
>
> Not for blkdev_issue_flush(), all they can do is report about the
> device. And even that would be a vague "Your data may or may not be
> safe, we don't know".
>
>> Sure, the device may or may not re-order things, but since the caller
>> can't know, and can't really do a thing about it _anyway_, you're just
>> better off not even confusing anybody.
>
> I'd call that a pretty reckless approach to data integrity, honestly.
> You HAVE to issue an error in this case. Then the user/admin can at least
> check up on the device stack in question, and determine whether this is
> an issue or not. That goes for both blkdev_issue_flush() and the actual
> barrier write. And perhaps the cached value is then of some use, since
> you then know when to warn (bit not already set) and you can keep the
> warning in blkdev_issue_flush() instead of putting it in every call
> site.
Indeed -- if the drive tells us it failed the cache flush, it seems
self-evident that we should be passing that failure back to userspace
where possible.
And as the patches show, it is definitely possible to return a FLUSH
CACHE error back to an fsync(2) caller [though, yes, I certainly
recognize fsync is not the only generator of these requests].
Jeff
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists