[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <49D20505.3060901@oss.ntt.co.jp>
Date: Tue, 31 Mar 2009 20:56:53 +0900
From: Fernando Luis Vázquez Cao
<fernando@....ntt.co.jp>
To: Jens Axboe <jens.axboe@...cle.com>
CC: Jeff Garzik <jeff@...zik.org>,
Christoph Hellwig <hch@...radead.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Theodore Tso <tytso@....edu>, Ingo Molnar <mingo@...e.hu>,
Alan Cox <alan@...rguk.ukuu.org.uk>,
Arjan van de Ven <arjan@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Nick Piggin <npiggin@...e.de>, David Rees <drees76@...il.com>,
Jesper Krogh <jesper@...gh.cc>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
chris.mason@...cle.com, david@...morbit.com, tj@...nel.org
Subject: Re: [PATCH 5/7] vfs: Add wbcflush sysfs knob to disable storage
device writeback cache flushes
Jens Axboe wrote:
> On Tue, Mar 31 2009, Fernando Luis Vázquez Cao wrote:
>> Jens Axboe wrote:
>>> On Mon, Mar 30 2009, Fernando Luis Vázquez Cao wrote:
>>>> Jens Axboe wrote:
>>>>> On Mon, Mar 30 2009, Fernando Luis Vázquez Cao wrote:
>>>>>> Add a sysfs knob to disable storage device writeback cache flushes.
>>>>>>
>>>>>> Signed-off-by: Fernando Luis Vazquez Cao <fernando@....ntt.co.jp>
>>>>>> ---
>>>>>>
>>>>>> diff -urNp linux-2.6.29-orig/block/blk-barrier.c linux-2.6.29/block/blk-barrier.c
>>>>>> --- linux-2.6.29-orig/block/blk-barrier.c 2009-03-24 08:12:14.000000000 +0900
>>>>>> +++ linux-2.6.29/block/blk-barrier.c 2009-03-30 17:08:28.000000000 +0900
>>>>>> @@ -318,6 +318,9 @@ int blkdev_issue_flush(struct block_devi
>>>>>> if (!q)
>>>>>> return -ENXIO;
>>>>>>
>>>>>> + if (blk_queue_nowbcflush(q))
>>>>>> + return -EOPNOTSUPP;
>>>>>> +
>>>>>> bio = bio_alloc(GFP_KERNEL, 0);
>>>>>> if (!bio)
>>>>>> return -ENOMEM;
>>>>>> diff -urNp linux-2.6.29-orig/block/blk-core.c linux-2.6.29/block/blk-core.c
>>>>>> --- linux-2.6.29-orig/block/blk-core.c 2009-03-24 08:12:14.000000000 +0900
>>>>>> +++ linux-2.6.29/block/blk-core.c 2009-03-30 17:08:28.000000000 +0900
>>>>>> @@ -1452,7 +1452,8 @@ static inline void __generic_make_reques
>>>>>> goto end_io;
>>>>>> }
>>>>>> if (bio_barrier(bio) && bio_has_data(bio) &&
>>>>>> - (q->next_ordered == QUEUE_ORDERED_NONE)) {
>>>>>> + (blk_queue_nowbcflush(q) ||
>>>>>> + q->next_ordered == QUEUE_ORDERED_NONE)) {
>>>>>> err = -EOPNOTSUPP;
>>>>>> goto end_io;
>>>>>> }
>>>>> This (and the above hunk) should be changed. -EOPNOTSUPP means the
>>>>> target does not support barriers, that is a different thing to flushes
>>>>> not being needed. A file system issuing a barrier and getting
>>>>> -EOPNOTSUPP back will disable barriers, since it now thinks that
>>>>> ordering cannot be guaranteed.
>>>> The reason I decided to use -EOPNOTSUPP was that I wanted to keep
>>>> barriers and device flushes from entering the block layer when
>>>> they are not needed. I feared that if we pass them down the block
>>>> stack (knowing in advance they will not be actually submitted to
>>>> disk) we may end up slowing things down unnecessarily.
>>> But that's just wrong, you need to make sure that the block layer / io
>>> scheduler doesn't reorder as well. It's a lot more complex than just the
>>> device end. So just returning -EOPNOTSUPP and pretending that you need
>>> not use barriers at the fs end is just wrong.
>> I should have mentioned that in this patch set I was trying to tackle the
>> blkdev_issue_flush() case only. As you pointed out, with the code above
>> requests may get silently reordered across barriers inside the block layer.
>>
>> The follow-up patch I am working on implements blkdev_issue_empty_barrier(),
>> which should be used by filesystems that want to emit an empty barrier (as
>> opposed to just triggering a device flush). Doing this we can optimize
>> fsync() flushes (block_flush_device()) and filesystem-originated barriers
>> (blkdev_issue_empty_barrier()) independently in the block layer.
>
> Not sure it makes sense to abstract that out into an api, it's basically
> just a bio_alloc(gfp, 0); with setting the bio fields and then
> submitting. Otherwise you'd have to either pass a ton of parameters, the
> caller will want to set end_io, bdev, etc anyway. And after that it's
> just submit_bio().
I will give it a try and see how it looks.
>> I agree with you that the we should pass barriers down in
>> __generic_make_request, but the optimization above for fsync()-originated
>> blkdev_issue_flush()'s seems valid to me.
>
> Of course, we need to do that. Anything else would be broken. The
> blkdev_issue_flush() should be changed to return 0, with the -EOPNOTSUPP
> being flag cached.
I am currently cooking a new iteration of these patches that do just
that. I will be reposting in a new thread and keep you all CCed.
- Fernando
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists