[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090330143539.GT5178@kernel.dk>
Date: Mon, 30 Mar 2009 16:35:39 +0200
From: Jens Axboe <jens.axboe@...cle.com>
To: Fernando Luis Vázquez Cao
<fernando@....ntt.co.jp>
Cc: Jeff Garzik <jeff@...zik.org>,
Christoph Hellwig <hch@...radead.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Theodore Tso <tytso@....edu>, Ingo Molnar <mingo@...e.hu>,
Alan Cox <alan@...rguk.ukuu.org.uk>,
Arjan van de Ven <arjan@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Nick Piggin <npiggin@...e.de>, David Rees <drees76@...il.com>,
Jesper Krogh <jesper@...gh.cc>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
chris.mason@...cle.com, david@...morbit.com, tj@...nel.org
Subject: Re: [PATCH 5/7] vfs: Add wbcflush sysfs knob to disable storage
device writeback cache flushes
On Mon, Mar 30 2009, Fernando Luis Vázquez Cao wrote:
> Jens Axboe wrote:
>> On Mon, Mar 30 2009, Fernando Luis Vázquez Cao wrote:
>>> Add a sysfs knob to disable storage device writeback cache flushes.
>>>
>>> Signed-off-by: Fernando Luis Vazquez Cao <fernando@....ntt.co.jp>
>>> ---
>>>
>>> diff -urNp linux-2.6.29-orig/block/blk-barrier.c linux-2.6.29/block/blk-barrier.c
>>> --- linux-2.6.29-orig/block/blk-barrier.c 2009-03-24 08:12:14.000000000 +0900
>>> +++ linux-2.6.29/block/blk-barrier.c 2009-03-30 17:08:28.000000000 +0900
>>> @@ -318,6 +318,9 @@ int blkdev_issue_flush(struct block_devi
>>> if (!q)
>>> return -ENXIO;
>>>
>>> + if (blk_queue_nowbcflush(q))
>>> + return -EOPNOTSUPP;
>>> +
>>> bio = bio_alloc(GFP_KERNEL, 0);
>>> if (!bio)
>>> return -ENOMEM;
>>> diff -urNp linux-2.6.29-orig/block/blk-core.c linux-2.6.29/block/blk-core.c
>>> --- linux-2.6.29-orig/block/blk-core.c 2009-03-24 08:12:14.000000000 +0900
>>> +++ linux-2.6.29/block/blk-core.c 2009-03-30 17:08:28.000000000 +0900
>>> @@ -1452,7 +1452,8 @@ static inline void __generic_make_reques
>>> goto end_io;
>>> }
>>> if (bio_barrier(bio) && bio_has_data(bio) &&
>>> - (q->next_ordered == QUEUE_ORDERED_NONE)) {
>>> + (blk_queue_nowbcflush(q) ||
>>> + q->next_ordered == QUEUE_ORDERED_NONE)) {
>>> err = -EOPNOTSUPP;
>>> goto end_io;
>>> }
>>
>> This (and the above hunk) should be changed. -EOPNOTSUPP means the
>> target does not support barriers, that is a different thing to flushes
>> not being needed. A file system issuing a barrier and getting
>> -EOPNOTSUPP back will disable barriers, since it now thinks that
>> ordering cannot be guaranteed.
>
> The reason I decided to use -EOPNOTSUPP was that I wanted to keep
> barriers and device flushes from entering the block layer when
> they are not needed. I feared that if we pass them down the block
> stack (knowing in advance they will not be actually submitted to
> disk) we may end up slowing things down unnecessarily.
But that's just wrong, you need to make sure that the block layer / io
scheduler doesn't reorder as well. It's a lot more complex than just the
device end. So just returning -EOPNOTSUPP and pretending that you need
not use barriers at the fs end is just wrong.
> As you mentioned, filesystems such as ext3/4 will disable
> barriers if they get -EOPNOTSUPP when issuing one. I was planning
> to add a notifier mechanism so that we can notify filesystems has
> been a change in the barrier settings. This might be
> over-engineering, though. Especially considering that "-o
> remount,barrier=1" will bring us the barriers back.
I think that is over-engineering.
>> A more appropriate change would be to successfully complete a flush
>> without actually sending it down to the device if blk_queue_nowbcflush()
>> is true. Then blkdev_issue_flush() would just work as well. It also
>> needs to take stacking into account, or stacked drivers will have to
>> propagate the settings up the stack. If you allow simply the barrier to
>> be passed down, you get that for free.
>
> Aren't we risking slowing things down? Does the small optimization above
> make sense (especially taking the remount trick into account)?
It's not, I think you are missing the bigger picture.
--
Jens Axboe
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists