lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 31 Mar 2009 15:49:07 +0900
From:	Fernando Luis Vázquez Cao 
	<fernando@....ntt.co.jp>
To:	Jens Axboe <jens.axboe@...cle.com>
CC:	Jeff Garzik <jeff@...zik.org>,
	Christoph Hellwig <hch@...radead.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Theodore Tso <tytso@....edu>, Ingo Molnar <mingo@...e.hu>,
	Alan Cox <alan@...rguk.ukuu.org.uk>,
	Arjan van de Ven <arjan@...radead.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Nick Piggin <npiggin@...e.de>, David Rees <drees76@...il.com>,
	Jesper Krogh <jesper@...gh.cc>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	chris.mason@...cle.com, david@...morbit.com, tj@...nel.org
Subject: Re: [PATCH 5/7] vfs: Add  wbcflush sysfs knob to disable storage
 device writeback cache flushes

Jens Axboe wrote:
> On Mon, Mar 30 2009, Fernando Luis Vázquez Cao wrote:
>> Jens Axboe wrote:
>>> On Mon, Mar 30 2009, Fernando Luis Vázquez Cao wrote:
>>>> Add a sysfs knob to disable storage device writeback cache flushes.
>>>>
>>>> Signed-off-by: Fernando Luis Vazquez Cao <fernando@....ntt.co.jp>
>>>> ---
>>>>
>>>> diff -urNp linux-2.6.29-orig/block/blk-barrier.c linux-2.6.29/block/blk-barrier.c
>>>> --- linux-2.6.29-orig/block/blk-barrier.c	2009-03-24 08:12:14.000000000 +0900
>>>> +++ linux-2.6.29/block/blk-barrier.c	2009-03-30 17:08:28.000000000 +0900
>>>> @@ -318,6 +318,9 @@ int blkdev_issue_flush(struct block_devi
>>>>  	if (!q)
>>>>  		return -ENXIO;
>>>>
>>>> +	if (blk_queue_nowbcflush(q))
>>>> +		return -EOPNOTSUPP;
>>>> +
>>>>  	bio = bio_alloc(GFP_KERNEL, 0);
>>>>  	if (!bio)
>>>>  		return -ENOMEM;
>>>> diff -urNp linux-2.6.29-orig/block/blk-core.c linux-2.6.29/block/blk-core.c
>>>> --- linux-2.6.29-orig/block/blk-core.c	2009-03-24 08:12:14.000000000 +0900
>>>> +++ linux-2.6.29/block/blk-core.c	2009-03-30 17:08:28.000000000 +0900
>>>> @@ -1452,7 +1452,8 @@ static inline void __generic_make_reques
>>>>  			goto end_io;
>>>>  		}
>>>>  		if (bio_barrier(bio) && bio_has_data(bio) &&
>>>> -		    (q->next_ordered == QUEUE_ORDERED_NONE)) {
>>>> +		    (blk_queue_nowbcflush(q) ||
>>>> +		     q->next_ordered == QUEUE_ORDERED_NONE)) {
>>>>  			err = -EOPNOTSUPP;
>>>>  			goto end_io;
>>>>  		}
>>> This (and the above hunk) should be changed. -EOPNOTSUPP means the
>>> target does not support barriers, that is a different thing to flushes
>>> not being needed. A file system issuing a barrier and getting
>>> -EOPNOTSUPP back will disable barriers, since it now thinks that
>>> ordering cannot be guaranteed.
>> The reason I decided to use -EOPNOTSUPP was that I wanted to keep
>> barriers and device flushes from entering the block layer when
>> they are not needed. I feared that if we pass them down the block
>> stack (knowing in advance they will not be actually submitted to
>> disk) we may end up slowing things down unnecessarily.
> 
> But that's just wrong, you need to make sure that the block layer / io
> scheduler doesn't reorder as well. It's a lot more complex than just the
> device end. So just returning -EOPNOTSUPP and pretending that you need
> not use barriers at the fs end is just wrong.

I should have mentioned that in this patch set I was trying to tackle the
blkdev_issue_flush() case only. As you pointed out, with the code above
requests may get silently reordered across barriers inside the block layer.

The follow-up patch I am working on implements blkdev_issue_empty_barrier(),
which should be used by filesystems that want to emit an empty barrier (as
opposed to just triggering a device flush). Doing this we can optimize
fsync() flushes (block_flush_device()) and filesystem-originated barriers
(blkdev_issue_empty_barrier()) independently in the block layer.

I agree with you that the we should pass barriers down in
__generic_make_request, but the optimization above for fsync()-originated
blkdev_issue_flush()'s seems valid to me.

Does it make sense now?

>> As you mentioned, filesystems such as ext3/4 will disable
>> barriers if they get -EOPNOTSUPP when issuing one. I was planning
>> to add a notifier mechanism so that we can notify filesystems has
>> been a change in the barrier settings. This might be
>> over-engineering, though. Especially considering that "-o
>> remount,barrier=1" will bring us the barriers back.
> 
> I think that is over-engineering.

Yep, we certainly agree on that point.

>>> A more appropriate change would be to successfully complete a flush
>>> without actually sending it down to the device if blk_queue_nowbcflush()
>>> is true. Then blkdev_issue_flush() would just work as well. It also
>>> needs to take stacking into account, or stacked drivers will have to
>>> propagate the settings up the stack. If you allow simply the barrier to
>>> be passed down, you get that for free.
>> Aren't we risking slowing things down? Does the small optimization above
>> make sense (especially taking the remount trick into account)?
> 
> It's not, I think you are missing the bigger picture.

Sorry for not explaining myself properly. I will add a changelog and better
documentation for the patches.

Thank you for your feedback!

- Fernando
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ