linux-kernel - Re: [PATCH] direct-io: only inc/dec inode->i_dio

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <552EC547.8070606@fb.com>
Date:	Wed, 15 Apr 2015 14:08:39 -0600
From:	Jens Axboe <axboe@...com>
To:	Andrew Morton <akpm@...ux-foundation.org>
CC:	<linux-fsdevel@...r.kernel.org>, <hch@...radead.org>,
	<viro@...iv.linux.org.uk>, <linux-kernel@...r.kernel.org>,
	<clm@...com>
Subject: Re: [PATCH] direct-io: only inc/dec inode->i_dio_count for file systems

On 04/15/2015 01:46 PM, Andrew Morton wrote:
> On Wed, 15 Apr 2015 13:26:51 -0600 Jens Axboe <axboe@...com> wrote:
>
>>> Is there similar impact to direct-io-to-file?  It would be nice to fix
>>> that up also.  Many filesystems do something along the lines of
>>>
>>> 	atomic_inc(i_dio_count);
>>> 	wibble()
>>> 	atomic_dev(i_dio_count);
>>> 	__blockdev_direct_IO(...);
>>>
>>> and with your patch I think we could change them to
>>>
>>> 	atomic_inc(i_dio_count);
>>> 	wibble()
>>> 	__blockdev_direct_IO(..., flags|DIO_IGNORE_TRUNCATE);
>>> 	atomic_dev(i_dio_count);
>>>
>>> which would halve the atomic op load.
>>
>> I haven't checked pure file, but without extending, I suspect that we
>> should see similar benefits there. In any case, it'd make sense doing,
>> having twice the atomic inc/dec is just a bad idea in general, if we can
>> get rid of it.
>>
>> A quick grep doesn't show this use case, or I'm just blind. Where do you
>> see that?
>
> btrfs_direct_IO() holds i_dio_count across its call to
> __blockdev_direct_IO() for writes.  That makes the dio_bio_count
> manipulation in do_blockdev_direct_IO() unneeded?  ext4 is similar.

You are right, I even modified those.. Yes, that looks like it's 
unnecessary. Chris?

> Reducing from 4 ops to 2 probably won't make as much difference as
> reducing from 2 to 0 - most of the cost comes from initially grabbing that
> cacheline from a different CPU.

It wont be a 50% reduction, but it all really depends a lot on timing. 
If you have enough banging on it, it could be close to a 50% reduction.


>>> But that's piling hack on top of hack.  Can we change the
>>
>> I'd more view it as reducing the hack, the real hack is the way that we
>> manually do atomic_inc() on i_dio_count, and then call a magic
>> inode_dio_done() that decrements it again. It's not very pretty, I'm
>> just reducing the scope of the hack :-)
>
> A magic flag which says "you don't need to do this in that case because
> I know this inode is special".  direct-io already has too much of this :(

Well, outside of rewriting the dio code, that's what we get. The flags 
already exist, and I don't disagree with you that the situation is 
generally a mess.

>>> do_blockdev_direct_IO() interface to "caller shall hold i_mutex, or
>>> increment i_dio_count"?  ie: exclusion against truncate is wholly the
>>> caller's responsibility.  That way, this awkward sharing of
>>> responsibility between caller and callee gets cleaned up and
>>> DIO_IGNORE_TRUNCATE goes away.
>>
>> That would clean it up, at the expense of touching more churn. I'd be
>> fine with doing it that way.
>
> OK, could be done later I suppose.

It could easily be layered on top, doesn't have to be part of the 
initial patch. Once the flag is there, callers can do what they need and 
add the flag.

>>> inode_dio_begin() would be a good place to assert that i_mutex is held,
>>> btw.
>>
>> How would you check it? If the i_dio_count is bumped, then you'd not
>> need to hold i_mutex.
>
> 	if (atomic_add_return() == 1)
> 		assert()

That's not a 100% catch, but probably good enough.

> I guess.  It was just a thought.  Having wandered around the code, I'm
> not 100% confident that everyone is holding i_mutex - it's not all
> obviously correct.

Personally I'd just prefer not having to dive deeper into this for the 
benefit of this patch.

> otoh, the caller doesn't *have* to choose i_mutex for the external
> exclusion, and perhaps some callers have used something else.
>


-- 
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/