Message-ID: <552EC547.8070606@fb.com>
Date: Wed, 15 Apr 2015 14:08:39 -0600
From: Jens Axboe <axboe@...com>
To: Andrew Morton <akpm@...ux-foundation.org>
CC: <linux-fsdevel@...r.kernel.org>, <hch@...radead.org>,
<viro@...iv.linux.org.uk>, <linux-kernel@...r.kernel.org>,
<clm@...com>
Subject: Re: [PATCH] direct-io: only inc/dec inode->i_dio_count for file systems
On 04/15/2015 01:46 PM, Andrew Morton wrote:
> On Wed, 15 Apr 2015 13:26:51 -0600 Jens Axboe <axboe@...com> wrote:
>
>>> Is there similar impact to direct-io-to-file? It would be nice to fix
>>> that up also. Many filesystems do something along the lines of
>>>
>>> 	atomic_inc(i_dio_count);
>>> 	wibble()
>>> 	atomic_dec(i_dio_count);
>>> 	__blockdev_direct_IO(...);
>>>
>>> and with your patch I think we could change them to
>>>
>>> 	atomic_inc(i_dio_count);
>>> 	wibble()
>>> 	__blockdev_direct_IO(..., flags|DIO_IGNORE_TRUNCATE);
>>> 	atomic_dec(i_dio_count);
>>>
>>> which would halve the atomic op load.
>>
>> I haven't checked pure file, but for non-extending writes I suspect we
>> should see similar benefits there. In any case, it'd make sense to do;
>> having twice the atomic inc/dec is just a bad idea in general, if we can
>> get rid of it.
>>
>> A quick grep doesn't show this use case, or I'm just blind. Where do you
>> see that?
>
> btrfs_direct_IO() holds i_dio_count across its call to
> __blockdev_direct_IO() for writes. That makes the i_dio_count
> manipulation in do_blockdev_direct_IO() unneeded? ext4 is similar.
You are right, I even modified those... Yes, that looks like it's
unnecessary. Chris?
> Reducing from 4 ops to 2 probably won't make as much difference as
> reducing from 2 to 0 - most of the cost comes from initially grabbing that
> cacheline from a different CPU.
It won't be a 50% reduction, but it really depends a lot on timing. With
enough traffic banging on that inode, it could be close to a 50% reduction.
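
To spell out what I mean, mirroring the pseudocode above (a sketch only,
not the actual btrfs/ext4 code):

	/* Today: the caller takes and drops its own i_dio_count reference
	 * around its setup, then __blockdev_direct_IO() takes and drops
	 * another one internally, so 4 atomics on that cacheline per I/O. */
	atomic_inc(&inode->i_dio_count);
	wibble();
	atomic_dec(&inode->i_dio_count);
	ret = __blockdev_direct_IO(...);

	/* With the flag: the caller's reference is held across the call and
	 * the internal inc/dec is skipped, so 2 atomics per I/O.  The final
	 * drop goes through inode_dio_done() so that inode_dio_wait() (used
	 * by truncate) still gets its wakeup. */
	atomic_inc(&inode->i_dio_count);
	wibble();
	ret = __blockdev_direct_IO(..., flags | DIO_IGNORE_TRUNCATE);
	inode_dio_done(inode);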
>>> But that's piling hack on top of hack. Can we change the
>>
>> I'd view it more as reducing the hack; the real hack is the way that we
>> manually do atomic_inc() on i_dio_count, and then call a magic
>> inode_dio_done() that decrements it again. It's not very pretty; I'm
>> just reducing the scope of the hack :-)
>
> A magic flag which says "you don't need to do this in that case because
> I know this inode is special". direct-io already has too much of this :(
Well, outside of rewriting the dio code, that's what we get. The flags
already exist, and I don't disagree with you that the situation is
generally a mess.
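
For reference, the "magic" pretty much boils down to this (a from-memory
sketch of the current helpers, not a verbatim copy of fs/inode.c):

	/* The "begin" side is an open-coded atomic_inc() in each filesystem
	 * today; there is no named helper for it yet. */
	atomic_inc(&inode->i_dio_count);

	/* inode_dio_done(): drop the reference and, on the last one, wake
	 * inode_dio_wait(), which is what truncate uses to drain DIO. */
	void inode_dio_done(struct inode *inode)
	{
		if (atomic_dec_and_test(&inode->i_dio_count))
			wake_up_bit(&inode->i_state, __I_DIO_WAKEUP);
	}

Hence the flag: a caller that already holds its own reference can tell
do_blockdev_direct_IO() to skip the internal inc/dec.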
>>> do_blockdev_direct_IO() interface to "caller shall hold i_mutex, or
>>> increment i_dio_count"? ie: exclusion against truncate is wholly the
>>> caller's responsibility. That way, this awkward sharing of
>>> responsibility between caller and callee gets cleaned up and
>>> DIO_IGNORE_TRUNCATE goes away.
>>
>> That would clean it up, at the expense of more churn. I'd be
>> fine with doing it that way.
>
> OK, could be done later I suppose.
It could easily be layered on top; it doesn't have to be part of the
initial patch. Once the flag is there, callers can hold the reference
themselves and pass the flag.
>>> inode_dio_begin() would be a good place to assert that i_mutex is held,
>>> btw.
>>
>> How would you check it? If the i_dio_count is already bumped, then you
>> wouldn't need to hold i_mutex.
>
> 	if (atomic_add_return() == 1)
> 		assert()
That's not a 100% catch, but probably good enough.
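
Something like this, I guess (sketch only, not in the patch as posted):

	static inline void inode_dio_begin(struct inode *inode)
	{
		/* Only the 0 -> 1 transition needs to prove external
		 * exclusion; later DIOs ride on whoever came first,
		 * which is why it's not a 100% catch. */
		if (atomic_add_return(1, &inode->i_dio_count) == 1)
			WARN_ON_ONCE(!mutex_is_locked(&inode->i_mutex));
	}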
> I guess. It was just a thought. Having wandered around the code, I'm
> not 100% confident that everyone is holding i_mutex - it's not all
> obviously correct.
Personally I'd just prefer not having to dive deeper into this for the
benefit of this patch.
> otoh, the caller doesn't *have* to choose i_mutex for the external
> exclusion, and perhaps some callers have used something else.
>
--
Jens Axboe