Message-ID: <AANLkTi=jRZzrZO3_5ZQ+tt8qS=rYzrntLXrUoHq+8hnC@mail.gmail.com>
Date: Wed, 19 Jan 2011 10:05:12 +1100
From: Nick Piggin <npiggin@...il.com>
To: Jeff Moyer <jmoyer@...hat.com>
Cc: Jan Kara <jack@...e.cz>, Andrew Morton <akpm@...ux-foundation.org>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
linux-kernel@...r.kernel.org
Subject: Re: [patch] fs: aio fix rcu lookup
On Wed, Jan 19, 2011 at 10:00 AM, Jeff Moyer <jmoyer@...hat.com> wrote:
> Nick Piggin <npiggin@...il.com> writes:
>
>> On Wed, Jan 19, 2011 at 6:01 AM, Jan Kara <jack@...e.cz> wrote:
>>> Hi,
>>>
>>> On Tue 18-01-11 10:24:24, Nick Piggin wrote:
>>>> On Tue, Jan 18, 2011 at 6:07 AM, Jeff Moyer <jmoyer@...hat.com> wrote:
>>>> > Nick Piggin <npiggin@...il.com> writes:
>>>> >> Do you agree with the theoretical problem? I didn't try to
>>>> >> write a racer to break it yet. Inserting a delay before the
>>>> >> get_ioctx might do the trick.
>>>> >
>>>> > I'm not convinced, no. The last reference to the kioctx is always the
>>>> > process, released in the exit_aio path, or via sys_io_destroy. In both
>>>> > cases, we cancel all aios, then wait for them all to complete before
>>>> > dropping the final reference to the context.
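>>>> >
>>>> > Roughly the sequence I mean (a hedged sketch of the io_destroy()
>>>> > teardown path, not the exact fs/aio.c code):
>>>> >
>>>> >         /* unhash under mm->ioctx_lock so new lookups miss it */
>>>> >         hlist_del_rcu(&ioctx->list);
>>>> >         ioctx->dead = 1;
>>>> >
>>>> >         aio_cancel_all(ioctx);          /* cancel in-flight iocbs */
>>>> >         wait_for_all_aios(ioctx);       /* let them all complete */
>>>> >
>>>> >         put_ioctx(ioctx);               /* drop the final reference */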
>>>>
>>>> That wouldn't appear to prevent a concurrent thread from doing an
>>>> io operation that requires an ioctx lookup, and taking a reference
>>>> after the io_cancel thread has dropped the last one.
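>>>>
>>>> Concretely, the interleaving I have in mind (hypothetical; CPU0 is in
>>>> io_destroy(), CPU1 is in lookup_ioctx()):
>>>>
>>>>   CPU1: rcu_read_lock()
>>>>   CPU1: finds ctx on mm->ioctx_list, sees ctx->dead == 0
>>>>   CPU0: hlist_del_rcu(&ctx->list); ctx->dead = 1
>>>>   CPU0: cancels and waits for all aios
>>>>   CPU0: put_ioctx(ctx)     /* refcount hits 0, RCU free is queued */
>>>>   CPU1: get_ioctx(ctx)     /* blind atomic_inc(), 0 -> 1, too late */
>>>>   CPU1: rcu_read_unlock()
>>>>   ...grace period expires, ctx is freed, CPU1's pointer dangles...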
>>>>
>>>> > So, while I agree that what you wrote is better, I remain unconvinced of
>>>> > it solving a real-world problem. Feel free to push it in as a cleanup,
>>>> > though.
>>>>
>>>> Well I think it has to be technically correct first. If there is indeed a
>>>> guaranteed ref somehow, it just needs a comment.
>>> Hmm, the code in io_destroy() indeed looks fishy. We delete the ioctx
>>> from the hash table and set ioctx->dead which is supposed to stop
>>> lookup_ioctx() from finding it (see the !ctx->dead check in
>>> lookup_ioctx()). There's even a comment in io_destroy() saying:
>>> /*
>>> * Wake up any waiters. The setting of ctx->dead must be seen
>>> * by other CPUs at this point. Right now, we rely on the
>>> * locking done by the above calls to ensure this consistency.
>>> */
>>> But since lookup_ioctx() is called without any lock or barrier, nothing
>>> really prevents the list traversal and the ioctx->dead test from happening
>>> before io_destroy(), and the get_ioctx() from happening after io_destroy().
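>>>
>>> For reference, the lookup side is essentially this (trimmed sketch of
>>> lookup_ioctx(), modulo declarations):
>>>
>>>         rcu_read_lock();
>>>         hlist_for_each_entry_rcu(ctx, n, &mm->ioctx_list, list) {
>>>                 if (ctx->user_id == ctx_id && !ctx->dead) {
>>>                         /* nothing orders this against io_destroy()
>>>                            dropping the last reference */
>>>                         get_ioctx(ctx);
>>>                         ret = ctx;
>>>                         break;
>>>                 }
>>>         }
>>>         rcu_read_unlock();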
>>>
>>> But wouldn't the right fix be to call synchronize_rcu() in io_destroy()?
>>> Because with your fix we could still return a 'dead' ioctx, and I don't
>>> think we are supposed to do that...
>>
>> With my fix we won't oops. I was a bit concerned about ->dead, yes,
>> but I don't know what semantics it is intended to have there.
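>>
>> To be concrete, the fix refuses the lookup once the refcount has already
>> hit zero (a sketch; try_get_ioctx is the helper the patch adds, assuming
>> ctx->users is an atomic_t):
>>
>>         #define try_get_ioctx(kioctx) \
>>                 atomic_inc_not_zero(&(kioctx)->users)
>>
>> and in lookup_ioctx():
>>
>>                 if (ctx->user_id == ctx_id && !ctx->dead) {
>>                         if (!try_get_ioctx(ctx))
>>                                 break;  /* lost the race with the final put */
>>                         ret = ctx;
>>                         break;
>>                 }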
>>
>> synchronize_rcu() in io_destroy() does not prevent it from returning
>> as soon as lookup_ioctx drops the rcu_read_lock().
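>>
>> That is, synchronize_rcu() only waits for readers still inside their
>> read-side critical sections; it does nothing about a lookup that has
>> already finished (hypothetical interleaving):
>>
>>   CPU1: rcu_read_lock(); finds ctx, !ctx->dead, takes a reference
>>   CPU1: rcu_read_unlock(); returns ctx to the submission path
>>   CPU0: io_destroy(): unhash, ctx->dead = 1
>>   CPU0: synchronize_rcu()  /* returns at once, no readers left */
>>   CPU1: happily goes on using a ctx that is now dead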
>>
>> The dead=1 in io_destroy indeed doesn't guarantee a whole lot.
>> Anyone know?
>
> See the comment above io_destroy for starters. Note that RCU was
> bolted on later, and I believe that ->dead has nothing to do with the
> RCU-ification.
Ah, yep, that makes sense indeed.