Message-ID: <20110118235236.GA14087@quack.suse.cz>
Date: Wed, 19 Jan 2011 00:52:36 +0100
From: Jan Kara <jack@...e.cz>
To: Nick Piggin <npiggin@...il.com>
Cc: Jan Kara <jack@...e.cz>, Jeff Moyer <jmoyer@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
linux-kernel@...r.kernel.org
Subject: Re: [patch] fs: aio fix rcu lookup
On Wed 19-01-11 09:17:23, Nick Piggin wrote:
> On Wed, Jan 19, 2011 at 6:01 AM, Jan Kara <jack@...e.cz> wrote:
> > On Tue 18-01-11 10:24:24, Nick Piggin wrote:
> >> On Tue, Jan 18, 2011 at 6:07 AM, Jeff Moyer <jmoyer@...hat.com> wrote:
> >> > Nick Piggin <npiggin@...il.com> writes:
> >> >> Do you agree with the theoretical problem? I didn't try to
> >> >> write a racer to break it yet. Inserting a delay before the
> >> >> get_ioctx might do the trick.
> >> >
> >> > I'm not convinced, no. The last reference to the kioctx is always the
> >> > process, released in the exit_aio path, or via sys_io_destroy. In both
> >> > cases, we cancel all aios, then wait for them all to complete before
> >> > dropping the final reference to the context.
> >>
> >> That wouldn't appear to prevent a concurrent thread from doing an
> >> io operation that requires ioctx lookup, and taking the last reference
> >> after the io_cancel thread drops the ref.
> >>
> >> > So, while I agree that what you wrote is better, I remain unconvinced of
> >> > it solving a real-world problem. Feel free to push it in as a cleanup,
> >> > though.
> >>
> >> Well I think it has to be technically correct first. If there is indeed a
> >> guaranteed ref somehow, it just needs a comment.
> > Hmm, the code in io_destroy() indeed looks fishy. We delete the ioctx
> > from the hash table and set ioctx->dead which is supposed to stop
> > lookup_ioctx() from finding it (see the !ctx->dead check in
> > lookup_ioctx()). There's even a comment in io_destroy() saying:
> > /*
> > * Wake up any waiters. The setting of ctx->dead must be seen
> > * by other CPUs at this point. Right now, we rely on the
> > * locking done by the above calls to ensure this consistency.
> > */
> > But since lookup_ioctx() is called without any lock or barrier, nothing
> > really prevents the list traversal and the ioctx->dead test from happening
> > before io_destroy(), and get_ioctx() from happening after io_destroy().
> >
> > But wouldn't the right fix be to call synchronize_rcu() in io_destroy()?
> > Because with your fix we could still return 'dead' ioctx and I don't think
> > we are supposed to do that...
>
> With my fix we won't oops. I was a bit concerned about ->dead,
> yes, but I don't know what semantics it is intended to have there.
But wouldn't it do something bad if the memory gets reallocated for
something else and set to non-zero? E.g. memory corruption?
> synchronize_rcu() in io_destroy() does not prevent it from returning
> as soon as lookup_ioctx drops the rcu_read_lock().
Yes, exactly. So references obtained before synchronize_rcu() would be
completely fine and valid, and there would be no references after
synchronize_rcu() because they'd see 'dead' set. But looking at the code
again, even that would not be enough, because we could still race with
io_submit_one() adding new IO to an ioctx which is then freed just after
put_ioctx() in do_io_submit().
The patch below implements what I have in mind. It should probably be
split into two, but I'd like to hear comments on it before doing these
cosmetic touches ;)
Honza
--
Jan Kara <jack@...e.cz>
SUSE Labs, CR
---
From 7a662553a44db06381d405a76426855b5507c61d Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@...e.cz>
Date: Wed, 19 Jan 2011 00:37:48 +0100
Subject: [PATCH] fs: Fix ioctx lookup races with io_destroy() in AIO
The AIO code doesn't implement the RCU lookup pattern properly:
rcu_read_lock() in lookup_ioctx() does not prevent the refcount
from going to zero, so we might take a reference on a zero-count
(possibly already freed) ioctx:
  CPU1                                            CPU2
  lookup_ioctx()
    hlist_for_each_entry_rcu()
      if (ctx->user_id == ctx_id && !ctx->dead)
                                                  io_destroy()
                                                    free ioctx
      get_ioctx(ctx);

and lookup_ioctx() returns freed memory to an innocent caller.
Close this race by calling synchronize_rcu() in io_destroy().
Another race occurs when io_submit() races with io_destroy():

  CPU1                                            CPU2
  io_submit()
    do_io_submit()
      ...
      ctx = lookup_ioctx(ctx_id);
                                                  io_destroy()
      Now do_io_submit() holds the last reference to ctx.
      ...
      queue new AIO
      put_ioctx(ctx) - frees ctx with active AIOs
We solve this issue by checking whether ctx is being destroyed in the
AIO submission path after adding a new AIO to the ctx. Then we are
guaranteed that either io_destroy() waits for the new AIO or we see
that ctx is being destroyed and bail out.
Signed-off-by: Jan Kara <jack@...e.cz>
---
fs/aio.c | 21 +++++++++++++++++++++
1 files changed, 21 insertions(+), 0 deletions(-)
diff --git a/fs/aio.c b/fs/aio.c
index 8c8f6c5..59ac692 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1225,6 +1225,17 @@ static void io_destroy(struct kioctx *ioctx)
 	if (likely(!was_dead))
 		put_ioctx(ioctx);	/* twice for the list */
 
+	/*
+	 * After this the ioctx cannot be looked up because ioctx->dead is set.
+	 * But there could still be io_submit() running...
+	 */
+	synchronize_rcu();
+
+	/*
+	 * ... but once these functions finish, the io has been either
+	 * cancelled (if aio_get_req() has already run) or the ctx->dead
+	 * check in io_submit_one() triggers and no io is submitted.
+	 */
 	aio_cancel_all(ioctx);
 	wait_for_all_aios(ioctx);
@@ -1646,6 +1657,16 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 		goto out_put_req;
 
 	spin_lock_irq(&ctx->ctx_lock);
+	/*
+	 * Raced with io_destroy()? See comments in io_destroy() for details.
+	 * The check is inside ctx->ctx_lock to avoid an extra memory barrier
+	 * in this fast path...
+	 */
+	if (ctx->dead) {
+		spin_unlock_irq(&ctx->ctx_lock);
+		ret = -EINVAL;
+		goto out_put_req;
+	}
 	aio_run_iocb(req);
 	if (!list_empty(&ctx->run_list)) {
 		/* drain the run list */
--
1.7.1