Message-Id: <0DF58573-AC11-4732-B48C-76401C1A222D@oracle.com>
Date: Tue, 2 Jan 2007 16:49:21 -0800
From: Zach Brown <zach.brown@...cle.com>
To: "Chen, Kenneth W" <kenneth.w.chen@...el.com>
Cc: "'Andrew Morton'" <akpm@...l.org>, <linux-aio@...ck.org>,
<linux-kernel@...r.kernel.org>,
"'Benjamin LaHaise'" <bcrl@...ck.org>, <suparna@...ibm.com>
Subject: Re: [patch] aio: add per task aio wait event condition
On Dec 29, 2006, at 6:31 PM, Chen, Kenneth W wrote:
> The AIO wake-up notification from aio_complete is really inefficient
> in the current AIO implementation in the presence of processes
> waiting in io_getevents().
Yeah, it's a real deficiency. Thanks for taking a stab at it.
> This patch adds a wait condition to the wait queue and only wakes up
> a process when that condition is met. The condition is added on a
> per-task basis to handle multi-threaded apps that share a single
> ioctx.
But only one of the waiting tasks is tested, the one at the head of
the list. It looks like this change could starve an io_getevents()
with a low min_nr in the presence of another io_getevents() with a
larger min_nr.
> Before:
>  0  0  0 3972608  7056  31312  0  0 14100  0 7885 13747  0  2 98  0
> After:
>  0  0  0 3972608  7056  31312  0  0 13800  0 7885    42  0  2 98  0
Nice. What min_nr was used in this test?
> +struct aio_wait_queue {
> + int nr_wait; /* wake-up condition */
It appears that this is never assigned a negative value? Can we make
that explicit in the type so that we reviewers don't have to worry
about wrapping and signed comparisons?
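Something like the following, say (just a sketch; the wait member name
is taken from the container_of() later in the patch):

	struct aio_wait_queue {
		unsigned int	nr_wait;	/* wake-up condition, never negative */
		wait_queue_t	wait;		/* entry on ctx->wait */
	};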
> - DECLARE_WAITQUEUE(wait, tsk);
> + struct aio_wait_queue wait;
> + aio_init_wait(&wait);
This just changed from using default_wake_function() to
autoremove_wake_function(). Very sneaky! wait_for_all_aios() should
then be re-adding the wait queue entry before going back to sleep each
time (better still, just use wait_event()).
Was this on purpose? I'm all for it as a way to reduce wakeups from
a stream of completions to a single waiter.
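For reference, ignoring the new nr_wait bookkeeping for a moment, the
wait_event() form of wait_for_all_aios() would be about one line
(untested, but reqs_active is exactly the condition the current loop
tests):

	static void wait_for_all_aios(struct kioctx *ctx)
	{
		/* sleep until aio_complete() has retired every request */
		wait_event(ctx->wait, !ctx->reqs_active);
	}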
> + nr_evt = ring->tail - ring->head;
> + if (nr_evt < 0)
> + nr_evt += info->nr;
	int = unsigned - unsigned;
	if (int < 0)
My head already hurts. Can we clean this up so one doesn't have to
live and breathe type conversion rules to tell if this code is correct?
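One way to spell it that never leaves unsigned (untested, just to show
the shape):

	unsigned head = ring->head, tail = ring->tail;
	unsigned nr_evt;

	/* the ring indices wrap at info->nr; handle the wrap explicitly */
	if (tail >= head)
		nr_evt = tail - head;
	else
		nr_evt = info->nr - head + tail;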
> + if (waitqueue_active(&ctx->wait)) {
> + struct aio_wait_queue *wait;
> + wait = container_of(ctx->wait.task_list.next,
> + struct aio_wait_queue, wait.task_list);
> + if (nr_evt >= wait->nr_wait)
> + wake_up(&ctx->wait);
> + }
First, there's the fear of starvation mentioned previously:
  - issue 2 ops
  - first io_getevents() sleeps with a min_nr of 2
  - second io_getevents() sleeps with a min_nr of 3
  - the 2 ops complete, but only the second sleeper's min_nr of 3 is
    tested
  - first sleeper twiddles thumbs
This makes me think this elegant task_list approach is doomed. I
think this is what stopped Ben and me from being interested in this
last time we talked about it :).
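If someone wants to keep trying, I'd guess the completion side has to
test every waiter rather than just the head, something along these
lines (completely untested; it assumes ->private still points at the
sleeping task, as the default wait entry initialization leaves it):

	struct aio_wait_queue *w;
	unsigned long flags;

	spin_lock_irqsave(&ctx->wait.lock, flags);
	list_for_each_entry(w, &ctx->wait.task_list, wait.task_list) {
		/* wake every sleeper whose min_nr has been satisfied */
		if (nr_evt >= w->nr_wait)
			wake_up_process(w->wait.private);
	}
	spin_unlock_irqrestore(&ctx->wait.lock, flags);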
Also, are the container_of() and the dereference safe in the presence
of racing wake-ups? It looks like we could dereference a freed wait,
get a bogus nr_wait, and decide not to wake.
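At the very least the peek would have to happen under the waitqueue
lock, something like (again untested):

	unsigned long flags;

	spin_lock_irqsave(&ctx->wait.lock, flags);
	if (waitqueue_active(&ctx->wait)) {
		struct aio_wait_queue *wait;

		wait = container_of(ctx->wait.task_list.next,
				    struct aio_wait_queue, wait.task_list);
		if (nr_evt >= wait->nr_wait)
			wake_up_locked(&ctx->wait);
	}
	spin_unlock_irqrestore(&ctx->wait.lock, flags);

That only closes the use-after-free, though, not the starvation above.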
Andrew, I fear we should remove this from -mm until it's fixed up.
- z