Message-ID: <x49wsamhpvg.fsf@segfault.boston.devel.redhat.com>
Date: Wed, 18 Mar 2009 10:22:59 -0400
From: Jeff Moyer <jmoyer@...hat.com>
To: Davide Libenzi <davidel@...ilserver.org>
Cc: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Benjamin LaHaise <bcrl@...ck.org>,
Trond Myklebust <trond.myklebust@....uio.no>,
Andrew Morton <akpm@...ux-foundation.org>,
Eric Dumazet <dada1@...mosbay.com>,
linux-aio <linux-aio@...ck.org>, zach.brown@...cle.com
Subject: Re: [patch] eventfd - remove fput() call from possible IRQ context (2nd rev)
Davide Libenzi <davidel@...ilserver.org> writes:
> The following patch removes a possible source of fput() calls from inside
> IRQ context. Myself, like Eric, wasn't able to reproduce an fput() call
> from IRQ context, but conceptually the bug is there.
I've attached a test program which can reproduce the fput call in
interrupt context. It's a modified version of the eventfd test that
Rusty wrote for the libaio test harness. I verified that fput was in
fact being called in interrupt context by using systemtap to print out
the "thread_indent" of fput calls, and observing a "swapper(0)" in the
output. After applying your patch, I confirmed that __fput is no longer
called from interrupt context. Strangely enough, I never did get any
output from the might_sleep in __fput. I can't explain that.
I have some minor comments inlined below.
> This patch adds an optimization similar to the one we already do on
> ->ki_filp, on ->ki_eventfd. Playing with ->f_count directly is not pretty
> in general, but the alternative here would be to add a brand new delayed
> fput() infrastructure, that I'm not sure is worth it.
>
> On Sun, 15 Mar 2009, Benjamin LaHaise wrote:
>
>> This looks reasonably sane, the only concern I have with it is that I think
>> it logically makes more sense to use the same convention for ki_filp and
>> ki_eventfd, as the difference in IS_ERR vs checking for NULL is a bit
>> confusing. Aside from that, it looks like it should fix the problem
>> correctly.
>
> Makes sense.
>
> Signed-off-by: Davide Libenzi <davidel@...ilserver.org>
>
>
> - Davide
>
>
> ---
> fs/aio.c | 37 +++++++++++++++++++++++++++----------
> 1 file changed, 27 insertions(+), 10 deletions(-)
>
> Index: linux-2.6.mod/fs/aio.c
> ===================================================================
> --- linux-2.6.mod.orig/fs/aio.c 2009-03-14 09:24:12.000000000 -0700
> +++ linux-2.6.mod/fs/aio.c 2009-03-15 12:54:10.000000000 -0700
...
> @@ -527,12 +528,14 @@ static void aio_fput_routine(struct work
> */
> static int __aio_put_req(struct kioctx *ctx, struct kiocb *req)
> {
> + int schedule_putreq = 0;
> +
> dprintk(KERN_DEBUG "aio_put(%p): f_count=%ld\n",
> req, atomic_long_read(&req->ki_filp->f_count));
>
> assert_spin_locked(&ctx->ctx_lock);
>
> - req->ki_users --;
> + req->ki_users--;
> BUG_ON(req->ki_users < 0);
> if (likely(req->ki_users))
> return 0;
> @@ -540,10 +543,23 @@ static int __aio_put_req(struct kioctx *
> req->ki_cancel = NULL;
> req->ki_retry = NULL;
>
> - /* Must be done under the lock to serialise against cancellation.
> - * Call this aio_fput as it duplicates fput via the fput_work.
> + /*
> + * Try to optimize the aio and eventfd file* puts, by avoiding to
> + * schedule work in case it is not __fput() time. In normal cases,
> + * we wouldn not be holding the last reference to the file*, so
         ^^^^^^^^^^
         tyop ("wouldn not" should be "would not")
> + * this function will be executed w/out any aio kthread wakeup.
> */
> - if (unlikely(atomic_long_dec_and_test(&req->ki_filp->f_count))) {
> + if (unlikely(atomic_long_dec_and_test(&req->ki_filp->f_count)))
> + schedule_putreq++;
> + else
> + req->ki_filp = NULL;
> + if (unlikely(req->ki_eventfd != NULL)) {
> + if (unlikely(atomic_long_dec_and_test(&req->ki_eventfd->f_count)))
> + schedule_putreq++;
> + else
> + req->ki_eventfd = NULL;
> + }
I agree with Jamie that you should get rid of the unlikely.
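For anyone following along, the counting logic in this hunk can be
sketched in userspace. This is only an illustration, assuming C11
atomics in place of atomic_long_dec_and_test(); fake_file, fake_req,
ref_dec_and_test() and req_drop_refs() are made-up names, not kernel
interfaces, and nothing here is taken from fs/aio.c beyond the shape of
the patch above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct file: only the reference count. */
struct fake_file {
	atomic_long f_count;
};

/* Hypothetical stand-in for struct kiocb. */
struct fake_req {
	struct fake_file *filp;		/* plays the role of ki_filp */
	struct fake_file *eventfd;	/* plays the role of ki_eventfd, may be NULL */
};

/* Drop one reference; return 1 if it was the last one, 0 otherwise. */
static int ref_dec_and_test(struct fake_file *f)
{
	long old = atomic_fetch_sub(&f->f_count, 1);

	assert(old > 0);	/* dropping a reference we do not hold is a bug */
	return old == 1;
}

/*
 * Drop the request's references. The return value counts how many of
 * them were the last reference, i.e. how many final puts would have to
 * be deferred to process context. A pointer whose reference was not
 * the last one is cleared so that deferred work would skip it.
 */
static int req_drop_refs(struct fake_req *req)
{
	int schedule_putreq = 0;

	if (ref_dec_and_test(req->filp))
		schedule_putreq++;
	else
		req->filp = NULL;

	if (req->eventfd != NULL) {
		if (ref_dec_and_test(req->eventfd))
			schedule_putreq++;
		else
			req->eventfd = NULL;
	}
	return schedule_putreq;
}
```

With a file that still has another holder and an eventfd whose last
reference belongs to the request (the situation the attached test
sets up by closing efd early), req_drop_refs() returns 1 and leaves
only the eventfd pointer set.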
Thanks for taking care of this, Davide.
Signed-off-by: Jeff Moyer <jmoyer@...hat.com>
Cheers,
Jeff
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <errno.h>
#include <assert.h>
#include <sys/eventfd.h>
#include <libaio.h>

int
main(int argc, char **argv)
{
#define SIZE	(256*1024*1024)
	char *buf;
	struct io_event io_event;
	struct iocb iocb;
	struct iocb *iocbs[] = { &iocb };
	int rwfd, efd;
	int res;
	io_context_t io_ctx;

	efd = eventfd(0, 0);
	if (efd < 0) {
		perror("eventfd");
		exit(1);
	}

	/* "rwfile" must already exist; there is no O_CREAT here. */
	rwfd = open("rwfile", O_RDWR|O_DIRECT);		assert(rwfd != -1);

	/* O_DIRECT needs an aligned buffer.  posix_memalign returns the
	 * error number on failure and does not set errno. */
	res = posix_memalign((void **)&buf, getpagesize(), SIZE);
	if (res != 0) {
		errno = res;
		perror("posix_memalign");
		exit(1);
	}
	memset(buf, 0x42, SIZE);

	/* Write test. */
	res = io_queue_init(1024, &io_ctx);		assert(res == 0);
	io_prep_pwrite(&iocb, rwfd, buf, SIZE, 0);
	io_set_eventfd(&iocb, efd);
	res = io_submit(io_ctx, 1, iocbs);		assert(res == 1);

	/* Now close the eventfd so that AIO has the last reference */
	close(efd);

	/* Keep this process around so that the aio subsystem does not hold
	 * the last reference on the rwfd, otherwise the really_put_req will
	 * be called from process context */
	res = io_getevents(io_ctx, 1, 1, &io_event, NULL);
	if (res != 1) {
		if (res < 0) {
			/* libaio returns a negated errno value */
			errno = -res;
			perror("io_getevents");
		} else
			printf("io_getevents did not return 1 event after "
			       "closing eventfd\n");
		exit(1);
	}
	assert(io_event.res == SIZE);
	printf("eventfd write test [SUCCESS]\n");

	return 0;
}
/*
* Local variables:
* c-basic-offset: 8
* compile-command: "gcc -o eventfd-in-irq eventfd-in-irq.c -laio -g3"
* End:
*/