linux-kernel - Re: [patch] eventfd - remove fput() call from possible IRQ context (2nd rev)

Message-ID: <49C10B6B.3040108@cosmosbay.com>
Date:	Wed, 18 Mar 2009 15:55:39 +0100
From:	Eric Dumazet <dada1@...mosbay.com>
To:	Jeff Moyer <jmoyer@...hat.com>
CC:	Davide Libenzi <davidel@...ilserver.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Benjamin LaHaise <bcrl@...ck.org>,
	Trond Myklebust <trond.myklebust@....uio.no>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-aio <linux-aio@...ck.org>, zach.brown@...cle.com
Subject: Re: [patch] eventfd - remove fput() call from possible IRQ context
 (2nd rev)

Jeff Moyer wrote:
> Davide Libenzi <davidel@...ilserver.org> writes:
> 
>> The following patch removes a possible source of fput() calls from inside 
>> IRQ context. Like Eric, I wasn't able to reproduce an fput() call 
>> from IRQ context, but conceptually the bug is there.
> 
> I've attached a test program which can reproduce the fput call in
> interrupt context.  It's a modified version of the eventfd test that
> Rusty wrote for the libaio test harness.  I verified that fput was in
> fact being called in interrupt context by using systemtap to print out
> the "thread_indent" of fput calls, and observing a "swapper(0)" in the
> output.  After applying your patch, I confirmed that __fput is no longer
> called from interrupt context.  Strangely enough, I never did get any
> output from the might_sleep in __fput.  I can't explain that.
> 
> I have some minor comments inlined below.
> 
>> This patch adds an optimization similar to the one we already do on 
>> ->ki_filp, on ->ki_eventfd. Playing with ->f_count directly is not pretty 
>> in general, but the alternative here would be to add a brand new delayed 
>> fput() infrastructure, that I'm not sure is worth it.
>>
>> On Sun, 15 Mar 2009, Benjamin LaHaise wrote:
>>
>>> This looks reasonably sane, the only concern I have with it is that I think 
>>> it logically makes more sense to use the same convention for ki_filp and 
>>> ki_eventfd, as the difference in IS_ERR vs checking for NULL is a bit 
>>> confusing.  Aside from that, it looks like it should fix the problem 
>>> correctly.
>> Makes sense.
>>
>> Signed-off-by: Davide Libenzi <davidel@...ilserver.org>
>>
>>
>> - Davide
>>
>>
>> ---
>>  fs/aio.c |   37 +++++++++++++++++++++++++++----------
>>  1 file changed, 27 insertions(+), 10 deletions(-)
>>
>> Index: linux-2.6.mod/fs/aio.c
>> ===================================================================
>> --- linux-2.6.mod.orig/fs/aio.c	2009-03-14 09:24:12.000000000 -0700
>> +++ linux-2.6.mod/fs/aio.c	2009-03-15 12:54:10.000000000 -0700
> ...
>> @@ -527,12 +528,14 @@ static void aio_fput_routine(struct work
>>   */
>>  static int __aio_put_req(struct kioctx *ctx, struct kiocb *req)
>>  {
>> +	int schedule_putreq = 0;
>> +
>>  	dprintk(KERN_DEBUG "aio_put(%p): f_count=%ld\n",
>>  		req, atomic_long_read(&req->ki_filp->f_count));
>>  
>>  	assert_spin_locked(&ctx->ctx_lock);
>>  
>> -	req->ki_users --;
>> +	req->ki_users--;
>>  	BUG_ON(req->ki_users < 0);
>>  	if (likely(req->ki_users))
>>  		return 0;
>> @@ -540,10 +543,23 @@ static int __aio_put_req(struct kioctx *
>>  	req->ki_cancel = NULL;
>>  	req->ki_retry = NULL;
>>  
>> -	/* Must be done under the lock to serialise against cancellation.
>> -	 * Call this aio_fput as it duplicates fput via the fput_work.
>> +	/*
>> +	 * Try to optimize the aio and eventfd file* puts, by avoiding to
>> +	 * schedule work in case it is not __fput() time. In normal cases,
>> +	 * we wouldn not be holding the last reference to the file*, so
>               ^^^^^^^^^^
> tyop
> 
>> +	 * this function will be executed w/out any aio kthread wakeup.
>>  	 */
>> -	if (unlikely(atomic_long_dec_and_test(&req->ki_filp->f_count))) {
>> +	if (unlikely(atomic_long_dec_and_test(&req->ki_filp->f_count)))
>> +		schedule_putreq++;
>> +	else
>> +		req->ki_filp = NULL;
>> +	if (unlikely(req->ki_eventfd != NULL)) {
>> +		if (unlikely(atomic_long_dec_and_test(&req->ki_eventfd->f_count)))
>> +			schedule_putreq++;
>> +		else
>> +			req->ki_eventfd = NULL;
>> +	}
> 
> I agree with Jamie that you should get rid of the unlikely.
> 
> Thanks for taking care of this, Davide.
> 
> Signed-off-by: Jeff Moyer <jmoyer@...hat.com>
> 
> Cheers,
> Jeff
> 
> #define _GNU_SOURCE
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <fcntl.h>
> #include <string.h>
> #include <sys/types.h>
> #include <errno.h>
> #include <assert.h>
> #include <sys/eventfd.h>
> #include <libaio.h>
> 
> int
> main(int argc, char **argv)
> {
> #define SIZE	(256*1024*1024)
> 	char *buf;
> 	struct io_event io_event;
> 	struct iocb iocb;
> 	struct iocb *iocbs[] = { &iocb };
> 	int rwfd, efd;
> 	int res;
> 	io_context_t	io_ctx;
> 
> 	efd = eventfd(0, 0);
> 	if (efd < 0) {
> 		perror("eventfd");
> 		exit(1);
> 	}
> 
> 	rwfd = open("rwfile", O_RDWR|O_DIRECT);		assert(rwfd != -1);
> 	res = posix_memalign((void **)&buf, getpagesize(), SIZE);
> 	if (res != 0) {		/* returns an error number; errno is not set */
> 		fprintf(stderr, "posix_memalign: %s\n", strerror(res));
> 		exit(1);
> 	}
> 	memset(buf, 0x42, SIZE);
> 
> 	/* Write test. */
> 	res = io_queue_init(1024, &io_ctx);		assert(res == 0);
> 	io_prep_pwrite(&iocb, rwfd, buf, SIZE, 0);
> 	io_set_eventfd(&iocb, efd);
> 	res = io_submit(io_ctx, 1, iocbs);		assert(res == 1);

Yes, but io_submit() is blocking, so your close(efd) will come after the release in fs/aio.c.

I suggest you start a thread just before io_submit() and give it this work:

void *thread_work(void *arg)
{
	int efd = *(int *)arg;	/* pass &efd; efd is local to main() */

	usleep(10000);		/* give io_submit() time to start */
	close(efd);
	return NULL;
}
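
A minimal sketch of that arrangement, not tested against the AIO path itself: the helper thread is started, the caller then enters its blocking call, and the helper closes the descriptor 10 ms later. The names close_fd_during_blocking_call() and fake_submit() are hypothetical; fake_submit() is only a stand-in for io_submit(), and the descriptor is passed through the thread argument because efd is local to main() in the test program.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <unistd.h>

/* Helper thread: wait 10 ms, then close the descriptor passed via arg. */
static void *thread_work(void *arg)
{
	int fd = *(int *)arg;

	usleep(10000);
	close(fd);
	return NULL;
}

/* Hypothetical stand-in for io_submit(): just blocks for 50 ms. */
static void fake_submit(void *unused)
{
	(void)unused;
	usleep(50000);
}

/*
 * Start the helper, run the blocking call, then join the helper.
 * Returns 1 if fd was closed by the time we get back, 0 if it is
 * still open, -1 if the thread could not be created or joined.
 */
static int close_fd_during_blocking_call(int fd,
					 void (*blocking_call)(void *),
					 void *call_arg)
{
	pthread_t tid;

	assert(blocking_call != NULL);
	if (pthread_create(&tid, NULL, thread_work, &fd) != 0)
		return -1;
	blocking_call(call_arg);	/* io_submit() in the real test */
	if (pthread_join(tid, NULL) != 0)
		return -1;
	/* fcntl() on a closed descriptor fails with EBADF. */
	return (fcntl(fd, F_GETFD) == -1 && errno == EBADF) ? 1 : 0;
}
```

In the test program, the pthread_create() call would go just before io_submit() and the pthread_join() after it, replacing the existing close(efd). Older toolchains need -lpthread on the compile line.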

> 
> 	/* Now close the eventfd so that AIO has the last reference */
> 	close(efd);
> 
> 	/* Keep this process around so that the aio subsystem does not hold
> 	 * the last reference on the rwfd, otherwise the really_put_req will
> 	 * be called from process context */
> 	res = io_getevents(io_ctx, 1, 1, &io_event, NULL);
> 	if (res != 1) {
> 		if (res < 0) {
> 			errno = -res;
> 			perror("io_getevents");
> 		} else
> 			printf("io_getevents did not return 1 event after "
> 			       "closing eventfd\n");
> 		exit(1);
> 	}
> 	assert(io_event.res == SIZE);
> 	printf("eventfd write test [SUCCESS]\n");
> 
> 	return 0;
> }
> /*
>  * Local variables:
>  *   c-basic-offset: 8
>  *   compile-command: "gcc -o eventfd-in-irq eventfd-in-irq.c -laio -g3"
>  * End:
>  */
> 
> 

