linux-kernel - Re: [PATCH] eventfd signal race in aio

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Fri, 7 Mar 2008 20:29:20 -0800 (PST)
From:	Davide Libenzi <davidel@...ilserver.org>
To:	Jeff Roberson <jroberson@...sapeake.net>
cc:	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	riel@...hat.com, Zach Brown <zach.brown@...cle.com>
Subject: Re: [PATCH] eventfd signal race in aio_complete()

[cc-ing zab]


On Fri, 7 Mar 2008, Jeff Roberson wrote:

> Hello,

Hi!


> I have an application that makes use of eventfd to merge socket and aio
> blocking with epoll in one thread.  Under heavy loads the application
> sometimes hangs when we receive notification from epoll that the eventfd has
> an event ready but reading the aio completions produces no results. Further
> investigation revealed that the aiocb was later ready with no new event and
> completing it based on a timer resolved the application hang.
>
> This pointed to the eventfd being signaled prematurely and I verified that
> this was indeed the problem.  aio_complete() calls eventfd_signal() before the
> event is actually placed on the completion ring.  On a multi-processor system
> it is possible to read the event from epoll and return to userspace before
> aio_complete() finishes.
> 
> The enclosed patch simply moves the signaling to the bottom of the function.
> I'm not 100% familiar with this code and it looks like it may be possible to
> have spurious wakeups now but there will be no missed wakeups.  An application
> may also race the other way now and receive aio completion before the signal,
> thus still leaving it with a signal with no completion.  signaling while the
> kioctx is locked would resolve this but I was hesitant to introduce further
> nesting of spinlocks that might have another order elsewhere.

Your patch access the iocb after the __aio_put_req() call, that can make 
the iocb (and the reference to the ki_eventfd) to become invalid. It also 
has the spurious wakeup issue (not a biggie, but if it can be avoided).
There're two solutions AFAICS. The first solution/patch get a reference to 
the file*, and signal (if the event has really been dropped inside the 
ring) and release.
The second solution/patch simply moves the eventfd_signal() call before 
the __aio_put_req() call, but after the event has beed "ringed".
We should be clear to go with the shorter/nicer second solution. Those 
patches builds, but I'm not even signing them off till I tested them.




- Davide



---
 fs/aio.c |   29 +++++++++++++++++++++++------
 1 file changed, 23 insertions(+), 6 deletions(-)

Index: linux-2.6.mod/fs/aio.c
===================================================================
--- linux-2.6.mod.orig/fs/aio.c	2008-03-07 19:33:44.000000000 -0800
+++ linux-2.6.mod/fs/aio.c	2008-03-07 19:45:50.000000000 -0800
@@ -916,7 +916,8 @@
 	struct kioctx	*ctx = iocb->ki_ctx;
 	struct aio_ring_info	*info;
 	struct aio_ring	*ring;
-	struct io_event	*event;
+	struct io_event	*event = NULL;
+	struct file *file = NULL;
 	unsigned long	flags;
 	unsigned long	tail;
 	int		ret;
@@ -937,12 +938,15 @@
 	}
 
 	/*
-	 * Check if the user asked us to deliver the result through an
-	 * eventfd. The eventfd_signal() function is safe to be called
-	 * from IRQ context.
+	 * Get a reference now, but do not deliver the event until
+	 * we're sure we actually dropped it inside the ring. We
+	 * need to get a reference before calling __aio_put_req(),
+	 * since the ->ki_eventfd may become invalid after such call.
 	 */
-	if (!IS_ERR(iocb->ki_eventfd))
-		eventfd_signal(iocb->ki_eventfd, 1);
+	if (!IS_ERR(iocb->ki_eventfd)) {
+		file = iocb->ki_eventfd;
+		get_file(file);
+	}
 
 	info = &ctx->ring_info;
 
@@ -1000,6 +1004,19 @@
 		wake_up(&ctx->wait);
 
 	spin_unlock_irqrestore(&ctx->ctx_lock, flags);
+
+	/*
+	 * If the user requested us to deliver a completion event to an
+	 * eventfd file descriptor *and* we actually delivered the event,
+	 * signal it with eventfd_signal(). The eventfd_signal() function
+	 * is safe to be called from IRQ context.
+	 */
+	if (file) {
+		if (event)
+			eventfd_signal(file, 1);
+		fput(file);
+	}
+
 	return ret;
 }
 





---
 fs/aio.c |   17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

Index: linux-2.6.mod/fs/aio.c
===================================================================
--- linux-2.6.mod.orig/fs/aio.c	2008-03-07 20:14:55.000000000 -0800
+++ linux-2.6.mod/fs/aio.c	2008-03-07 20:15:24.000000000 -0800
@@ -936,14 +936,6 @@
 		return 1;
 	}
 
-	/*
-	 * Check if the user asked us to deliver the result through an
-	 * eventfd. The eventfd_signal() function is safe to be called
-	 * from IRQ context.
-	 */
-	if (!IS_ERR(iocb->ki_eventfd))
-		eventfd_signal(iocb->ki_eventfd, 1);
-
 	info = &ctx->ring_info;
 
 	/* add a completion event to the ring buffer.
@@ -992,6 +984,15 @@
 	kunmap_atomic(ring, KM_IRQ1);
 
 	pr_debug("added to ring %p at [%lu]\n", iocb, tail);
+
+	/*
+	 * Check if the user asked us to deliver the result through an
+	 * eventfd. The eventfd_signal() function is safe to be called
+	 * from IRQ context.
+	 */
+	if (!IS_ERR(iocb->ki_eventfd))
+		eventfd_signal(iocb->ki_eventfd, 1);
+
 put_rq:
 	/* everything turned out well, dispose of the aiocb. */
 	ret = __aio_put_req(ctx, iocb);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/