linux-kernel - [PATCH 1/6] [-mm] epoll: fix for epoll_wait sometimes returning events on closed fds

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <49A42E26.1040509@cybernetics.com>
Date:	Tue, 24 Feb 2009 12:28:06 -0500
From:	Tony Battersby <tonyb@...ernetics.com>
To:	Andrew Morton <akpm@...ux-foundation.org>,
	Davide Libenzi <davidel@...ilserver.org>
Cc:	Jonathan Corbet <corbet@....net>, linux-kernel@...r.kernel.org
Subject: [PATCH 1/6] [-mm] epoll: fix for epoll_wait sometimes returning events
 on closed fds

>From the epoll manpage:
Q: Will closing a file descriptor cause it to be removed from all
   epoll sets automatically?
A: Yes

sys_close() calls filp_close(), which calls fput().  If no one else
holds a reference to the file, then fput() calls __fput(), and __fput()
calls eventpoll_release(), which prevents epoll_wait() from returning
events on the fd to userspace.  In the rare case that sys_close()
doesn't call __fput() because someone else has a reference to the file,
a subsequent epoll_wait() may still unexpectedly return events on the
fd after it has been closed.  This can end up confusing or crashing
a userspace program that doesn't do epoll_ctl(EPOLL_CTL_DEL) before
closing the fd.  I have reports of this actually happening under
heavy load with a program using epoll with network sockets on 2.6.27.

This patch fixes the problem by calling eventpoll_release_file()
from filp_close() instead of from __fput().

The locking in eventpoll_release() and eventpoll_release_file() needs
to be changed because previously it relied on the fact that no one
else could have a reference to the file when called from __fput(),
and this is no longer true.  The new locking is admittedly ugly,
but I believe it works.

ep_insert() now also needs to check if the file has been closed
to avoid races in multi-threaded programs where one thread is doing
epoll_ctl(EPOLL_CTL_ADD) and another thread is closing the fd.  This is
done by checking that fget_light still returns the same struct file *
as before.

Note that the list_del_init(&epi->fllink) previously done in
eventpoll_release_file() was unnecessary because it is also done
by ep_remove().

Userspace programs that might run on kernels with this bug can work
around the problem by doing epoll_ctl(EPOLL_CTL_DEL) before close().

Signed-off-by: Tony Battersby <tonyb@...ernetics.com>
---

This patch is against the current -mm tree.  However, a different
version of this patch may go into 2.6.29, which will create conflicts
with this version of the patch and the other patches currently
in -mm.  You may refer to this version of the patch to help resolve
the conflicts (e.g. file->f_ep_lock changing to file->f_lock).

diff -urpN a/fs/eventpoll.c b/fs/eventpoll.c
--- a/fs/eventpoll.c	2009-02-24 09:38:32.000000000 -0500
+++ b/fs/eventpoll.c	2009-02-24 09:40:26.000000000 -0500
@@ -697,29 +697,40 @@ void eventpoll_release_file(struct file 
 	struct epitem *epi;
 
 	/*
-	 * We don't want to get "file->f_lock" because it is not
-	 * necessary. It is not necessary because we're in the "struct file"
-	 * cleanup path, and this means that noone is using this file anymore.
-	 * So, for example, epoll_ctl() cannot hit here since if we reach this
-	 * point, the file counter already went to zero and fget() would fail.
-	 * The only hit might come from ep_free() but by holding the mutex
-	 * will correctly serialize the operation. We do need to acquire
-	 * "ep->mtx" after "epmutex" because ep_remove() requires it when called
-	 * from anywhere but ep_free().
-	 *
-	 * Besides, ep_remove() acquires the lock, so we can't hold it here.
+	 * epmutex makes it safe to use *ep by preventing ep_free() from
+	 * running.
 	 */
 	mutex_lock(&epmutex);
 
+	spin_lock(&file->f_lock);
 	while (!list_empty(lsthead)) {
-		epi = list_first_entry(lsthead, struct epitem, fllink);
+		ep = list_first_entry(lsthead, struct epitem, fllink)->ep;
+		spin_unlock(&file->f_lock);
 
-		ep = epi->ep;
-		list_del_init(&epi->fllink);
+		/*
+		 * ep->mtx protects against epoll_ctl() and is required for
+		 * calling ep_remove().
+		 */
 		mutex_lock(&ep->mtx);
-		ep_remove(ep, epi);
+
+		spin_lock(&file->f_lock);
+		epi = list_first_entry(lsthead, struct epitem, fllink);
+		spin_unlock(&file->f_lock);
+
+		/*
+		 * epoll_ctl(EPOLL_CTL_DEL) may have modified the list before
+		 * we locked ep->mtx, so it is necessary to check that the
+		 * first entry still exists and belongs to the ep whose mtx we
+		 * are holding.
+		 */
+		if (epi && epi->ep == ep)
+			ep_remove(ep, epi);
+
 		mutex_unlock(&ep->mtx);
+
+		spin_lock(&file->f_lock);
 	}
+	spin_unlock(&file->f_lock);
 
 	mutex_unlock(&epmutex);
 }
@@ -895,6 +906,22 @@ static void ep_rbtree_insert(struct even
 }
 
 /*
+ * Returns true if the fd has been closed or false if not.
+ */
+static bool ep_is_file_closed(struct file *file, int fd)
+{
+	struct file *file2;
+	int needs_fput;
+	bool closed;
+
+	file2 = fget_light(fd, &needs_fput);
+	closed = (file2 != file);
+	if (file2)
+		fput_light(file2, needs_fput);
+	return closed;
+}
+
+/*
  * Must be called with "mtx" held.
  */
 static int ep_insert(struct eventpoll *ep, struct epoll_event *event,
@@ -945,6 +972,16 @@ static int ep_insert(struct eventpoll *e
 
 	/* Add the current item to the list of active epoll hook for this file */
 	spin_lock(&tfile->f_lock);
+	/*
+	 * Don't add the fd to the epoll set if it was closed in between the
+	 * initial fget() and the time we get here.  This check must be done
+	 * while holding f_lock to prevent races with eventpoll_release_file().
+	 */
+	if (ep_is_file_closed(tfile, fd)) {
+		spin_unlock(&tfile->f_lock);
+		error = -EBADF;
+		goto error_unregister;
+	}
 	list_add_tail(&epi->fllink, &tfile->f_ep_links);
 	spin_unlock(&tfile->f_lock);
 
diff -urpN a/fs/file_table.c b/fs/file_table.c
--- a/fs/file_table.c	2009-02-24 09:38:23.000000000 -0500
+++ b/fs/file_table.c	2009-02-24 09:39:45.000000000 -0500
@@ -267,11 +267,6 @@ void __fput(struct file *file)
 	might_sleep();
 
 	fsnotify_close(file);
-	/*
-	 * The function eventpoll_release() should be the first called
-	 * in the file cleanup chain.
-	 */
-	eventpoll_release(file);
 	locks_remove_flock(file);
 
 	if (unlikely(file->f_flags & FASYNC)) {
diff -urpN a/fs/open.c b/fs/open.c
--- a/fs/open.c	2009-02-24 09:38:24.000000000 -0500
+++ b/fs/open.c	2009-02-24 09:39:45.000000000 -0500
@@ -29,6 +29,7 @@
 #include <linux/rcupdate.h>
 #include <linux/audit.h>
 #include <linux/falloc.h>
+#include <linux/eventpoll.h>
 
 int vfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 {
@@ -1104,6 +1105,7 @@ int filp_close(struct file *filp, fl_own
 
 	dnotify_flush(filp, id);
 	locks_remove_posix(filp, id);
+	eventpoll_release(filp);
 	fput(filp);
 	return retval;
 }
diff -urpN a/include/linux/eventpoll.h b/include/linux/eventpoll.h
--- a/include/linux/eventpoll.h	2009-02-24 09:38:24.000000000 -0500
+++ b/include/linux/eventpoll.h	2009-02-24 09:39:45.000000000 -0500
@@ -68,23 +68,25 @@ static inline void eventpoll_init_file(s
 void eventpoll_release_file(struct file *file);
 
 /*
- * This is called from inside fs/file_table.c:__fput() to unlink files
+ * This is called from inside fs/open.c:filp_close() to unlink files
  * from the eventpoll interface. We need to have this facility to cleanup
  * correctly files that are closed without being removed from the eventpoll
  * interface.
  */
 static inline void eventpoll_release(struct file *file)
 {
+	bool empty;
 
 	/*
-	 * Fast check to avoid the get/release of the semaphore. Since
-	 * we're doing this outside the semaphore lock, it might return
+	 * Fast check to avoid the get/release of the mutex. Since
+	 * we're doing this outside the mutex lock, it might return
 	 * false negatives, but we don't care. It'll help in 99.99% of cases
-	 * to avoid the semaphore lock. False positives simply cannot happen
-	 * because the file in on the way to be removed and nobody ( but
-	 * eventpoll ) has still a reference to this file.
+	 * to avoid the mutex lock.
 	 */
-	if (likely(list_empty(&file->f_ep_links)))
+	spin_lock(&file->f_lock);
+	empty = list_empty(&file->f_ep_links);
+	spin_unlock(&file->f_lock);
+	if (likely(empty))
 		return;
 
 	/*

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/