lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.1.10.0906201202170.25534@makko.or.mcafeemobile.com>
Date:	Sat, 20 Jun 2009 14:17:44 -0700 (PDT)
From:	Davide Libenzi <davidel@...ilserver.org>
To:	Gregory Haskins <ghaskins@...ell.com>
cc:	mst@...hat.com, kvm@...r.kernel.org,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	avi@...hat.com, paulmck@...ux.vnet.ibm.com,
	Ingo Molnar <mingo@...e.hu>
Subject: Re: [PATCH 3/3] eventfd: add internal reference counting to fix
 notifier race conditions

On Fri, 19 Jun 2009, Gregory Haskins wrote:

> > In the POLLIN event, you schedule_work(&irqfd->inject) and there are no 
> > races there AFAICS (you basically do not care of anything eventfd memory 
> > related at all).
> > For POLLHUP, you do:
> >
> > 	spin_lock(irqfd->slock);
> > 	if (irqfd->wqh)
> > 		schedule_work(&irqfd->inject);
> > 	irqfd->wqh = NULL;
> > 	spin_unlock(irqfd->slock);
> >
> > In your work function you notice the POLLHUP condition and take proper 
> > action (dunno what it is in your case).
> > In your kvm_irqfd_release() function:
> >
> > 	spin_lock(irqfd->slock);
> > 	if (irqfd->wqh)
> > 		remove_wait_queue(irqfd->wqh, &irqfd->wait);
> > 	irqfd->wqh = NULL;
> > 	spin_unlock(irqfd->slock);
> >
> > Any races in there?
> >   
> 
> Yes, for one you have an ABBA deadlock on the irqfd->slock and wqh->lock
> (recall wqh has to be locked to fix that other race I mentioned).

Yep, true. How about we go in little steps?
How about the one below?
When you register your poll callback, you get a reference with 
eventfd_refget().
At that point we have no more worries about the eventfd memory (namely 
"wqh") going away.
Symmetrically, you use eventfd_refput() once you done with it.
So we basically de-couple the internal VFS refcount, from the eventfd 
context memory refcount.
Where are the non-solveable races on the IRQfd side, of an approach like 
this one?



> Yes, understood.  What I was trying to gently say is that the one-liner
> proposal alone is still broken afaict.  However, if there is another
> solution that works that you like better than 133-liner I posted, I am
> more than willing to help analyze it.  In the end, I only care that this
> is fixed.

Is it so?
What I noticed here, is that you went from "Detailed Mode" (in the emails 
where you were pushing the new bits), to "Hint Mode" (as below) when it 
was time to see if there were simpler solutions to the problem.

> (As a hint, I think I fixed 4-5 races with these patches, so there are a
> few others still lurking as well)

Not knowing the details of IRQfd, hinting only is not going to help 
anything in this case.



- Davide



---
 fs/eventfd.c            |   37 ++++++++++++++++++++++++++++++++++++-
 include/linux/eventfd.h |    6 ++++++
 2 files changed, 42 insertions(+), 1 deletion(-)

Index: linux-2.6.mod/fs/eventfd.c
===================================================================
--- linux-2.6.mod.orig/fs/eventfd.c	2009-06-20 13:08:22.000000000 -0700
+++ linux-2.6.mod/fs/eventfd.c	2009-06-20 14:00:23.000000000 -0700
@@ -17,8 +17,10 @@
 #include <linux/eventfd.h>
 #include <linux/syscalls.h>
 #include <linux/module.h>
+#include <linux/kref.h>
 
 struct eventfd_ctx {
+	struct kref kref;
 	wait_queue_head_t wqh;
 	/*
 	 * Every time that a write(2) is performed on an eventfd, the
@@ -59,9 +61,19 @@ int eventfd_signal(struct file *file, in
 }
 EXPORT_SYMBOL_GPL(eventfd_signal);
 
+static void eventfd_free(struct kref *kref)
+{
+	struct eventfd_ctx *ctx = container_of(kref, struct eventfd_ctx, kref);
+
+	kfree(ctx);
+}
+
 static int eventfd_release(struct inode *inode, struct file *file)
 {
-	kfree(file->private_data);
+	struct eventfd_ctx *ctx = file->private_data;
+
+	wake_up_poll(&ctx->wqh, POLLHUP);
+	kref_put(&ctx->kref, eventfd_free);
 	return 0;
 }
 
@@ -201,6 +213,28 @@ struct file *eventfd_fget(int fd)
 }
 EXPORT_SYMBOL_GPL(eventfd_fget);
 
+int eventfd_refget(struct file *file)
+{
+	struct eventfd_ctx *ctx = file->private_data;
+
+	if (file->f_op != &eventfd_fops)
+		return -EINVAL;
+	kref_get(&ctx->kref);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(eventfd_refget);
+
+int eventfd_refput(struct file *file)
+{
+	struct eventfd_ctx *ctx = file->private_data;
+
+	if (file->f_op != &eventfd_fops)
+		return -EINVAL;
+	kref_put(&ctx->kref, eventfd_free);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(eventfd_refput);
+
 SYSCALL_DEFINE2(eventfd2, unsigned int, count, int, flags)
 {
 	int fd;
@@ -217,6 +251,7 @@ SYSCALL_DEFINE2(eventfd2, unsigned int, 
 	if (!ctx)
 		return -ENOMEM;
 
+	kref_init(&ctx->kref);
 	init_waitqueue_head(&ctx->wqh);
 	ctx->count = count;
 	ctx->flags = flags;
Index: linux-2.6.mod/include/linux/eventfd.h
===================================================================
--- linux-2.6.mod.orig/include/linux/eventfd.h	2009-06-20 13:08:22.000000000 -0700
+++ linux-2.6.mod/include/linux/eventfd.h	2009-06-20 13:59:09.000000000 -0700
@@ -29,12 +29,18 @@
 
 struct file *eventfd_fget(int fd);
 int eventfd_signal(struct file *file, int n);
+int eventfd_refget(struct file *file);
+int eventfd_refput(struct file *file);
 
 #else /* CONFIG_EVENTFD */
 
 #define eventfd_fget(fd) ERR_PTR(-ENOSYS)
 static inline int eventfd_signal(struct file *file, int n)
 { return 0; }
+static inline int eventfd_refget(struct file *file)
+{ return -ENOSYS; }
+static inline int eventfd_refput(struct file *file)
+{ return -ENOSYS; }
 
 #endif /* CONFIG_EVENTFD */
 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ