linux-kernel - Re: [PATCH 3/3] eventfd: add internal reference counting to fix notifier race conditions

Message-ID: <alpine.DEB.1.10.0906191421581.14884@makko.or.mcafeemobile.com>
Date:	Fri, 19 Jun 2009 14:26:47 -0700 (PDT)
From:	Davide Libenzi <davidel@...ilserver.org>
To:	Gregory Haskins <ghaskins@...ell.com>
cc:	mst@...hat.com, kvm@...r.kernel.org,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	avi@...hat.com, paulmck@...ux.vnet.ibm.com,
	Ingo Molnar <mingo@...e.hu>
Subject: Re: [PATCH 3/3] eventfd: add internal reference counting to fix
 notifier race conditions

On Fri, 19 Jun 2009, Gregory Haskins wrote:

> Davide Libenzi wrote:
> > On Fri, 19 Jun 2009, Gregory Haskins wrote:
> >
> >   
> >> eventfd currently emits a POLLHUP wakeup on f_ops->release() to generate a
> >> notifier->release() callback.  This lets notification clients know if
> >> the eventfd is about to go away and is very useful particularly for
> >> in-kernel clients.  However, as it stands today it is not possible to
> >> use the notification API in a race-free way.  This patch adds some
> >> additional logic to the notification subsystem to rectify this problem.
> >>
> >> Background:
> >> -----------------------
> >> Eventfd currently only has one reference count mechanism: fget/fput.  This
> >> in and of itself is normally fine.  However, if a client expects to be
> >> notified if the eventfd is closed, it cannot hold a fget() reference
> >> itself or the underlying f_ops->release() callback will never be invoked
> >> by VFS.  Therefore we have this somewhat unusual situation where we may
> >> hold a pointer to an eventfd object (by virtue of having a waiter registered
> >> in its wait-queue), but no reference.  This makes it nearly impossible to
> >> design a mutual decoupling algorithm: you cannot unhook one side from the
> >> other (or vice versa) without racing.
> >>     
> >
> > And why is that?
> >
> > struct xxx {
> > 	struct mutex mtx;
> > 	struct file *file;
> > 	...
> > };
> >
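> > /* Returns x->file with x->mtx held, or NULL with x->mtx released. */
> > struct file *xxx_get_file(struct xxx *x) {
> > 	struct file *file;
> >
> > 	mutex_lock(&x->mtx);
> > 	file = x->file;
> > 	if (!file)
> > 		mutex_unlock(&x->mtx);
> > 	return file;
> > }
> >
> > /* Drops the lock taken by a successful xxx_get_file(). */
> > void xxx_release_file(struct xxx *x) {
> > 	mutex_unlock(&x->mtx);
> > }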
> >
> > void handle_POLLHUP(struct xxx *x) {
> > 	struct file *file;
> >
> > 	file = xxx_get_file(x);
> > 	if (file) {
> > 		unhook_waitqueue(file, ...);
> > 		x->file = NULL;
> > 		xxx_release_file(x);
> > 	}
> > }
> >
> >
> > Every time you need to "use" file, you call xxx_get_file(), and if you get 
> > NULL, it means it's gone and you handle it accordingly to your IRQ fd 
> > policies. As soon as you are done with the file, you call xxx_release_file().
> > Replace "mtx" with the lock that fits your needs.
> >   
> 
> Consider what would happen if the f_ops->release() was preempted inside
> the wake_up_locked_polled() after it dereferenced the xxx from the list,
> but before it calls the callback(POLLHUP).  The xxx object, and/or the
> .text for the xxx object may be long gone by the time it comes back
> around.  Afaict, there is no way to guard against that scenario unless
> you do something like 2/3+3/3.  Or am I missing something?

Right. Don't you see an easier answer to that, instead of adding 300 lines 
of code to eventfd?
For example, turning wake_up_locked() into a normal wake_up().
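
To make the idea concrete, here is a minimal userspace sketch using pthreads.
It is an analogy only, not kernel code and not taken from the patch series,
and every name in it (wq_wake_up, wq_remove, hup_client, EV_HUP) is
hypothetical. It assumes the point of a normal wake_up() is that the waker
takes the wait-queue lock itself and holds it while the callbacks run, so a
client that unhooks under the same lock knows that no callback is in flight
once the removal returns.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define EV_HUP 0x10u	/* hypothetical stand-in for POLLHUP */

struct waiter {
	struct waiter *next;
	void (*func)(struct waiter *w, unsigned events);
};

struct wait_queue {
	pthread_mutex_t lock;	/* plays the role of the wait-queue lock */
	struct waiter *head;
};

void wq_init(struct wait_queue *q)
{
	pthread_mutex_init(&q->lock, NULL);
	q->head = NULL;
}

void wq_add(struct wait_queue *q, struct waiter *w)
{
	assert(w->func != NULL);
	pthread_mutex_lock(&q->lock);
	w->next = q->head;
	q->head = w;
	pthread_mutex_unlock(&q->lock);
}

/* Once this returns, no callback for w is running or can start, so the
 * caller may free whatever object embeds w. */
void wq_remove(struct wait_queue *q, struct waiter *w)
{
	struct waiter **pp;

	pthread_mutex_lock(&q->lock);
	for (pp = &q->head; *pp; pp = &(*pp)->next) {
		if (*pp == w) {
			*pp = w->next;
			w->next = NULL;
			break;
		}
	}
	pthread_mutex_unlock(&q->lock);
}

/* The waker takes the lock itself and keeps it across the callbacks.
 * Callbacks run under the lock, so they must not call wq_remove(). */
void wq_wake_up(struct wait_queue *q, unsigned events)
{
	struct waiter *w, *next;

	pthread_mutex_lock(&q->lock);
	for (w = q->head; w; w = next) {
		next = w->next;
		w->func(w, events);
	}
	pthread_mutex_unlock(&q->lock);
}

/* Example client: counts the hang-up notifications it receives. */
struct hup_client {
	struct waiter w;
	int hups;
};

void hup_client_cb(struct waiter *w, unsigned events)
{
	struct hup_client *c =
		(struct hup_client *)((char *)w - offsetof(struct hup_client, w));

	if (events & EV_HUP)
		c->hups++;
}
```

A client would set w.func to hup_client_cb, call wq_add(), and call
wq_remove() before tearing itself down. Because the wake-up and the removal
take the same lock, they cannot interleave, which is the property the
xxx_get_file() scheme alone does not provide.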



- Davide