Date:	Fri, 19 Jun 2009 17:16:59 -0400
From:	Gregory Haskins <ghaskins@...ell.com>
To:	Davide Libenzi <davidel@...ilserver.org>
CC:	mst@...hat.com, kvm@...r.kernel.org,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	avi@...hat.com, paulmck@...ux.vnet.ibm.com,
	Ingo Molnar <mingo@...e.hu>
Subject: Re: [PATCH 3/3] eventfd: add internal reference counting to fix notifier
 race conditions

Davide Libenzi wrote:
> On Fri, 19 Jun 2009, Gregory Haskins wrote:
>
>   
>> eventfd currently emits a POLLHUP wakeup on f_ops->release() to generate a
>> notifier->release() callback.  This lets notification clients know that
>> the eventfd is about to go away, which is particularly useful for
>> in-kernel clients.  However, as it stands today it is not possible to
>> use the notification API in a race-free way.  This patch adds some
>> additional logic to the notification subsystem to rectify this problem.
>>
>> Background:
>> -----------------------
>> Eventfd currently only has one reference count mechanism: fget/fput.  This
>> in and of itself is normally fine.  However, if a client expects to be
>> notified if the eventfd is closed, it cannot hold a fget() reference
>> itself or the underlying f_ops->release() callback will never be invoked
>> by VFS.  Therefore we have this somewhat unusual situation where we may
>> hold a pointer to an eventfd object (by virtue of having a waiter registered
>> in its wait-queue), but no reference.  This makes it nearly impossible to
>> design a mutual decoupling algorithm: you cannot unhook one side from the
>> other (or vice versa) without racing.
>>     
>
> And why is that?
>
> struct xxx {
> 	struct mutex mtx;
> 	struct file *file;
> 	...
> };
>
> /* Returns the file with x->mtx held, or NULL (with x->mtx already
>  * dropped) if the file is gone. */
> struct file *xxx_get_file(struct xxx *x) {
> 	struct file *file;
>
> 	mutex_lock(&x->mtx);
> 	file = x->file;
> 	if (!file)
> 		mutex_unlock(&x->mtx);
> 	return file;
> }
>
> /* Drops the mutex taken by a successful xxx_get_file(). */
> void xxx_release_file(struct xxx *x) {
> 	mutex_unlock(&x->mtx);
> }
>
> void handle_POLLHUP(struct xxx *x) {
> 	struct file *file;
>
> 	file = xxx_get_file(x);
> 	if (file) {
> 		unhook_waitqueue(file, ...);
> 		x->file = NULL;		/* later xxx_get_file() callers see NULL */
> 		xxx_release_file(x);
> 	}
> }
>
>
> Every time you need to "use" file, you call xxx_get_file(), and if you get
> NULL, it means it's gone and you handle it according to your IRQ fd
> policies.  As soon as you're done with the file, you call xxx_release_file().
> Replace "mtx" with the lock that fits your needs.
>   
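For concreteness, a caller of the get/release pattern quoted above might look
roughly like this (a sketch, not part of the original mail; xxx_signal() is a
hypothetical helper, and the eventfd_signal(struct file *, int) signature is
assumed from the 2.6.30-era API):

/* Sketch: signal the eventfd behind x, if it still exists. */
static void xxx_signal(struct xxx *x)
{
	struct file *file = xxx_get_file(x);

	if (!file)
		return;			/* file gone; apply the policy above */

	eventfd_signal(file, 1);	/* x->mtx held: handle_POLLHUP() cannot
					 * unhook concurrently */
	xxx_release_file(x);		/* drop x->mtx */
}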

Consider what would happen if f_ops->release() were preempted inside
wake_up_locked_poll() after it has dereferenced the xxx from the wait-queue
list, but before it invokes the callback with POLLHUP.  The xxx object,
and/or the .text for the xxx object, may be long gone by the time release()
comes back around.  AFAICT, there is no way to guard against that scenario
unless you do something like patches 2/3 + 3/3.  Or am I missing something?
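
To make that window concrete: the release-side wakeup walks the wait queue
and invokes each waiter's callback directly, roughly like this (a sketch
modeled on the 2.6.30-era __wake_up_common(); the name wakeup_walk_sketch
and the trimmed arguments are assumptions):

static void wakeup_walk_sketch(wait_queue_head_t *q, void *key)
{
	wait_queue_t *curr, *next;

	/* q->lock is already held by the caller (the "locked" variant) */
	list_for_each_entry_safe(curr, next, &q->task_list, task_list) {
		/*
		 * 'curr' is embedded in the waiter's struct xxx.  If the
		 * walk is delayed right here, nothing done under x->mtx can
		 * help: this path never acquires that mutex, so the xxx
		 * object (and the module .text behind curr->func) may
		 * already be freed when the next line runs.
		 */
		curr->func(curr, TASK_NORMAL, 0, key);
	}
}

This is the window that the internal reference counting in patches 2/3 and
3/3 is meant to close.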

-Greg



