lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.1.10.0906221418510.10952@makko.or.mcafeemobile.com>
Date:	Mon, 22 Jun 2009 15:53:55 -0700 (PDT)
From:	Davide Libenzi <davidel@...ilserver.org>
To:	Gregory Haskins <gregory.haskins@...il.com>
cc:	Gregory Haskins <ghaskins@...ell.com>,
	"Michael S. Tsirkin" <mst@...hat.com>, kvm@...r.kernel.org,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	avi@...hat.com, paulmck@...ux.vnet.ibm.com,
	Ingo Molnar <mingo@...e.hu>,
	Rusty Russell <rusty@...tcorp.com.au>
Subject: Re: [PATCH 3/3] eventfd: add internal reference counting to fix
 notifier race conditions

On Mon, 22 Jun 2009, Gregory Haskins wrote:

> So up in userspace, the vbus pci-device would have an open reference to
> the kvm guest (derived from /dev/kvm) and an open reference to a vbus
> (derived from /dev/vbus).  Lets call these kvmfd, and vbusfd,
> respectively.  For something like an interrupt, we would hook the point
> where the PCI-MSI interrupt is assigned, and would do the following:
> 
> gsi = kvm_irq_route_gsi();
> fd = eventfd(0, 0);
> ioctl(kvmfd, KVM_IRQFD_ASSIGN, {fd, gsi});
> ioctl(vbusfd, VBUS_SHMSIGNAL_ASSIGN, {sigid, fd});
> 
> So userspace orchestrated the assignment of this one eventfd to a KVM
> consumer, and a VBUS producer.  The two subsystems do not care about the
> details of the other side of the link, per se.  VBUS just knows that it
> can eventfd_signal() its memory region to tell whomever is listening
> that it changed.  Likewise, KVM just knows to inject "gsi" when it gets
> signalled.  You could equally have given "fd" to a userspace thread for
> either producer or consumer roles, or any other combination.
> 
> If we were doing PCI-passthough, substitute the last SHMSIGNAL_ASSIGN
> ioctl call with some PCI_PASSTHROUGH_ASSIGN verb and you get the idea.
> 
> The important thing is that once this is established, userspace doesn't
> necessarily care about the fd anymore.  So now the question is: do we
> keep it around for other things?  Do we keep it around because we don't
> want KVM to see the POLLHUP, or do we address the "release" code so that
> it works even if userspace issued close(fd) at this point.  I am not
> sure what the answer is, but this is the scenario we are concerned with
> in this thread.  In the example above, vbus is free to produce events on
> its eventfd until it gets a SHMSIGNAL_DEASSIGN request.

I see.
The thing remains, that in order to reliably handle generic 
producer/consumer scenarios you'd need a reference counting similar to 
pipes, where the notion of producer and consumer is very well defined.
In your case, it'd be sufficent to expose an:

	int eventfd_wakeup(struct eventfd_ctx *ctx, void *key);

So that each end can signal, in an file* f_count independent way, when 
they're about to leave. Say with a new status bit.
The problem, that I felt coming since when I introduced the key-based 
wakeups, is that right now the "key" start to become a little restrictive 
in terms of amount of information carried by it.
If there are other key-aware waiters on "wqh", your new bit cannot 
conflict with existing ones, and you (and future users of the interface) 
will be polluting the global bit namespace.
Probably turning "key" into a pointer to a structure like:

	struct wakeup_key {
		unsigned long type;
		void *key;
	};

So the current:

	wake_up_poll(wq, POLLIN);

Would become:

	key.type = KT_POLLMASK;
	key.key = POLLIN;
	wake_up_key(wq, &key);

At that point the waiters (like poll/epoll) can simply discard the wakeup 
types they are not interested into (poll/epoll would care only about 
KT_POLLMASK), and something more than a simple bitmask can be carried by 
the wakups. Like:

	key.type = KT_SOMEDATA;
	key.key = &data;
	wake_up_key(wq, &key);

Of course, we could build another layer on top, to fit this model, but 
then we'd have the dual citizenship among normal wakeups and new-interface 
wakeups/signaling.
Another thing. Now that the interface exposes the eventfd_ctx context 
directly, the pollcb register/unregister does not have any reason to be 
inside eventfd (they're totally generic bits at that point), so my thought 
was about to drop them.



- Davide


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ