linux-kernel - Re: [KVM PATCH v4 2/2] kvm: add support for irqfd via eventfd-notification interface

Date:	Thu, 07 May 2009 10:01:41 -0400
From:	Gregory Haskins <ghaskins@...ell.com>
To:	Marcelo Tosatti <mtosatti@...hat.com>
CC:	Avi Kivity <avi@...hat.com>,
	Davide Libenzi <davidel@...ilserver.org>,
	Gregory Haskins <ghaskins@...ell.com>, viro@...IV.linux.org.uk,
	kvm@...r.kernel.org,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [KVM PATCH v4 2/2] kvm: add support for irqfd via eventfd-notification
 interface

Marcelo Tosatti wrote:
> On Thu, May 07, 2009 at 12:48:21PM +0300, Avi Kivity wrote:
>   
>> Davide Libenzi wrote:
>>     
>>> On Wed, 6 May 2009, Gregory Haskins wrote:
>>>
>>>   
>>>       
>>>> I think we are ok in this regard (at least in v5) without the
>>>> callback.  kvm holds irqfd, which holds eventfd.  In a normal
>>>> situation, we will have eventfd with 2 references.  If userspace
>>>> closes the eventfd, it will drop 1 of the 2 eventfd file references,
>>>> but the object should remain intact as long as kvm still holds it as
>>>> well.  When the kvm-fd is released, we will then decouple from the
>>>> eventfd->wqh and drop the last fput(), officially freeing it.
>>>>
>>>> Likewise, if kvm is closed before the eventfd, we will simply decouple
>>>> from the wqh and fput(eventfd), leaving the last reference held by
>>>> userspace until it closes as well.
>>>>
>>>> Let me know if you see any holes in that.
>>>>     
>>>>         
>>> Looks OK, modulo my knowledge of KVM internals.
>>>   
>>>       
>> What's your take on adding irq context safe callbacks to irqfd?
>>
>> To give some background here, we would like to use eventfd as a generic  
>> connector between components, so the components do not know about each  
>> other.  So far eventfd successfully abstracts among components in the  
>> same process, in different processes, and in the kernel.
>>
>> eventfd_signal() can be safely called from irq context, and will wake up  
>> a waiting task.  But in some cases, if the consumer is in the kernel, it  
>> may be able to consume the event from irq context, saving a context 
>> switch.
>>
>> So, will you consider patches adding this capability to eventfd?
>>     
>
> (pasting from a separate thread)
>
>   
>> That's my thinking.  PCI interrupts don't work because we need to do
>> some hacky stuff in there, but MSI should.  Oh, and we could improve
>> UIO support for interrupts when using MSI, since there's no need to
>> acknowledge the interrupt.
>>     
>
> Ok, so for INTx assigned devices all you need to do in the ACK handler
> is to re-enable the host interrupt (and set the guest interrupt line to
> low).
>
> Right now the ack comes through a kvm internal irq ack callback.
>
> AFAICS there is no mechanism in irqfd for ACK notification, and
> interrupt injection is edge triggered.
>
> So for PCI INTx assigned devices (or any INTx level), you'd want to keep
> the guest interrupt high, with some way to notify the ACK.
>
> Avi mentioned a separate irqfd to notify the ACK. For assigned devices,
> you could register a fd wakeup function in that fd, which replaces the
> current irq ACK callback?
>   
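The lifetime argument quoted above (kvm holds the irqfd, which holds the eventfd file, and either side may close first) can be checked with a small userspace model. This is a hypothetical sketch, not kernel code: `model_get()`/`model_put()` are plain counters standing in for the file reference operations (fget()/fput()), and the holder names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: which party holds a reference to the eventfd file. */
enum holder { HOLDER_USERSPACE, HOLDER_KVM, HOLDER_COUNT };

struct model_file {
	bool held[HOLDER_COUNT]; /* which party still holds a reference */
	int refs;                /* outstanding references */
	bool freed;              /* set once the last reference is dropped */
};

/* Take a reference on behalf of one party (stand-in for fget()). */
static void model_get(struct model_file *f, enum holder h)
{
	assert(!f->freed && !f->held[h]);
	f->held[h] = true;
	f->refs++;
}

/* Drop one party's reference (stand-in for fput()); free on the last put. */
static void model_put(struct model_file *f, enum holder h)
{
	assert(!f->freed && f->held[h]);
	f->held[h] = false;
	if (--f->refs == 0)
		f->freed = true;
}

/*
 * Userspace creates the eventfd, and kvm takes a second reference when the
 * irqfd is assigned.  Returns true if the object survives the first release
 * and is freed exactly by the second.  'first' and 'second' must differ.
 */
static bool model_release_order(enum holder first, enum holder second)
{
	struct model_file f = { .refs = 0, .freed = false };

	model_get(&f, HOLDER_USERSPACE); /* eventfd() hands userspace an fd */
	model_get(&f, HOLDER_KVM);       /* irqfd assignment pins the file */

	model_put(&f, first);            /* close(eventfd) or kvm-fd release */
	if (f.freed)
		return false;            /* freed too early: a use-after-free */
	model_put(&f, second);
	return f.freed;
}
```

Calling `model_release_order(HOLDER_USERSPACE, HOLDER_KVM)` or `model_release_order(HOLDER_KVM, HOLDER_USERSPACE)` returns true in both cases, which is the property claimed in the thread: the eventfd outlives whichever side lets go first and is freed by the last fput().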

One thing I was thinking here was that I could create a flag for the
kvm_irqfd() function, something like "KVM_IRQFD_MODE_CLEAR".  When
specified at creation time, this flag would cause the irqfd to execute a
clear operation instead of a set when triggered.  That way, the default
mode remains an edge-triggered set, and the non-default mode triggers a
clear.  Level-triggered interrupts could therefore create two irqfds: one
for raising the line, the other for clearing it.

An alternative is to abandon the use of eventfd and make the irqfd a
first-class anon-fd.  The data passed to the write()/signal() function
could then indicate the desired level.  The disadvantage is that it would
not be compatible with eventfd, so we would need to decide whether the
tradeoff is worth it.
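The compatibility tradeoff comes from eventfd's counter semantics: each write() adds its 8-byte value to a 64-bit counter, so a "raise" (1) followed by a "lower" (0) reads back as a single count of 1 and the final level is lost. The Linux-only sketch below contrasts that with a hypothetical level-carrying fd. The level fd is modeled with a pipe purely for illustration; the function names are made up, and the proposal itself would be an anon-fd inside kvm.

```c
#include <assert.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* eventfd: writes accumulate, so "raise then lower" reads back as 1.
 * Returns the counter value read, or -1 on error. */
static long long eventfd_raise_then_lower(void)
{
	eventfd_t value = 0;
	int fd = eventfd(0, EFD_NONBLOCK);

	if (fd < 0)
		return -1;
	if (eventfd_write(fd, 1) != 0 ||      /* "raise": counter += 1 */
	    eventfd_write(fd, 0) != 0 ||      /* "lower": counter += 0 */
	    eventfd_read(fd, &value) != 0) {  /* reads and resets counter */
		close(fd);
		return -1;
	}
	close(fd);
	return (long long)value;
}

/* Hypothetical level fd (modeled with a pipe): each write carries the
 * desired level as one byte and the consumer applies them in order.
 * Returns the final level, or -1 on error. */
static int levelfd_raise_then_lower(void)
{
	int fds[2];
	const unsigned char raise_msg = 1, lower_msg = 0;
	unsigned char byte, level = 0;

	if (pipe(fds) != 0)
		return -1;
	if (write(fds[1], &raise_msg, 1) != 1 ||
	    write(fds[1], &lower_msg, 1) != 1) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	close(fds[1]);                      /* read() hits EOF after both bytes */
	while (read(fds[0], &byte, 1) == 1)
		level = byte;               /* last write wins */
	close(fds[0]);
	return level;
}
```

The eventfd path returns 1 (one pending event, with no indication that the line should now be low), while the level fd path returns 0. That gap is what a first-class fd would close, at the cost of no longer being an eventfd.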

OTOH, I suspect level-triggered interrupts will primarily be confined to
the legacy domain, so perhaps we do not need to worry about them too much.
Therefore, another option is that we *could* simply put a stake in the
ground and say that legacy/level-triggered interrupts cannot use irqfd.

Thoughts?

-Greg
