linux-kernel - Re: [kvm-devel] [PATCH 3/3] virtio PCI device

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4747122F.1070905@qumranet.com>
Date:	Fri, 23 Nov 2007 19:47:27 +0200
From:	Avi Kivity <avi@...ranet.com>
To:	Anthony Liguori <aliguori@...ibm.com>
CC:	Eric Van Hensbergen <ericvanhensbergen@...ibm.com>,
	lguest <lguest@...abs.org>, kvm-devel@...ts.sourceforge.net,
	linux-kernel@...r.kernel.org, virtualization@...ts.osdl.org
Subject: Re: [kvm-devel] [PATCH 3/3] virtio PCI device

Anthony Liguori wrote:
> Avi Kivity wrote:
>   
>> Anthony Liguori wrote:
>>     
>>> Well please propose the virtio API first and then I'll adjust the PCI 
>>> ABI.  I don't want to build things into the ABI that we never 
>>> actually end up using in virtio :-)
>>>
>>>   
>>>       
>> Move ->kick() to virtio_driver.
>>     
>
> Then on each kick, all queues have to be checked for processing?  What 
> devices do you expect this would help?
>
>   

Networking.

>> I believe Xen networking uses the same event channel for both rx and 
>> tx, so in effect they're using this model.  Long time since I looked 
>> though,
>>     
>
> I would have to look, but since rx/tx are rather independent actions, 
> I'm not sure that you would really save that much.  You still end up 
> doing the same number of kicks unless I'm missing something.
>
>   

rx and tx are closely related. You rarely have one without the other.

In fact, a turned implementation should have zero kicks or interrupts 
for bulk transfers. The rx interrupt on the host will process new tx 
descriptors and fill the guest's rx queue; the guest's transmit function 
can also check the receive queue. I don't know if that's achievable for 
Linuz guests currently, but we should aim to make it possible.

Another point is that virtio still has a lot of leading zeros in its 
mileage counter. We need to keep things flexible and learn from others 
as much as possible, especially when talking about the ABI.

> I'm wary of introducing the notion of hypercalls to this device because 
> it makes the device VMM specific.  Maybe we could have the device 
> provide an option ROM that was treated as the device "BIOS" that we 
> could use for kicking and interrupt acking?  Any idea of how that would 
> map to Windows?  Are there real PCI devices that use the option ROM 
> space to provide what's essentially firmware?  Unfortunately, I don't 
> think an option ROM BIOS would map well to other architectures.
>
>   

The BIOS wouldn't work even on x86 because it isn't mapped to the guest 
address space (at least not consistently), and doesn't know the guest's 
programming model (16, 32, or 64-bits? segmented or flat?)

Xen uses a hypercall page to abstract these details out. However, I'm 
not proposing that. Simply indicate that we support hypercalls, and use 
some layer below to actually send them. It is the responsibility of this 
layer to detect if hypercalls are present and how to call them.

Hey, I think the best place for it is in paravirt_ops. We can even patch 
the hypercall instruction inline, and the driver doesn't need to know 
about it.

>>> None of the PCI devices currently work like that in QEMU.  It would 
>>> be very hard to make a device that worked this way because since the 
>>> order in which values are written matter a whole lot.  For instance, 
>>> if you wrote the status register before the queue information, the 
>>> driver could get into a funky state.
>>>   
>>>       
>> I assume you're talking about restore?  Isn't that atomic?
>>     
>
> If you're doing restore by passing the PCI config blob to a registered 
> routine, then sure, but that doesn't seem much better to me than just 
> having the device generate that blob in the first place (which is what 
> we have today).  I was assuming that you would want to use the existing 
> PIO/MMIO handlers to do restore by rewriting the config as if the guest was.
>
>   

Sure some complexity is unavoidable. But flat is simpler than indirect.

>>>> Not much of an argument, I know.
>>>>
>>>>
>>>> wrt. number of queues, 8 queues will consume 32 bytes of pci space 
>>>> if all you store is the ring pfn.
>>>>     
>>>>         
>>> You also at least need a num argument which takes you to 48 or 64 
>>> depending on whether you care about strange formatting.  8 queues may 
>>> not be enough either.  Eric and I have discussed whether the 9p 
>>> virtio device should support multiple mounts per-virtio device and if 
>>> so, whether each one should have it's own queue.  Any devices that 
>>> supports this sort of multiplexing will very quickly start using a 
>>> lot of queues.
>>>   
>>>       
>> Make it appear as a pci function?  (though my feeling is that multiple 
>> mounts should be different devices; we can then hotplug mountpoints).
>>     
>
> We may run out of PCI slots though :-/
>   

Then we can start selling virtio extension chassis.

-- 
Any sufficiently difficult bug is indistinguishable from a feature.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/