linux-kernel - Re: [kvm-devel] [PATCH 3/3] virtio PCI device

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <4747051C.3090903@us.ibm.com>
Date:	Fri, 23 Nov 2007 10:51:40 -0600
From:	Anthony Liguori <aliguori@...ibm.com>
To:	Avi Kivity <avi@...ranet.com>
CC:	kvm-devel@...ts.sourceforge.net, virtualization@...ts.osdl.org,
	linux-kernel@...r.kernel.org,
	Eric Van Hensbergen <ericvanhensbergen@...ibm.com>,
	Rusty Russell <rusty@...tcorp.com.au>,
	lguest <lguest@...abs.org>
Subject: Re: [kvm-devel] [PATCH 3/3] virtio PCI device

Avi Kivity wrote:
> Anthony Liguori wrote:
>> Well please propose the virtio API first and then I'll adjust the PCI 
>> ABI.  I don't want to build things into the ABI that we never 
>> actually end up using in virtio :-)
>>
>>   
>
> Move ->kick() to virtio_driver.

Then on each kick, all queues have to be checked for processing?  What 
devices do you expect this would help?

> I believe Xen networking uses the same event channel for both rx and 
> tx, so in effect they're using this model.  Long time since I looked 
> though,

I would have to look, but since rx/tx are rather independent actions, 
I'm not sure that you would really save that much.  You still end up 
doing the same number of kicks unless I'm missing something.

>> I was thinking more along the lines that a hypercall-based device 
>> would certainly be implemented in-kernel whereas the current device 
>> is naturally implemented in userspace.  We can simply use a different 
>> device for in-kernel drivers than for userspace drivers.  
>
> Where the device is implemented is an implementation detail that 
> should be hidden from the guest, isn't that one of the strengths of 
> virtualization?  Two examples: a file-based block device implemented 
> in qemu gives you fancy file formats with encryption and compression, 
> while the same device implemented in the kernel gives you a 
> low-overhead path directly to a zillion-disk SAN volume.  Or a 
> user-level network device capable of running with the slirp stack and 
> no permissions vs. the kernel device running copyless most of the time 
> and using a dma engine for the rest but requiring you to be good 
> friends with the admin.
>
> The user should expect zero reconfigurations moving a VM from one 
> model to the other.

I'm wary of introducing the notion of hypercalls to this device because 
it makes the device VMM specific.  Maybe we could have the device 
provide an option ROM that was treated as the device "BIOS" that we 
could use for kicking and interrupt acking?  Any idea of how that would 
map to Windows?  Are there real PCI devices that use the option ROM 
space to provide what's essentially firmware?  Unfortunately, I don't 
think an option ROM BIOS would map well to other architectures.

>> None of the PCI devices currently work like that in QEMU.  It would 
>> be very hard to make a device that worked this way because since the 
>> order in which values are written matter a whole lot.  For instance, 
>> if you wrote the status register before the queue information, the 
>> driver could get into a funky state.
>>   
>
> I assume you're talking about restore?  Isn't that atomic?

If you're doing restore by passing the PCI config blob to a registered 
routine, then sure, but that doesn't seem much better to me than just 
having the device generate that blob in the first place (which is what 
we have today).  I was assuming that you would want to use the existing 
PIO/MMIO handlers to do restore by rewriting the config as if the guest was.

>>> Not much of an argument, I know.
>>>
>>>
>>> wrt. number of queues, 8 queues will consume 32 bytes of pci space 
>>> if all you store is the ring pfn.
>>>     
>>
>> You also at least need a num argument which takes you to 48 or 64 
>> depending on whether you care about strange formatting.  8 queues may 
>> not be enough either.  Eric and I have discussed whether the 9p 
>> virtio device should support multiple mounts per-virtio device and if 
>> so, whether each one should have it's own queue.  Any devices that 
>> supports this sort of multiplexing will very quickly start using a 
>> lot of queues.
>>   
>
> Make it appear as a pci function?  (though my feeling is that multiple 
> mounts should be different devices; we can then hotplug mountpoints).

We may run out of PCI slots though :-/

>> I think most types of hardware have some notion of a selector or 
>> mode.  Take a look at the LSI adapter or even VGA.
>>
>>   
>
> True.  They aren't fun to use, though.

I don't think they're really any worse :-)

Regards,

Anthony Liguori

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/