Message-ID: <453B99D7.1050004@qumranet.com>
Date: Sun, 22 Oct 2006 18:18:31 +0200
From: Avi Kivity <avi@...ranet.com>
To: Arnd Bergmann <arnd@...db.de>
CC: Muli Ben-Yehuda <muli@...ibm.com>,
linux-kernel <linux-kernel@...r.kernel.org>,
Anthony Liguori <aliguori@...ibm.com>,
Alan Cox <alan@...rguk.ukuu.org.uk>
Subject: Re: [PATCH 0/7] KVM: Kernel-based Virtual Machine
Arnd Bergmann wrote:
> On Sunday 22 October 2006 10:37, Avi Kivity wrote:
>
>> I like this. Since we plan to support multiple vcpus per vm, the fs
>> structure might look like:
>>
>> /kvm/my_vm
>> |
>> +----memory # mkdir to create memory slot.
>>
>
> Note that the way spufs does it, every directory is a reference-counted
> object. Currently that includes single contexts and groups of
> contexts that are supposed to be scheduled simultaneously.
>
> The trick is that we use the special 'spu_create' syscall to
> add a new object, while naming it, and return an open file
> descriptor to it. When that file descriptor gets closed, the
> object gets garbage-collected automatically.
>
Yes. Well, a single fd and ioctl()s do that as well.
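Roughly what I mean, as a sketch only (the ioctl name and request number
below are illustrative, not a real interface):

  #include <fcntl.h>
  #include <sys/ioctl.h>
  #include <unistd.h>

  /* Illustrative request number only. */
  #define KVM_CREATE_VCPU _IO('k', 1)

  void sketch(void)
  {
          int vm = open("/dev/kvm", O_RDWR);  /* one vm object per open()    */

          ioctl(vm, KVM_CREATE_VCPU, 0);      /* configure it through ioctls */
          /* ... run the guest, inject irqs, read registers ...              */

          close(vm);                          /* last close frees the object,
                                                 the same garbage collection
                                                 you get from the spu_create
                                                 file descriptor             */
  }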
>
> We ended up adding a lot more files than we initially planned,
> but the interface is really handy, especially if you want to
> create some procps-like tools for it.
>
>
I don't really see the need. The Cell DSPs are a shared resource, whereas
a virtual machine is just another execution mode of an existing resource
- the main CPU, which already has a sharing mechanism (the scheduler and
priorities).
>> | | # how to set size and offset?
>> | |
>> | +---0 # guest physical memory slot
>> | |
>> | +-- dirty_bitmap # read to get and atomically reset
>> | # the changed pages log
>>
>
> Have you thought about simply defining your guest to be a section
> of the process's virtual address space? That way you could use
> an anonymous mapping in the host as your guest address space, or
> even use a file-backed mapping in order to make the state persistent
> over multiple runs. Or you could map the guest kernel into the
> guest real address space with a private mapping and share the
> text segment over multiple guests to save L2 and RAM.
>
I've thought of it, but it can't work on i386, because guest physical
address space can be larger than virtual address space there. So instead
we mmap("/dev/kvm") with file offsets corresponding to guest physical
addresses.
I still like that idea, though, since it would allow using hugetlbfs and
would make swapping possible. Perhaps we'll just accept the limitation
that guests on i386 are limited in size.
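For reference, the current scheme looks roughly like this (a sketch; the
slot size here is made up):

  #include <sys/mman.h>

  /* Sketch: map 4MB of guest physical memory starting at guest physical 0.
     The file offset encodes the guest physical address, so one fd can
     cover a guest physical space larger than the host's virtual address
     space; a nonzero offset would map a higher guest physical range.      */
  static void *map_guest_slot(int kvm_fd)
  {
          return mmap(NULL, 4 << 20, PROT_READ | PROT_WRITE,
                      MAP_SHARED, kvm_fd, 0 /* guest physical 0 */);
  }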
>
>> |
>> |
>> +----cpu # mkdir/rmdir to create/remove vcpu
>> |
>>
>
> I'd recommend not allowing mkdir or similar operations, although
> it's not that far off. One option would be to let the user specify
> the number of CPUs at kvm_create() time; another option might
> be to allow kvm_create with a special flag, or yet another syscall
> to create the vcpu objects.
>
Okay.
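Either shape would work for us. As a sketch (both calls and the flag
macro below are invented, in the spirit of spu_create):

  /* (a) fix the vcpu count when the vm itself is created ...           */
  int vm = kvm_create("/kvm/my_vm", KVM_CREATE_NR_VCPUS(4));

  /* (b) ... or create vcpu objects afterwards with a dedicated call    */
  int vcpu0 = kvm_create_vcpu(vm, 0);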
>
>> +----0
>> | |
>> | +--- irq # write to inject an irq
>> | |
>> | +--- regs # read/write to get/set registers
>> | |
>> | +--- debugger # write to set breakpoints/singlestep mode
>> |
>> +----1
>> [...]
>>
>> It's certainly a lot more code though, and requires new syscalls. Since
>> this is a little esoteric does it warrant new syscalls?
>>
>
> We've gone through a number of iterations on the spufs design regarding this,
> and in the end decided that the garbage-collecting property of spu_create
> was superior to any other option; adding the spu_run syscall was then
> the logical next step. BTW, one inspiration for spu_run came from sys_vm86,
> which, as you are probably aware, already does a lot of what you do, just
> not for protected-mode guests.
>
Yes, we're doing a sort of vmx86_64().
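The userspace side ends up looking much like the vm86 loop; as a sketch
(the exit reasons, structure and KVM_RUN name here are illustrative):

  for (;;) {
          ioctl(vcpu_fd, KVM_RUN, &run);      /* enter the guest            */

          switch (run.exit_reason) {          /* back in userspace on exits */
          case EXIT_IO:                       /* emulate the port access    */
                  handle_io(&run);
                  break;
          case EXIT_HLT:                      /* guest halted               */
                  return;
          }
  }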
Thanks for the ideas. I'm certainly leaning towards a filesystem-based
approach, and I'll also reconsider the mapping (mmap() via a virtual
address space subsection).
--
error compiling committee.c: too many arguments to function