Message-ID: <4A1C45CE.3010807@redhat.com>
Date: Tue, 26 May 2009 22:41:02 +0300
From: Avi Kivity <avi@...hat.com>
To: Dan Magenheimer <dan.magenheimer@...cle.com>
CC: George Dunlap <George.Dunlap@...citrix.com>,
Jeremy Fitzhardinge <jeremy@...p.org>,
Xen-devel <xen-devel@...ts.xensource.com>,
the arch/x86 maintainers <x86@...nel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Keir Fraser <keir.fraser@...citrix.com>,
Ingo Molnar <mingo@...e.hu>,
Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops)
Dan Magenheimer wrote:
>> It will also be
>> interesting to see how far Xen can get along without real memory
>> management (overcommit).
>>
>
> Several implementations of "classic" memory overcommit have been
> done for Xen, most recently the Difference Engine work at UCSD.
> It is true that none have been merged yet, in part because,
> in many real world environments, "generalized" overcommit
> often leads to hypervisor swapping, and performance becomes
> unacceptable. (In other words, except in certain limited customer
> use models, memory overcommit is a "marketing feature".)
>
Swapping indeed drags performance down horribly. I regard it as a
last-resort solution, used only when everything else (page sharing,
compression, ballooning, live migration) has failed. Having that last
resort is what lets you actually use the other methods without fearing
an eventual out-of-memory condition.
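To make the ballooning part concrete, here is a rough sketch of the
core of an inflate path; hv_return_page() and the rest of the names
are made up for illustration, this is not the virtio or Xen balloon
driver code:

/*
 * Illustrative sketch only.  hv_return_page() is a hypothetical
 * hypercall standing in for whatever mechanism a real balloon driver
 * uses to hand a guest frame back to the host.
 */
#include <linux/errno.h>
#include <linux/gfp.h>
#include <linux/list.h>
#include <linux/mm.h>

static LIST_HEAD(ballooned_pages);

/* hypothetical hypercall: the host may now reuse the frame behind @pfn */
extern int hv_return_page(unsigned long pfn);

static int balloon_inflate(unsigned long nr_pages)
{
	unsigned long i;

	for (i = 0; i < nr_pages; i++) {
		/* allocate a page the guest promises not to touch */
		struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NOWARN);

		if (!page)
			return -ENOMEM;	/* guest is under pressure, stop */

		/* remember it so a later deflate can hand it back */
		list_add(&page->lru, &ballooned_pages);

		/* tell the hypervisor the underlying frame is free to reuse */
		hv_return_page(page_to_pfn(page));
	}

	return 0;
}

The point being that the guest gives memory back voluntarily, which is
why ballooning composes well with the other techniques.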
Note that with SSDs, disks have started to narrow the gap between
memory and secondary storage access times, so swapping will actually
start improving rather than regressing as it has in recent years.
> There's also a novel approach, Transcendent Memory (aka "tmem";
> see http://oss.oracle.com/projects/tmem). Though tmem requires the
> guest to participate in memory management decisions (thus requiring
> a Linux patch), system-wide physical memory efficiency may
> improve vs memory deduplication, and hypervisor-based swapping
> is not necessary.
>
Yes, I've seen that. Another tool in the memory management arsenal.
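For those who haven't looked at it, the basic shape is a put/get
interface where the guest offers page contents to the hypervisor
instead of holding on to them or simply dropping them. The sketch
below is only my rough paraphrase of that idea; the names and
signatures are illustrative, not the actual interface from the
project page:

/*
 * Illustrative only: a put/get style interface in the spirit of tmem.
 * Names and signatures are made up, not the project's.
 */
#include <linux/errno.h>
#include <linux/mm.h>
#include <linux/types.h>

/* offer a copy of a page to hypervisor-managed memory (may be refused) */
extern int tmem_put_page(u32 pool_id, u64 object_id, u32 index,
			 unsigned long pfn);

/* ask for it back later; fails if the hypervisor dropped it meanwhile */
extern int tmem_get_page(u32 pool_id, u64 object_id, u32 index,
			 unsigned long pfn);

/* when evicting a clean page cache page, offer it instead of dropping it */
static void offer_evicted_page(u32 pool, u64 inode_no, u32 page_index,
			       struct page *page)
{
	/* the hypervisor decides whether keeping a copy is worth it */
	tmem_put_page(pool, inode_no, page_index, page_to_pfn(page));
}

/* on a page cache miss, check the hypervisor before doing real I/O */
static int maybe_refill_page(u32 pool, u64 inode_no, u32 page_index,
			     struct page *page)
{
	if (tmem_get_page(pool, inode_no, page_index, page_to_pfn(page)) == 0)
		return 0;	/* filled from hypervisor memory */

	return -ENOENT;		/* caller falls back to the disk read */
}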
>
>> The Linux scheduler already supports multiple scheduling
>> classes. If we
>> find that none of them will fit our needs, we'll propose a new one.
>> When the need can be demonstrated to be real, and the
>> implementation can
>> be clean, Linux can usually be adapted.
>>
>
> But that's exactly George and Jeremy's point. KVM will
> eventually require changes that clutter Linux for purposes
> that are relevant only to a hypervisor.
>
kvm has already made changes to Linux. Preemption notifiers give us a
lightweight exit path, and mmu notifiers let the Linux mmu control the
kvm mmu. In fact, mmu notifiers have proven useful to device drivers
as well.
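For reference, the hooks themselves are small. A minimal sketch of
registering them looks roughly like this (the callback bodies are
placeholders; kvm's real ones live under virt/kvm/):

#include <linux/mmu_notifier.h>
#include <linux/preempt.h>	/* needs CONFIG_PREEMPT_NOTIFIERS */
#include <linux/sched.h>

/* called when our (vcpu) task is scheduled back in on @cpu */
static void my_sched_in(struct preempt_notifier *pn, int cpu)
{
	/* e.g. reload guest state that was dropped on sched_out */
}

/* called when our task is about to be preempted in favour of @next */
static void my_sched_out(struct preempt_notifier *pn,
			 struct task_struct *next)
{
	/* e.g. save guest state here instead of on every exit */
}

static struct preempt_ops my_preempt_ops = {
	.sched_in	= my_sched_in,
	.sched_out	= my_sched_out,
};

static struct preempt_notifier my_preempt_notifier;

/* the Linux mmu telling the secondary (kvm-style) mmu to let go of a page */
static void my_invalidate_page(struct mmu_notifier *mn, struct mm_struct *mm,
			       unsigned long address)
{
	/* drop any shadow/secondary mapping of the page at @address */
}

static const struct mmu_notifier_ops my_mmu_ops = {
	.invalidate_page = my_invalidate_page,
};

static struct mmu_notifier my_mmu_notifier = {
	.ops = &my_mmu_ops,
};

static int hook_current_task(void)
{
	preempt_notifier_init(&my_preempt_notifier, &my_preempt_ops);
	preempt_notifier_register(&my_preempt_notifier);

	/* ties the notifier to this process' address space */
	return mmu_notifier_register(&my_mmu_notifier, current->mm);
}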
It also works the other way around; for example, work on cpu
controllers will benefit kvm, and the real-time scheduler applies to
kvm guests as well. In fact, many scheduler and memory management
features apply to kvm immediately, usually without any integration
work at all.
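As a trivial example: a vcpu is just a host thread, so something as
generic as sched_setscheduler() can move it into the real-time class
without a line of kvm-specific code (vcpu_tid below is assumed to be
the thread id of a vcpu thread, found e.g. under /proc/<pid>/task/):

#include <sched.h>
#include <stdio.h>
#include <sys/types.h>

/* Put a kvm vcpu thread into SCHED_FIFO; needs the usual RT privileges. */
int make_vcpu_realtime(pid_t vcpu_tid)
{
	struct sched_param param = { .sched_priority = 10 };

	if (sched_setscheduler(vcpu_tid, SCHED_FIFO, &param) < 0) {
		perror("sched_setscheduler");
		return -1;
	}

	return 0;
}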
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.