Message-ID: <4BDDACF5.90601@redhat.com>
Date: Sun, 02 May 2010 19:48:53 +0300
From: Avi Kivity <avi@...hat.com>
To: Dan Magenheimer <dan.magenheimer@...cle.com>
CC: ngupta@...are.org, Jeremy Fitzhardinge <jeremy@...p.org>,
Dave Hansen <dave@...ux.vnet.ibm.com>,
Pavel Machek <pavel@....cz>, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, hugh.dickins@...cali.co.uk,
JBeulich@...ell.com, chris.mason@...cle.com,
kurt.hackel@...cle.com, dave.mccracken@...cle.com, npiggin@...e.de,
akpm@...ux-foundation.org, riel@...hat.com
Subject: Re: Frontswap [PATCH 0/4] (was Transcendent Memory): overview
On 05/02/2010 07:06 PM, Dan Magenheimer wrote:
>>> NO! Frontswap on Xen+tmem never *never* _never_ NEVER results
>>> in host swapping. Host swapping is evil. Host swapping is
>>> the root of most of the bad reputation that memory overcommit
>>> has gotten from VMware customers. Host swapping can't be
>>> avoided with some memory overcommit technologies (such as page
>>> sharing), but frontswap on Xen+tmem CAN and DOES avoid it.
>>>
>> Why is host-level swapping evil? In the KVM case, a VM is just another
>> process and the host will just swap out pages using the same LRU-like
>> scheme as with any other process, AFAIK.
>>
> The first problem is that you are simulating a fast resource
> (RAM) with a resource that is orders of magnitude slower with
> NO visibility to the user that suffers the consequences. A good
> analogy (and no analogy is perfect) is if Linux discovers a 16MHz
> 80286 on a serial card in addition to the 32 3GHz cores on a
> Nehalem box and, whenever the 32 cores are all busy, randomly
> schedules a process on the 80286, while recording all CPU usage
> data as if the 80286 is a "real" processor.... "Hmmm... why
> did my compile suddenly run 100 times slower?"
>
It's bad, but it's better than OOMing.
The same thing happens with vcpus: if you run 10 guests on one core and
they all wake up, your cpu is suddenly 10x slower and has 30000x the
interrupt latency (30ms vs 1us, assuming 3ms timeslices). Your disks
become slower as well.
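The arithmetic behind those figures can be checked with a rough sketch. The function name and parameters are made up for illustration, and the model is deliberately crude: it assumes round-robin scheduling, with every runnable guest consuming its full timeslice.

```python
def overcommit_slowdown(guests, timeslice_ms, baseline_latency_us):
    """Back-of-the-envelope cost of running `guests` vcpus on one core.

    Illustrative only: assumes simple round-robin scheduling with all
    guests runnable at the same time.
    """
    # Each guest gets 1/guests of the core, so it runs `guests` times slower.
    cpu_slowdown = guests
    # Round figure used in the text: one full rotation of timeslices before
    # a given guest runs again. (Strictly, it waits for guests - 1 slices.)
    worst_latency_ms = guests * timeslice_ms
    # Convert ms to us and compare with the uncontended interrupt latency.
    latency_factor = worst_latency_ms * 1000 / baseline_latency_us
    return cpu_slowdown, worst_latency_ms, latency_factor


if __name__ == "__main__":
    print(overcommit_slowdown(guests=10, timeslice_ms=3, baseline_latency_us=1))
```

With 10 guests, 3ms timeslices and a 1us baseline this gives a 10x cpu slowdown, 30ms latency, and a factor of 30000.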
It's worse with memory, so you try to swap as a last resort. However,
swap is still faster than a crashed guest.
> The second problem is "double swapping": A guest may choose
> a page to swap to "guest swap", but invisibly to the guest,
> the host first must fetch it from "host swap". (This may
> seem like it is easy to avoid... it is not and happens more
> frequently than you might think.)
>
True. In fact, when the guest and host use the same LRU algorithm, it
becomes even more likely. That's one of the things CMM2 addresses.
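A toy model shows why matching LRU policies make double swapping more likely. This is only a sketch under strong assumptions: both levels run exact LRU over the same access stream, the host backs fewer frames than the guest believes it has, and pages are tracked by identity rather than by guest-physical frame. Real kernels use approximations of LRU, and `simulate` and `demo` are hypothetical names, not kvm or Linux code.

```python
from collections import OrderedDict
import random


def simulate(accesses, guest_frames, host_frames):
    """Return (guest_evictions, double_swaps) for a toy two-level LRU."""
    guest = OrderedDict()  # pages the guest believes are in RAM, oldest first
    host = OrderedDict()   # guest pages the host really keeps in RAM
    guest_evictions = 0
    double_swaps = 0
    for page in accesses:
        # Guest-side LRU: evict the guest's oldest page when it is full.
        if page in guest:
            guest.move_to_end(page)
        else:
            if len(guest) >= guest_frames:
                victim, _ = guest.popitem(last=False)
                guest_evictions += 1
                if victim in host:
                    del host[victim]   # frame is released in the host too
                else:
                    double_swaps += 1  # host must read it from host swap first
            guest[page] = True
        # Host-side LRU over the same access stream.
        if page in host:
            host.move_to_end(page)
        else:
            if len(host) >= host_frames:
                host.popitem(last=False)
            host[page] = True
    return guest_evictions, double_swaps


def demo(seed=0):
    """Random accesses over 64 pages; guest sees 32 frames, host backs 16."""
    rng = random.Random(seed)
    accesses = [rng.randrange(64) for _ in range(10_000)]
    return simulate(accesses, guest_frames=32, host_frames=16)
```

In this idealized setup, the page the guest picks as its least recently used is older than anything the host still holds, so every guest eviction turns into a double swap. When the host backs as many frames as the guest uses, the count drops to zero.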
> Third, host swapping makes live migration much more difficult.
> Either the host swap disk must be accessible to all machines
> or data sitting on a local disk MUST be migrated along with
> RAM (which is not impossible but complicates live migration
> substantially).
kvm does live migration with swapping, and has no special code to
integrate them.
> Last I checked, VMware does not allow
> page-sharing and live migration to both be enabled for the
> same host.
>
Don't know about vmware, but kvm supports page sharing, swapping, and
live migration simultaneously.
> If you talk to VMware customers (especially web-hosting services)
> that have attempted to use overcommit technologies that require
> host-swapping, you will find that they quickly become allergic
> to memory overcommit and turn it off. The end users (users of
> the VMs that inexplicably grind to a halt) complain loudly.
> As a result, RAM has become a bottleneck in many many systems,
> which ultimately reduces the utility of servers and the value
> of virtualization.
>
Choosing the correct overcommit ratio is certainly not an easy task.
However, just hoping that memory will be available when you need it is
not a good solution.
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.