linux-kernel - Re: Frontswap [PATCH 0/4] (was Transcendent Memory): overview

Date:	Mon, 03 May 2010 12:39:20 +0300
From:	Avi Kivity <avi@...hat.com>
To:	Dan Magenheimer <dan.magenheimer@...cle.com>
CC:	ngupta@...are.org, Jeremy Fitzhardinge <jeremy@...p.org>,
	Dave Hansen <dave@...ux.vnet.ibm.com>,
	Pavel Machek <pavel@....cz>, linux-kernel@...r.kernel.org,
	linux-mm@...ck.org, hugh.dickins@...cali.co.uk,
	JBeulich@...ell.com, chris.mason@...cle.com,
	kurt.hackel@...cle.com, dave.mccracken@...cle.com, npiggin@...e.de,
	akpm@...ux-foundation.org, riel@...hat.com
Subject: Re: Frontswap [PATCH 0/4] (was Transcendent Memory): overview

On 05/02/2010 08:22 PM, Dan Magenheimer wrote:
>> It's bad, but it's better than ooming.
>>
>> The same thing happens with vcpus: you run 10 guests on one core, if
>> they all wake up, your cpu is suddenly 10x slower and has 30000x
>> interrupt latency (30ms vs 1us, assuming 3ms timeslices).  Your disks
>> become slower as well.
>>
>> It's worse with memory, so you try to swap as a last resort.  However,
>> swap is still faster than a crashed guest.
>>      
> Your analogy only holds when the host administrator is either
> extremely greedy or stupid.

A 10x vcpu overcommit is reasonable in some situations (VDI, powersave
at night).  Even a 2x vcpu overcommit will cause a 10000x interrupt
latency degradation.
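
A back-of-envelope sketch of the arithmetic behind these figures, in
Python.  It assumes the simple model from the quoted text: round-robin
scheduling with 3ms timeslices, a 1us uncontended interrupt latency, and
a worst-case wait of (vcpus per core) x (timeslice).  The function name
and constants are illustrative only, not taken from any scheduler.

```python
# Assumptions (from the quoted example, not measured values):
TIMESLICE_US = 3000   # 3ms timeslice, in microseconds
BASELINE_US = 1       # uncontended interrupt latency, 1us


def latency_degradation(vcpus_per_core, timeslice_us=TIMESLICE_US,
                        baseline_us=BASELINE_US):
    """Worst-case interrupt latency relative to an uncontended core.

    Simplified model: a vcpu that needs to take an interrupt may have to
    wait roughly one timeslice per vcpu sharing the core.
    """
    if vcpus_per_core <= 1:
        return 1.0  # no overcommit, no added scheduling delay
    worst_case_us = vcpus_per_core * timeslice_us
    return worst_case_us / baseline_us


if __name__ == "__main__":
    for n in (2, 10):
        print(f"{n}x overcommit: ~{latency_degradation(n):.0f}x latency")
```

Under this model 10x overcommit gives 30000x, matching the earlier
figure, and 2x gives 6000x, the same order of magnitude as the number
above.  The exact factor depends on the timeslice length assumed.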

> My analogy only requires some
> statistical bad luck: Multiple guests with peaks and valleys
> of memory requirements happen to have their peaks align.
>    

Not sure I understand.

>>> Third, host swapping makes live migration much more difficult.
>>> Either the host swap disk must be accessible to all machines
>>> or data sitting on a local disk MUST be migrated along with
>>> RAM (which is not impossible but complicates live migration
>>> substantially).
>>>        
>> kvm does live migration with swapping, and has no special code to
>> integrate them.
>>   :
>> Don't know about vmware, but kvm supports page sharing, swapping, and
>> live migration simultaneously.
>>      
> Hmmm... I'll bet I can break it pretty easily.  I think the
> case you raised that you thought would cause host OOM'ing
> will cause kvm live migration to fail.
>
> Or maybe not... when a guest is in the middle of a live migration,
> I believe (in Xen), the entire guest memory allocation (possibly
> excluding ballooned-out pages) must be simultaneously in RAM briefly
> in BOTH the host and target machine.  That is, live migration is
> not "pipelined".  Is this also true of KVM?

No.  The entire guest address space can be swapped out on the source and 
target, less the pages being copied to or from the wire, and pages 
actively accessed by the guest.  Of course performance will suck if all 
memory is swapped out.

> If so, your
> statement above is just waiting for a corner case to break it.
> And if not, I expect you've got fault tolerance issues.
>    

Not that I'm aware of.

>>> If you talk to VMware customers (especially web-hosting services)
>>> that have attempted to use overcommit technologies that require
>>> host-swapping, you will find that they quickly become allergic
>>> to memory overcommit and turn it off.  The end users (users of
>>> the VMs that inexplicably grind to a halt) complain loudly.
>>> As a result, RAM has become a bottleneck in many many systems,
>>> which ultimately reduces the utility of servers and the value
>>> of virtualization.
>>>        
>> Choosing the correct overcommit ratio is certainly not an easy task.
>> However, just hoping that memory will be available when you need it is
>> not a good solution.
>>      
> Choosing the _optimal_ overcommit ratio is impossible without a
> prescient knowledge of the workload in each guest.  Hoping memory
> will be available is certainly not a good solution, but if memory
> is not available guest swapping is much better than host swapping.
>    

You cannot rely on guest swapping.

> And making RAM usage as dynamic as possible and live migration
> as easy as possible are keys to maximizing the benefits (and
> limiting the problems) of virtualization.
>    

That is why you need overcommit.  You make things dynamic with page 
sharing and ballooning and live migration, but at some point you need a 
failsafe fallback.  The only failsafe fallback I can see (where the host 
doesn't rely on guests) is swapping.

As far as I can tell, frontswap+tmem makes the problem worse.  You loan
the guest some memory without the means to take it back, which increases
memory pressure on the host.  The result is that if you want to avoid
swapping (or are unable to swap) you need to undercommit host resources.
Instead of

    sum(guest mem) + reserve < (host mem)

you need

    sum(guest mem + committed tmem) + reserve < (host mem)

You need more host memory, or fewer guests, or to be prepared to swap if
the worst happens.

-- 
error compiling committee.c: too many arguments to function
