Message-ID: <4BDB1CA1.1000006@redhat.com>
Date:	Fri, 30 Apr 2010 21:08:33 +0300
From:	Avi Kivity <avi@...hat.com>
To:	Dan Magenheimer <dan.magenheimer@...cle.com>
CC:	Dave Hansen <dave@...ux.vnet.ibm.com>, Pavel Machek <pavel@....cz>,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org, jeremy@...p.org,
	hugh.dickins@...cali.co.uk, ngupta@...are.org, JBeulich@...ell.com,
	chris.mason@...cle.com, kurt.hackel@...cle.com,
	dave.mccracken@...cle.com, npiggin@...e.de,
	akpm@...ux-foundation.org, riel@...hat.com
Subject: Re: Frontswap [PATCH 0/4] (was Transcendent Memory): overview

On 04/30/2010 07:43 PM, Dan Magenheimer wrote:
>> Given that whenever frontswap fails you need to swap anyway, it is
>> better for the host to never fail a frontswap request and instead back
>> it with disk storage if needed.  This way you avoid a pointless vmexit
>> when you're out of memory.  Since it's disk backed it needs to be
>> asynchronous and batched.
>>
>> At this point we're back with the ordinary swap API.  Simply have your
>> host expose a device which is write cached by host memory, you'll have
>> all the benefits of frontswap with none of the disadvantages, and with
>> no changes to the guest.
>>      
> I think you are making a number of possibly false assumptions here:
> 1) The host [the frontswap backend may not even be a hypervisor]
>    

True.  My remarks only apply to frontswap-to-hypervisor; for internally
consumed frontswap, the situation is different.

> 2) can back it with disk storage [not if it is a bare-metal hypervisor]
>    

So it seems a bare-metal hypervisor has less access to the bare metal 
than a non-bare-metal hypervisor?

Seriously, leave the bare-metal FUD to Simon.  People on this list know 
that kvm and Xen have exactly the same access to the hardware (well,
actually, Xen needs to use privileged guests to access some of its hardware).

> 3) avoid a pointless vmexit [no vmexit for a non-VMX (e.g. PV) guest]
>    

There's still an exit.  It's much faster than a vmx/svm vmexit but still 
nontrivial.

But why are we optimizing for 5-year-old hardware?

> 4) when you're out of memory [how can this be determined outside of
>     the hypervisor?]
>    

It's determined by the hypervisor, same as with tmem.  The guest swaps
to a virtual disk; the hypervisor places the data in RAM if it's
available, or on disk if it isn't.  Write-back caching in all its glory.
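To make the placement policy concrete, here is a minimal C sketch of a swap backend that stores a page in host RAM when a slot is free and otherwise falls back to disk, failing only when both tiers are full. Everything in it (struct swap_backend, backend_write(), backend_read(), the tiny RAM_SLOTS/DISK_SLOTS capacities and the 64-byte toy page) is hypothetical and not taken from kvm, Xen, or the frontswap patches. Both tiers are simulated as in-memory arrays, and a real write-back cache would also flush dirty RAM pages to disk later, which this sketch omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 64 /* toy page size for the sketch, not a real 4 KiB page */
#define RAM_SLOTS 2      /* hypothetical host RAM budget, in pages */
#define DISK_SLOTS 8     /* hypothetical backing-disk capacity, in pages */

enum where { NOWHERE, IN_RAM, ON_DISK };

struct slot {
    bool used;
    long offset;                       /* guest swap offset held in this slot */
    unsigned char data[TOY_PAGE_SIZE];
};

struct swap_backend {
    struct slot ram[RAM_SLOTS];        /* fast tier: host memory */
    struct slot disk[DISK_SLOTS];      /* slow tier: simulated in memory here */
};

/* Return the slot holding `offset` in one tier, or NULL. */
static struct slot *find_slot(struct slot *tier, size_t n, long offset) {
    for (size_t i = 0; i < n; i++)
        if (tier[i].used && tier[i].offset == offset)
            return &tier[i];
    return NULL;
}

/* Return an unused slot in one tier, or NULL if the tier is full. */
static struct slot *free_slot(struct slot *tier, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (!tier[i].used)
            return &tier[i];
    return NULL;
}

/* Store a guest page: prefer RAM, fall back to disk.
 * The guest only sees a failure when both tiers are full. */
static enum where backend_write(struct swap_backend *b, long offset,
                                const unsigned char *page) {
    struct slot *old = find_slot(b->ram, RAM_SLOTS, offset);
    if (!old)
        old = find_slot(b->disk, DISK_SLOTS, offset);
    if (old)
        old->used = false;             /* drop any stale copy of this offset */

    enum where w = IN_RAM;
    struct slot *s = free_slot(b->ram, RAM_SLOTS);
    if (!s) {
        s = free_slot(b->disk, DISK_SLOTS);
        w = ON_DISK;
    }
    if (!s)
        return NOWHERE;
    assert(!s->used);
    s->used = true;
    s->offset = offset;
    memcpy(s->data, page, TOY_PAGE_SIZE);
    return w;
}

/* Load a guest page from whichever tier holds it. */
static bool backend_read(struct swap_backend *b, long offset,
                         unsigned char *page) {
    struct slot *s = find_slot(b->ram, RAM_SLOTS, offset);
    if (!s)
        s = find_slot(b->disk, DISK_SLOTS, offset);
    if (!s)
        return false;
    memcpy(page, s->data, TOY_PAGE_SIZE);
    return true;
}
```

With RAM_SLOTS set to 2, the first two writes land in RAM and the third goes to the disk tier; the guest sees success in all three cases and never has to retry the write somewhere else.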

> And, importantly, "have your host expose a device which is write
> cached by host memory"... you are implying that all guest swapping
> should be done to a device managed/controlled by the host?  That
> eliminates guest swapping to directIO/SRIOV devices doesn't it?
>    

You can have multiple swap devices.

wrt SR-IOV, you'll see synchronous frontswap reduce throughput.  SR-IOV
will swap with <1 exit/page and DMA directly from guest pages, while
frontswap/tmem will carry a 1 exit/page hit (even if no swap actually
happens) plus the copy cost (if it does).
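The exit arithmetic above can be written down as a rough cost model. This is a sketch under simplifying assumptions: every exit costs the same; a batched, DMA-capable device (as with SR-IOV) takes one exit per batch of batch_pages pages and does no CPU copy; and a synchronous per-page put takes one exit per page tried plus a 4096-byte copy for each page the backend accepts. The function names and the 4096-byte page size are illustrative, not from the frontswap patches.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BYTES 4096u /* assumed page size for the model */

struct swap_cost {
    uint64_t exits;        /* guest -> host transitions */
    uint64_t bytes_copied; /* CPU copies of page contents */
};

/* Batched DMA device: one exit per batch (rounded up), no CPU copy,
 * because the device reads guest pages directly. */
static struct swap_cost cost_batched_dma(uint64_t pages, uint64_t batch_pages) {
    assert(batch_pages > 0);
    struct swap_cost c = { (pages + batch_pages - 1) / batch_pages, 0 };
    return c;
}

/* Synchronous per-page put: one exit for every page tried, even when the
 * backend rejects it, plus a copy of every page it accepts. */
static struct swap_cost cost_sync_put(uint64_t pages, uint64_t accepted) {
    assert(accepted <= pages);
    struct swap_cost c = { pages, accepted * PAGE_BYTES };
    return c;
}
```

For 1000 pages and a hypothetical 32-page batch, cost_batched_dma() reports 32 exits, while cost_sync_put() reports 1000 exits no matter how many pages are accepted, plus 4,096,000 copied bytes if all of them are.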

The API really, really wants to be asynchronous.
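One possible shape for such an asynchronous interface is a submission queue with completion callbacks: the guest queues many pages without exiting, notifies the backend once per batch, and learns each page's fate later. This is an assumption for illustration only: swap_submit(), swap_kick(), QUEUE_DEPTH, the callback type and count_completion() are invented names, not an existing kernel API. The stand-in backend completes every request inside swap_kick(), whereas a real one could complete them after disk I/O.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_DEPTH 16 /* hypothetical ring size */

/* Called once per page when the backend has finished with it. */
typedef void (*swap_done_fn)(void *ctx, long offset, int status);

struct swap_req {
    long offset;
    const void *page;
    swap_done_fn done;
    void *ctx;
};

struct swap_queue {
    struct swap_req ring[QUEUE_DEPTH];
    size_t pending;
    unsigned long kicks; /* stands in for guest exits */
};

/* Queue one page without exiting. Returns -1 when the ring is full,
 * meaning the caller should kick the backend first. */
static int swap_submit(struct swap_queue *q, long offset, const void *page,
                       swap_done_fn done, void *ctx) {
    assert(done != NULL);
    if (q->pending == QUEUE_DEPTH)
        return -1;
    q->ring[q->pending++] = (struct swap_req){ offset, page, done, ctx };
    return 0;
}

/* One notification for the whole batch. This toy backend completes each
 * request immediately with status 0. */
static void swap_kick(struct swap_queue *q) {
    if (q->pending == 0)
        return;
    q->kicks++;
    for (size_t i = 0; i < q->pending; i++) {
        struct swap_req *r = &q->ring[i];
        r->done(r->ctx, r->offset, 0);
    }
    q->pending = 0;
}

/* Example completion handler: counts successful completions in *ctx. */
static void count_completion(void *ctx, long offset, int status) {
    (void)offset;
    if (status == 0)
        ++*(unsigned long *)ctx;
}
```

Submitting 40 pages through a 16-entry queue takes 3 kicks instead of 40 per-page calls, which is the batching a synchronous put cannot offer.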

> Anyway, I think we can see now why frontswap might not be a good
> match for a hosted hypervisor (KVM), but that doesn't make it
> any less useful for a bare-metal hypervisor (or TBD for in-kernel
> compressed swap and TBD for possible future pseudo-RAM technologies).
>    

In-kernel compressed swap does seem to be a good match for a synchronous 
API.  For future memory devices, or even bare-metal buzzword-compliant 
hypervisors, I disagree.  An asynchronous API is required for 
efficiency, and they'll all have swap capability sooner or later (kvm, 
vmware, and I believe xen 4 already do).

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

