linux-kernel - Re: Frontswap [PATCH 0/4] (was Transcendent Memory): overview

Message-ID: <4BD43182.1040508@redhat.com>
Date:	Sun, 25 Apr 2010 15:11:46 +0300
From:	Avi Kivity <avi@...hat.com>
To:	Dan Magenheimer <dan.magenheimer@...cle.com>
CC:	linux-kernel@...r.kernel.org, linux-mm@...ck.org, jeremy@...p.org,
	hugh.dickins@...cali.co.uk, ngupta@...are.org, JBeulich@...ell.com,
	chris.mason@...cle.com, kurt.hackel@...cle.com,
	dave.mccracken@...cle.com, npiggin@...e.de,
	akpm@...ux-foundation.org, riel@...hat.com
Subject: Re: Frontswap [PATCH 0/4] (was Transcendent Memory): overview

On 04/25/2010 03:30 AM, Dan Magenheimer wrote:
>>>> I see.  So why not implement this as an ordinary swap device, with a
>>>> higher priority than the disk device?  this way we reuse an API and keep
>>>> things asynchronous, instead of introducing a special purpose API.
>>>>
>>>>          
>>> Because the swapping API doesn't adapt well to dynamic changes in
>>> the size and availability of the underlying "swap" device, which
>>> is very useful for swap to (bare-metal) hypervisor.
>>>        
>> Can we extend it?  Adding new APIs is easy, but harder to maintain in
>> the long term.
>>      
> Umm... I think the difference between a "new" API and extending
> an existing one here is a choice of semantics.  As designed, frontswap
> is an extremely simple, only-very-slightly-intrusive set of hooks that
> allows swap pages to, under some conditions, go to pseudo-RAM instead
> of an asynchronous disk-like device.  It works today with at least
> one "backend" (Xen tmem), is shipping today in real distros, and is
> extremely easy to enable/disable via CONFIG or module... meaning
> no impact on anyone other than those who choose to benefit from it.
>
> "Extending" the existing swap API, which has largely been untouched for
> many years, seems like a significantly more complex and error-prone
> undertaking that will affect nearly all Linux users with a likely long
> bug tail.  And, by the way, there is no existence proof that it
> will be useful.
>
> Seems like a no-brainer to me.
>    

My issue is with the API's synchronous nature.  Both RAM and more exotic 
memories can be used with DMA instead of copying.  A synchronous 
interface gives this up.
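
A rough userspace sketch of the distinction, not taken from the frontswap
patches: every name below (sync_put_page, async_submit, engine_run, struct
xfer) is hypothetical, and memcpy() merely stands in for a DMA engine.  The
point is only that the synchronous call has finished the copy by the time it
returns, while the asynchronous one just records the request and lets
something else move the data later.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096   /* assumed page size for the sketch */
#define SKETCH_QUEUE_LEN 8      /* arbitrary queue depth */

/* Synchronous style: the calling CPU performs the copy before returning. */
static int sync_put_page(void *dst, const void *src)
{
	memcpy(dst, src, SKETCH_PAGE_SIZE);
	return 0;
}

/* Asynchronous style: the caller only describes the transfer. */
struct xfer {
	void *dst;
	const void *src;
	void (*done)(struct xfer *x);   /* optional completion callback */
	int completed;
};

struct xfer_queue {
	struct xfer *pending[SKETCH_QUEUE_LEN];
	size_t count;
};

/* Queue a transfer; no data moves here. Returns -1 if the queue is full. */
static int async_submit(struct xfer_queue *q, struct xfer *x)
{
	if (q->count == SKETCH_QUEUE_LEN)
		return -1;
	x->completed = 0;
	q->pending[q->count++] = x;
	return 0;
}

/*
 * Stand-in for a DMA engine draining the queue. In this sketch memcpy()
 * simulates the hardware; returns the number of transfers completed.
 */
static size_t engine_run(struct xfer_queue *q)
{
	size_t n = q->count;

	assert(n <= SKETCH_QUEUE_LEN);
	for (size_t i = 0; i < n; i++) {
		struct xfer *x = q->pending[i];

		memcpy(x->dst, x->src, SKETCH_PAGE_SIZE);
		x->completed = 1;
		if (x->done)
			x->done(x);
	}
	q->count = 0;
	return n;
}
```

With the second shape the submitter is free to do other work between
async_submit() and the completion, which is what a copy-based synchronous
hook cannot offer.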

>> Ok.  For non traditional RAM uses I really think an async API is
>> needed.  If the API is backed by a cpu, a synchronous operation is fine,
>> but once it isn't RAM, it can be all kinds of interesting things.
>>      
> Well, we shall see.  It may also be the case that the existing
> asynchronous swap API will work fine for some non traditional RAM;
> and it may also be the case that frontswap works fine for some
> non traditional RAM.  I agree there is fertile ground for exploration
> here.  But let's not allow our speculation on what may or may
> not work in the future halt forward progress of something that works
> today.
>    

Let's not allow the urge to merge prevent us from doing the right thing.

>
>    
>> Note that even if you do give the page to the guest, you still control
>> how it can access it, through the page tables.  So for example you can
>> easily compress a guest's pages without telling it about it; whenever it
>> touches them you decompress them on the fly.
>>      
> Yes, at a much larger more invasive cost to the kernel.  Frontswap
> and cleancache and tmem are all well-layered for a good reason.
>    

No need to change the kernel at all; the hypervisor controls the page 
tables.

>> Swap has no timing
>> constraints, it is asynchronous and usually to slow devices.
>>      
> What I was referring to is that the existing swap code DOES NOT
> always have the ability to collect N scattered pages before
> initiating an I/O write suitable for a device (such as an SSD)
> that is optimized for writing N pages at a time.  That is what
> I meant by a timing constraint.  See references to page_cluster
> in the swap code (and this is for contiguous pages, not scattered).
>    

I see.  Given that swap-to-flash will soon be way more common than 
frontswap, it needs to be solved (either in flash or in the swap code).
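
As an illustration of what "collect N scattered pages before initiating an
I/O write" could look like, here is a minimal sketch.  It is not the
kernel's swap code and does not model page_cluster; struct write_batch,
batch_add, batch_flush and the batch size of 8 are all assumptions made for
the example.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_PAGES 8   /* hypothetical device-friendly write size, in pages */

struct write_batch {
	unsigned long pages[BATCH_PAGES];  /* scattered page numbers gathered so far */
	size_t count;                      /* pages currently waiting */
	size_t flushes;                    /* number of writes issued */
};

/* Issue one write for whatever has been gathered; a no-op if empty. */
static void batch_flush(struct write_batch *b)
{
	if (b->count == 0)
		return;
	/* A real implementation would submit a single I/O for pages[0..count). */
	b->flushes++;
	b->count = 0;
}

/*
 * Add one page to the batch. Returns 1 if this addition filled the batch
 * and triggered a write, 0 if the page is still waiting.
 */
static int batch_add(struct write_batch *b, unsigned long page)
{
	assert(b->count < BATCH_PAGES);
	b->pages[b->count++] = page;
	if (b->count == BATCH_PAGES) {
		batch_flush(b);
		return 1;
	}
	return 0;
}
```

Pages that do not fill a batch stay queued until an explicit batch_flush(),
so whoever adopts this shape has to decide how long a partial batch may
wait.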

-- 
error compiling committee.c: too many arguments to function
