Message-ID: <7cb22078-f200-45e3-a265-10cce2ae8224@default>
Date:	Thu, 9 Jul 2009 15:34:39 -0700 (PDT)
From:	Dan Magenheimer <dan.magenheimer@...cle.com>
To:	Anthony Liguori <anthony@...emonkey.ws>
Cc:	Rik van Riel <riel@...hat.com>, linux-kernel@...r.kernel.org,
	npiggin@...e.de, akpm@...l.org, jeremy@...p.org,
	xen-devel@...ts.xensource.com, tmem-devel@....oracle.com,
	alan@...rguk.ukuu.org.uk, linux-mm@...ck.org,
	kurt.hackel@...cle.com, Rusty Russell <rusty@...tcorp.com.au>,
	dave.mccracken@...cle.com, Marcelo Tosatti <mtosatti@...hat.com>,
	sunil.mushran@...cle.com, Avi Kivity <avi@...hat.com>,
	Schwidefsky <schwidefsky@...ibm.com>, chris.mason@...cle.com,
	Balbir Singh <balbir@...ux.vnet.ibm.com>
Subject: RE: [RFC PATCH 0/4] (Take 2): transcendent memory ("tmem") for Linux

> > If it guesses wrong and overcommits too aggressively,
> > the hypervisor must swap some memory to a "hypervisor
> > swap disk" (which btw has some policy challenges).
> > IMHO this is more of a "mainframe" model.
> 
> No, not at all.  A guest marks a page as being "volatile", 
> which tells 
> the hypervisor it never needs to swap that page.  It can discard it 
> whenever it likes.
> 
> If the guest later tries to access that page, it will get a special 
> "discard fault".  For a lot of types of memory, the discard fault 
> handler can then restore that page transparently to the code that 
> generated the discard fault.

But this means that either the content of that page must have been
preserved somewhere or the discard fault handler has sufficient
information to go back and get the content from the source (e.g.
the filesystem).  Or am I misunderstanding?

With tmem, the equivalent of the "failure to access a discarded
page" is inline and synchronous: if the tmem access "fails", the
normal fallback code (e.g. the disk read) executes immediately.
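
Just to make that concrete, here is a minimal sketch of that inline
fallback path.  The names below (tmem_get_page, EPHEMERAL_POOL,
read_page_from_disk) are made up for illustration and are not the
exact interface in the patch:

	/* Ask the hypervisor for a copy it may still be holding. */
	if (tmem_get_page(EPHEMERAL_POOL, index, page) == 0)
		return 0;	/* hit: contents restored, synchronously */

	/* Miss: the copy was discarded; fall back to the filesystem.
	 * No special fault handler is involved. */
	return read_page_from_disk(page, index);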

> AFAICT, ephemeral tmem has the exact same characteristics as volatile 
> CMM2 pages.  The difference is that tmem introduces an API to 
> explicitly 
> manage this memory behind a copy interface whereas CMM2 uses 
> hinting and 
> a special fault handler to allow any piece of memory to be marked in 
> this way.
> :
> I don't really agree with your analysis of CMM2.  We can map CMM2 
> operations directly to ephemeral tmem interfaces so tmem is a 
> subset of CMM2, no?

Not really.  I suppose one *could* use tmem that way, immediately
writing every page read from disk into tmem, though that would
probably cause some real coherency challenges.  But the patch as
proposed only puts ready-to-be-replaced pages (as determined by
Linux's PFRA) into ephemeral tmem.

The two services provided to Linux (in the proposed patch) by
tmem are (sketched in code just after the list):

1) "I have a page of memory that I'm about to throw away because
    I'm not sure I need it any more and I have a better use for
    that pageframe right now.  Mr Tmem might you have someplace
    you can squirrel it away for me in case I need it again?
    Oh, and by the way, if you can't or you lose it, no big deal
    as I can go get it from disk if I need to."
2) "I'm out of memory and have to put this page somewhere.  Mr
    Tmem, can you take it?  But if you do take it, you have to
    promise to give it back when I ask for it!  If you can't
    promise, never mind, I'll find something else to do with it."
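
Roughly, in code (the function names, pool ids, and return
conventions below are illustrative only, not the patch's exact
interface):

	/* Service 1: a clean page the PFRA is about to evict.  The
	 * copy is ephemeral: the hypervisor may drop it whenever it
	 * likes, and a later get that misses just means "re-read it
	 * from disk". */
	tmem_put_page(EPHEMERAL_POOL, index, page);	/* best effort */
	free_page_frame(page);				/* safe either way */

	/* Service 2: a dirty page with nowhere else to go.  The put
	 * is persistent: it either succeeds (and a later get is
	 * guaranteed to return the data) or is refused up front, in
	 * which case the kernel falls back to its own swap device. */
	if (tmem_put_page(PERSISTENT_POOL, index, page) != 0)
		write_to_swap(page, index);	/* refused: use real swap */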

> > In other words, CMM2, despite its name, is more of a
> > "subservient" memory management system (Linux is
> > subservient to the hypervisor) and tmem is more
> > collaborative (Linux and the hypervisor share the
> > responsibilities and the benefits/costs).
> 
> What's appealing to me about CMM2 is that it doesn't change the guest 
> semantically but rather just gives the VMM more information about how 
> the VMM is using its memory.  This suggests that it allows greater 
> flexibility in the long term to the VMM and more importantly, 
> provides an easier implementation across a wide range of guests.

I suppose changing Linux to utilize the two tmem services
as described above is a semantic change.  But to me it
seems no more of a semantic change than requiring a new
special page fault handler because a page of memory might
disappear behind the OS's back.

But IMHO this is a corollary of the fundamental difference.  CMM2
takes more of the "VMware" approach, which is that OSes should
never have to be modified to run in a virtual environment.  (Oh,
but maybe modified just slightly to make the hypervisor a little
less clueless about the OS's resource utilization.)  Tmem asks: If
an OS is often going to run in a virtualized environment, what
can be done to share the responsibility for resource management
so that the OS does what it can with the knowledge that it has,
and the hypervisor can most flexibly manage resources across
all the guests?  I do agree that adding an additional API
binds the user and provider of the API less flexibly than having
no API at all, but as long as the API is optional (as it is for
both tmem and CMM2), I don't see why CMM2 provides more flexibility.

Thanks,
Dan
