Date:	Wed, 2 Nov 2011 14:14:16 -0700 (PDT)
From:	Dan Magenheimer <dan.magenheimer@...cle.com>
To:	Rik van Riel <riel@...hat.com>
Cc:	Andrea Arcangeli <aarcange@...hat.com>,
	Pekka Enberg <penberg@...nel.org>,
	Cyclonus J <cyclonusj@...il.com>,
	Sasha Levin <levinsasha928@...il.com>,
	Christoph Hellwig <hch@...radead.org>,
	David Rientjes <rientjes@...gle.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	linux-mm@...ck.org, LKML <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Konrad Wilk <konrad.wilk@...cle.com>,
	Jeremy Fitzhardinge <jeremy@...p.org>,
	Seth Jennings <sjenning@...ux.vnet.ibm.com>, ngupta@...are.org,
	Chris Mason <chris.mason@...cle.com>, JBeulich@...ell.com,
	Dave Hansen <dave@...ux.vnet.ibm.com>,
	Jonathan Corbet <corbet@....net>
Subject: RE: [GIT PULL] mm: frontswap (for 3.2 window)

> From: Rik van Riel [mailto:riel@...hat.com]
> Subject: Re: [GIT PULL] mm: frontswap (for 3.2 window)
> 
> On 10/31/2011 07:36 PM, Dan Magenheimer wrote:
> >> From: Andrea Arcangeli [mailto:aarcange@...hat.com]
> 
> >>> real work to do instead and (2) that vmexit/vmenter is horribly
> >>
> >> Sure, the CPU has another 1000 VMs to schedule. This is like saying
> >> virtio-blk isn't needed on desktop virt because the desktop isn't
> >> doing much I/O. Absurd argument: there are another 1000 desktops doing
> >> I/O at the same time, of course.
> >
> > But this is truly different, I think at least for the most common
> > cases, because the guest is essentially out of physical memory if it
> > is swapping.  And the vmexit/vmenter (I assume, I don't really
> > know KVM) gives the KVM scheduler the opportunity to schedule
> > another of those 1000 VMs if it wishes.
> 
> I believe the problem Andrea is trying to point out here is
> that the proposed API cannot handle a batch of pages to be
> pushed into frontswap/cleancache at one time.

That wasn't the part of Andrea's discussion I meant, but I
am getting foggy now, so let's address your point rather
than mine.
 
> Even if the current back-end implementations are synchronous
> and can only do one page at a time, I believe it would still
> be a good idea to have the API able to handle a vector with
> a bunch of pages all at once.
> 
> That way we can optimize the back-ends as required, at some
> later point in time.
> 
> If enough people start using tmem, such bottlenecks will show
> up at some point :)
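
For concreteness, the vectorized API Rik describes might look
something like the sketch below.  The per-page ops are the ones in
the posted patches (reproduced here from memory); put_pages and its
return-the-number-accepted convention are hypothetical, nothing that
has been implemented or benchmarked:

#include <linux/mm.h>	/* pgoff_t, struct page */

struct frontswap_ops {
	/* the per-page ops, roughly as posted */
	void (*init)(unsigned type);
	int (*put_page)(unsigned type, pgoff_t offset, struct page *page);
	int (*get_page)(unsigned type, pgoff_t offset, struct page *page);
	void (*flush_page)(unsigned type, pgoff_t offset);
	void (*flush_area)(unsigned type);
	/*
	 * Hypothetical vectorized put: the back end takes as many of
	 * the nr pages as it likes and returns the number accepted,
	 * so the caller can fall back to the swap device for the rest.
	 */
	int (*put_pages)(unsigned type, pgoff_t *offsets,
			 struct page **pages, unsigned nr);
};

Returning the number of pages accepted would let the core keep its
existing one-page fallback path for anything the back end declines.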

It occurs to me that batching could be done locally, without
changing the in-kernel "API" (i.e. frontswap_ops)... the
guest-side KVM tmem backend driver could do the compression
into guest-side memory and make a single hypercall (one
vmexit/vmenter) whenever it has collected enough pages for a
batch.  The "get" and "flush" paths would then have to search
this guest-side local cache first and, on a miss, make a
hypercall.
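
A very rough sketch of what I mean, where every helper name
(zcache_compress_page(), tmem_hypercall_put_batch(), and friends) is
a hypothetical stand-in, a single swap type is assumed, and all
locking is omitted:

#include <linux/mm.h>		/* pgoff_t, struct page */

#define PUT_BATCH 64		/* arbitrary batch size */

struct pending_put {
	pgoff_t offset;
	void *zdata;		/* compressed copy in guest memory */
	size_t zlen;
};

static struct pending_put batch[PUT_BATCH];
static unsigned nbatched;

/* hypothetical helpers a real driver would have to provide */
extern void *zcache_compress_page(struct page *page, size_t *zlen);
extern int zcache_decompress(void *zdata, size_t zlen, struct page *page);
extern int tmem_hypercall_put_batch(unsigned type,
				    struct pending_put *b, unsigned nr);
extern int tmem_hypercall_get_page(unsigned type, pgoff_t offset,
				   struct page *page);

static int kvm_frontswap_put_page(unsigned type, pgoff_t offset,
				  struct page *page)
{
	struct pending_put *p = &batch[nbatched];

	/* compress guest-side; no vmexit yet */
	p->zdata = zcache_compress_page(page, &p->zlen);
	if (!p->zdata)
		return -1;	/* caller falls back to the swap device */
	p->offset = offset;

	if (++nbatched == PUT_BATCH) {
		/* one hypercall (one vmexit/vmenter) for the whole batch */
		tmem_hypercall_put_batch(type, batch, nbatched);
		nbatched = 0;
	}
	return 0;
}

static int kvm_frontswap_get_page(unsigned type, pgoff_t offset,
				  struct page *page)
{
	unsigned i;

	/* the page may still be sitting in the not-yet-sent batch */
	for (i = 0; i < nbatched; i++)
		if (batch[i].offset == offset)
			return zcache_decompress(batch[i].zdata,
						 batch[i].zlen, page);

	/* otherwise it already went to the host */
	return tmem_hypercall_get_page(type, offset, page);
}

The "flush" side would similarly have to drop any matching entry from
the local batch before telling the host, and a partial batch would
need to be pushed out eventually (on a timer or on memory pressure)
rather than sitting compressed in guest memory forever.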

This is more or less what RAMster does, except it (currently)
still transmits the "batch" one (pre-compressed) page at a time.

And, thinking about it more deeply (with my admittedly fried
brain at the moment), this may even be the best way to do
batching anyway.  I can't think offhand of anywhere else to put
a "put batch" hook in the swap subsystem, because I believe the
current swap-subsystem batching code only works with adjacent
"entry" numbers.

And one more thing occurs to me: this shows that the KVM
"ABI" (the hypercall interface) is not constrained by the
existing Xen ABI.  It can be arbitrarily more functional.

/me gets hand slapped remotely from Oracle HQ ;-)