linux-kernel - RE: [GIT PULL] mm: frontswap (for 3.2 window)

Message-ID: <3ac142d4-a4ca-4a24-bf0b-69a90bd1d1a0@default>
Date:	Sun, 30 Oct 2011 12:18:56 -0700 (PDT)
From:	Dan Magenheimer <dan.magenheimer@...cle.com>
To:	John Stoffel <john@...ffel.org>
Cc:	Johannes Weiner <jweiner@...hat.com>,
	Pekka Enberg <penberg@...nel.org>,
	Cyclonus J <cyclonusj@...il.com>,
	Sasha Levin <levinsasha928@...il.com>,
	Christoph Hellwig <hch@...radead.org>,
	David Rientjes <rientjes@...gle.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	linux-mm@...ck.org, LKML <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Konrad Wilk <konrad.wilk@...cle.com>,
	Jeremy Fitzhardinge <jeremy@...p.org>,
	Seth Jennings <sjenning@...ux.vnet.ibm.com>, ngupta@...are.org,
	Chris Mason <chris.mason@...cle.com>, JBeulich@...ell.com,
	Dave Hansen <dave@...ux.vnet.ibm.com>,
	Jonathan Corbet <corbet@....net>
Subject: RE: [GIT PULL] mm: frontswap (for 3.2 window)

> From: John Stoffel [mailto:john@...ffel.org]
> Dan> Thanks for taking the time to read the LWN article and sending
> Dan> some feedback.  I admit that, after being immersed in the topic
> Dan> for three years, it's difficult to see it from the perspective of
> Dan> a new reader, so I apologize if I may have left out important
> Dan> stuff.  I hope you'll take the time to read this long reply.
> 
> Will do.  But I'm not the person you need to convince here about the
> usefulness of this code and approach, it's the core VM developers,

True, but you are the one providing useful suggestions while
the core VM developers are mostly silent (except for saying things
like "don't like it much").  So thank you for your feedback
and for taking the time to provide it and for indulging my replies.

I/we will need to act on your suggestions, but I need to
answer a couple of points/questions you've raised.

> since they're the ones who will have to understand this stuff and know
> how to maintain it.  And keeping this maintainable is a key goal.

Absolutely agree.  Count the number of frontswap lines that affect
the current VM core code and note also how they are very clearly
identified.  It really is a very VERY small impact to the core VM
code (e.g. in the files swapfile.c and page_io.c).

(And it's worth noting, and I'm not arguing that it is conclusive,
just relevant, that my company has stood up and claimed responsibility
to maintain it.)

> Ok, so why not just a targeted swap compression function instead?
> Why is your method superior?

The designer/implementor of zram (which is the closest thing to
"targeted swap compression" in the kernel today) has stated
elsewhere on this thread that frontswap has advantages
over his own zram code.

And the frontswap patchset (did I mention how small the impact is?)
provides a lot more than just a foundation for compression (zcache).

> But that's beside the point.  How much overhead does TMEM incur when
> it's not being used, but when it's available?

This is answered in frontswap.txt in the patchset, but:

ZERO overhead if CONFIG_FRONTSWAP=n.  All the hooks compile into no-ops.

If CONFIG_FRONTSWAP=y and no "tmem backend" registers to use it at
runtime, the overhead is one "compare pointer against NULL" for
every page actually swapped in or out, which is about as close to ZERO
overhead as any code can be.

If CONFIG_FRONTSWAP=y AND a "tmem backend" does register, the
answer depends on which tmem backend and what it is doing (and
yes I agree more numbers are needed), but the overhead is
incurred only in the case where a page would otherwise have
actually been swapped in or out, and it can replace the horrible
cost of swapping pages.
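
To make the three cases above concrete, here is a minimal userspace
sketch of the hook pattern.  It is NOT the code in the patchset: every
name in it (SKETCH_FRONTSWAP, sketch_put_page, sketch_register, and so
on) is hypothetical and chosen only for illustration, and the real
hooks in swapfile.c and page_io.c may well be shaped differently.  It
only shows why a compiled-out hook costs nothing and why a compiled-in
hook with no registered backend costs one pointer comparison per page.

```c
#include <stddef.h>

/* All names here are hypothetical; this is not the kernel's frontswap API. */
#define SKETCH_FRONTSWAP 1      /* stands in for CONFIG_FRONTSWAP=y */

struct sketch_page { int id; };

/* Operations a hypothetical "tmem backend" would supply at runtime. */
struct sketch_backend_ops {
    int (*put_page)(struct sketch_page *page);   /* 0 = backend kept the page */
};

#if SKETCH_FRONTSWAP
static struct sketch_backend_ops *sketch_ops = NULL;  /* no backend yet */

static void sketch_register(struct sketch_backend_ops *ops)
{
    sketch_ops = ops;
}

static int sketch_put_page(struct sketch_page *page)
{
    if (sketch_ops == NULL)     /* the single "compare pointer against NULL" */
        return -1;              /* nobody registered: fall through to swap */
    return sketch_ops->put_page(page);
}
#else
/* Compiled out: the hook is a constant, so the compiler can drop it. */
static void sketch_register(struct sketch_backend_ops *ops) { (void)ops; }
static int sketch_put_page(struct sketch_page *page) { (void)page; return -1; }
#endif

static int sketch_disk_writes;  /* counts pages that went to the swap device */

/* Called only for a page that is actually being swapped out. */
static int sketch_swap_out(struct sketch_page *page)
{
    if (sketch_put_page(page) == 0)
        return 0;               /* backend took it: no disk I/O */
    sketch_disk_writes++;       /* normal path: write to the swap device */
    return 1;
}

/* A trivial backend that accepts every page, for demonstration only. */
static int sketch_accept_all(struct sketch_page *page)
{
    (void)page;
    return 0;
}
```

With no backend registered, sketch_swap_out() takes the normal disk
path after one NULL check; after sketch_register() is called with an
ops struct pointing at sketch_accept_all, the same call returns 0 and
the disk-write counter does not move.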

> Dan> Frontswap is the last missing piece.  Why so much resistance?
> 
> Because you haven't sold it well with numbers to show how much
> overhead it has?
>
> I'm being negative because I see no reason to use it.  And because I
> think you can do a better job of selling it and showing the benefits
> with real numbers.

In your environment where RAM is essentially infinite, and swapping
never occurs, I agree there would be no reason for you to enable it.
In which case there is no overhead to you.

Received loud and clear on the "need more real numbers" though
personally I don't have any machines with more than 4GB RAM so
I won't personally be testing any EDA environments with 144GB :-}

So, in the context of "costs nothing if you don't need it and has
very VERY small core code impact", and given that various kernel
developers and real users and real distros and real products say
on this thread that they DO need it, and given that there
are "some" real numbers (for one user, Xen, and I agree that some
are needed for zcache)... and assuming that the core VM developers
bother to read the documentation already provided that addresses
the above, let me ask again...

Why so much resistance?

Thanks,
Dan

Oops, one more (but I have to use the X-word)...

> Load up a XEN box, have a VM spike its memory usage and show how TMEM
> helps.  Compare it to a non-TMEM setup with the same load.

Yep, that's what the presentation URL I provided (for Xen) measures.
Overcommitment (more VMs than otherwise could fit in the physical
RAM) AND about an 8% performance improvement on all VMs doing
a kernel compile simultaneously.  Pretty impressive.