Date:	Thu, 25 Aug 2011 09:56:38 -0700 (PDT)
From:	Dan Magenheimer <dan.magenheimer@...cle.com>
To:	Nebojsa Trpkovic <trx.lists@...il.com>
Cc:	linux-kernel@...r.kernel.org, Konrad Wilk <konrad.wilk@...cle.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Seth Jennings <sjenning@...ux.vnet.ibm.com>,
	Nitin Gupta <ngupta@...are.org>
Subject: RE: cleancache can lead to serious performance degradation

> From: Konrad Rzeszutek Wilk
> I've put Dan on the CC since he is the author of it.

Thanks for forwarding, Konrad (and thanks to akpm for the off-list forward).

Hi Nebojsa --

> > I've tried using cleancache on my file server and came to the conclusion
> > that my Core2 Duo 4MB L2 cache 2.33GHz CPU cannot cope with the
> > amount of data it needs to compress during heavy sequential IO when
> > cleancache/zcache are enabled.
> >
> > For example, with cleancache enabled I get 60-70MB/s from my RAID
> > arrays and both CPU cores are saturated with system (kernel) time.
> > Without cleancache, each RAID gives me more than 300MB/s of useful
> > read throughput.

Hmmm... yes, an older processor with multiple very fast storage devices
could certainly be overstressed by all the compression.
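
To put a rough number on that (ballpark figures on my part, not
measurements): 300 MB/s of sequential reads works out to roughly

  300 MB/s / 4 KiB per page        ~= 75,000 pages/sec
  75,000 pages/sec * ~15 us/page   ~= 1.1 CPU-seconds per second

if lzo1x costs somewhere around 15 us per 4 KiB page on that class
of CPU (a guess on my part).  So compression alone could plausibly
eat most of a core per array, before counting copies and bookkeeping.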

Are your measurements on a real workload or a benchmark?  Can
you describe your configuration in more detail (e.g. how many
spindles, or are they SSDs)?  Is any swapping occurring?

In any case, please see below.

> > In the scenario of sequential reading, this drop of throughput seems
> > completely normal:
> > - a lot of data gets pulled in from disks
> > - data is processed in some non CPU-intensive way
> > - page cache fills up quickly and cleancache starts compressing
> > pages (a lot of "puts" in /sys/kernel/mm/cleancache/)
> > - these compressed cleancache pages never get read because there are
> > a whole lot of new pages coming in every second replacing old ones
> > (practically no "succ_gets" in /sys/kernel/mm/cleancache/)
> > - CPU saturates doing useless compression, and even worse:
> > - new disk read operations are waiting for CPU to finish compression
> > and make some space in memory
> >
> > So, using cleancache in scenarios with a lot of non-random data
> > throughput can lead to very bad performance degradation.

Your analysis is correct IF compression of a page is very slow
and "a lot of data" is *really* a lot.  But...

First, I don't recommend that zcache+cleancache be used without
frontswap, because the additional memory pressure from saving
compressed clean pagecache pages may sometimes result in swapping
(and frontswap will absorb much of the performance loss).  I know
frontswap isn't officially merged yet, but that's an artifact of
the process for submitting a complex patchset (transcendent memory)
that crosses multiple subsystems.  (See https://lwn.net/Articles/454795/ 
if you're not familiar with the whole transcendent memory picture.)
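
For anyone who hasn't looked at the patchset, the core idea is easy
to sketch.  Very roughly (made-up names below, NOT the actual
frontswap hooks), a page headed for the swap device is first offered
to the compressed in-RAM store and only hits the disk if the store
declines it:

  /*
   * Conceptual sketch only; not the real frontswap code or API, and
   * the function names are invented.  The point is just where the
   * hook sits in the swap-out path.
   */
  static int swap_out_page(struct page *page, swp_entry_t entry)
  {
          /* Offer the page to the in-memory compressed store first. */
          if (frontswap_backend_store(entry, page) == 0)
                  return 0;       /* kept in RAM, no disk I/O */

          /* Store full or page incompressible: do the real swap I/O. */
          return write_page_to_swap_device(page, entry);
  }

That's why frontswap absorbs much of the loss when compressed
pagecache pushes the system into swapping: most of the "swap"
traffic never reaches the disk.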

Second, I believe that for any caching mechanism -- hardware or
software -- in the history of computing, it is possible to identify
workloads or benchmarks that render the cache counter-productive.
Zcache+cleancache is not immune to that, and benchmarks that are
intended to measure sequential disk read throughput are likely to
imply that zcache is a problem.  However, Phoronix's benchmarking
(see  http://openbenchmarking.org/result/1105276-GR-CLEANCACH02)
didn't show anything like the performance degradation you observed,
even though it was also run on a Core 2 Duo (though apparently with
only a single spindle).

Third, zcache is relatively new and can certainly benefit from
the input of other developers.  The lzo1x compression in the kernel
is fairly slow; Seth Jennings (cc'ed) is looking into alternate
compression technologies.  Perhaps there is a compression choice
better suited to older, slower processors, probably with a
poorer compression ratio.  Further, zcache currently does compression
and decompression with interrupts disabled, which may be a
significant factor in the slowdowns you've observed.  This should
be fixable.  Also, the policies that determine acceptance into and
reclaim from cleancache are still fairly primitive, but the dynamic
nature of cleancache should allow some interesting heuristics to be
explored (as you suggest below).
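
As one (untested, hypothetical) example of the kind of heuristic I
mean: since cleancache is free to reject any individual put, the
backend could simply stop accepting pages whenever recent puts are
not paying off, e.g.:

  /*
   * Hypothetical acceptance policy, not code that exists in zcache
   * today: refuse puts while recently-put pages are not being read
   * back often enough to justify the compression cost.  The
   * "recent_*" counters are assumed to be decayed periodically.
   */
  static bool zcache_should_accept_put(void)
  {
          unsigned long puts = recent_puts;
          unsigned long hits = recent_succ_gets;

          /* Require at least ~1 hit per 64 puts before doing more work. */
          if (puts > 4096 && hits < (puts >> 6))
                  return false;
          return true;
  }

The nice property is that a rejected put just means the page is
dropped exactly as it would be without cleancache, so the worst case
of a wrong guess is today's behavior.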

Fourth, cleancache usually shows a fairly low "hit rate" yet still
delivers a solid performance improvement.  I've looked at improving
the hit rate by moving into cleancache only those pages that had
previously been on the active list, which would likely solve your
problem, but the patch wasn't really upstreamable and may make
cleancache function
less well on other workloads and in other environments (such as
in Xen where the size of transcendent memory may greatly exceed
the size of the guest's pagecache).
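
For what it's worth, the shape of that experiment was roughly a
filter in the put path; the sketch below is not the actual patch,
and the "previously active" test is a placeholder:

  /*
   * Sketch of the "active pages only" experiment, not the patch
   * that was actually tested.  page_was_previously_active() is a
   * placeholder; a real implementation would need some record such
   * as a page flag set when the page leaves the active LRU.
   */
  static void cleancache_put_if_was_active(struct page *page)
  {
          /*
           * Pages that never made it to the active list are likely
           * streaming reads that will never be read again, so skip
           * the compression work for them.
           */
          if (!page_was_previously_active(page))
                  return;

          __cleancache_put_page(page);    /* the existing put path */
  }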

> > I guess that a possible workaround could be to implement some kind of
> > compression-throttling valve for cleancache/zcache:
> >
> > - if there's available CPU time (idle cycles or so), then compress
> > (maybe even with low CPU scheduler priority);

Agreed, and this low-priority kernel thread ideally would also
solve the "compress while irqs disabled" problem!
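
Something along these lines is what I have in mind, though it is
only a sketch (the queue, the naming, the locking, and keeping the
source page alive until it is compressed are all hand-waved):

  /*
   * Hypothetical low-priority compression thread.  A put would only
   * queue the page; the lzo1x work happens here, with interrupts
   * enabled and at nice 19 so it mostly consumes otherwise-idle
   * cycles.
   */
  static int zcache_compress_thread(void *unused)
  {
          set_user_nice(current, 19);

          while (!kthread_should_stop()) {
                  wait_event_interruptible(zcache_compress_waitq,
                                           !compress_queue_empty() ||
                                           kthread_should_stop());
                  while (!compress_queue_empty())
                          compress_and_store(compress_queue_pop());
          }
          return 0;
  }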

> > - if there's no available CPU time, just store (or throw away) to
> > avoid IO waits;

Any ideas on how to code this test (for "no available CPU time")?
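
The crudest test I can think of, assuming the nice-19 thread
sketched above and a hypothetical ZCACHE_QUEUE_MAX of a few hundred
pages, would be to use the queue itself as the signal: if a nice-19
thread can't keep up, there is by definition no spare CPU.  Better
ideas welcome:

  /*
   * Strawman only: treat a backed-up compression queue as "no spare
   * CPU" and drop the page rather than queue it, falling back to
   * today's no-cleancache behavior for that page.
   */
  static bool zcache_put_should_drop(void)
  {
          return compress_queue_len() > ZCACHE_QUEUE_MAX;
  }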

> > At least, there should be a warning in kernel help about this kind of
> > situation.

That's a good point.  I'll submit a patch updating the Kconfig
and the Documentation/vm/cleancache.txt FAQ to identify the concern.
Documentation for zcache isn't in place yet because frontswap isn't
yet merged and I don't yet want to encourage zcache use without it,
but I'll make sure a warning gets added somewhere in zcache as well.

If you have any more comments or questions, please cc me directly
as I'm not able to keep up with all the lkml traffic, especially
when traveling... when you posted this I was at LinuxCon 2011
talking about transcendent memory!  See:
http://events.linuxfoundation.org/events/linuxcon/magenheimer 

Dan
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
