Message-ID: <4F4C4400.6070302@gmail.com>
Date: Mon, 27 Feb 2012 22:03:28 -0500
From: John Moser <john.r.moser@...il.com>
To: linux-kernel@...r.kernel.org
Subject: What do you do when swap is faster than disk?
I got into a weird situation today.
I limited my system RAM, set up a zram swap device, and then applied
memory pressure. The result was about 50MB of disk cache and an
-extremely- slow system with lots and lots and LOTS of hard disk
activity.
I found that with around 130MB of disk cache the system was fine, and
with around 200MB it was extremely fast. I also found that it's
extremely difficult to make the OS keep 200MB of disk cache around on
2GB of RAM when you have 600MB swapped out.
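For anyone who wants to watch the same two numbers, here is a minimal Python sketch that pulls the disk cache size and the swapped-out amount from /proc/meminfo-style text. The field names (Cached, SwapTotal, SwapFree) are the standard ones; the function name and the sample text are made up for illustration, with the sample chosen to match the 50MB / 600MB case above.

```python
def cache_and_swap_mb(meminfo_text):
    """Return (disk cache MB, swapped-out MB) from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not a "Name: value" line
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])  # sizes are reported in kB
    cached_mb = fields["Cached"] / 1024
    swapped_mb = (fields["SwapTotal"] - fields["SwapFree"]) / 1024
    return cached_mb, swapped_mb


# Illustrative sample only, not output captured from the test machine.
SAMPLE = """MemTotal: 2097152 kB
Cached: 51200 kB
SwapTotal: 1048576 kB
SwapFree: 434176 kB
"""

print(cache_and_swap_mb(SAMPLE))
```

On a live system the same function could be fed the contents of /proc/meminfo, e.g. cache_and_swap_mb(open("/proc/meminfo").read()).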
That raises some questions about whether the tunables for this are
adequate. I can't very well set vm.swappiness to 150, and even at 100
it's not really helpful. I mean when you have more than a quarter of
your RAM in swap, the tunable seems to have almost no impact.
It's also come to my mind that the kernel could, possibly, attempt some
speed testing against the devices and determine how fast they are. This
could be as simple as only worrying about it under pressure: split up
swap among all swap devices and work out throughput and latency for each.
Then prioritize them such that, under pressure, very old data in a very
fast swap device gets swapped out to a slower swap device.
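A rough Python sketch of the ranking step, just to make the idea concrete. Everything here is hypothetical: the function name, the cost model (latency plus transfer time for one 4 KiB page), and the example figures are assumptions, not anything the kernel does today. The only borrowed convention is that a higher swap priority means the device gets used first.

```python
def rank_swap_devices(measurements):
    """Order swap devices fastest first and hand out priorities.

    measurements maps a device name to (throughput_mb_s, latency_ms).
    Assumed cost model: time to move one 4 KiB page is the access
    latency plus the transfer time at the measured throughput.
    """
    page_mb = 4 / 1024

    def page_cost_ms(item):
        _, (throughput_mb_s, latency_ms) = item
        return latency_ms + page_mb / throughput_mb_s * 1000

    ordered = sorted(measurements.items(), key=page_cost_ms)
    top = len(ordered)
    # Fastest device gets the highest priority number.
    return [(name, top - i) for i, (name, _) in enumerate(ordered)]


# Made-up figures for a zram device and a hard disk partition.
example = {
    "/dev/sda2": (80.0, 8.0),
    "/dev/zram0": (1000.0, 0.01),
}
print(rank_swap_devices(example))
```

With a ranking like this in hand, old pages sitting in the top-ranked device would be the candidates to migrate down to the next one when the fast device fills up.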
Disk cache could be ranked against the whole thing too, to decide just
how important disk cache is and how much time is being spent
mucking about with re-loading flushed cache versus swapping. You could
keep some information aside about what was in RAM before, and push it
out as it gets old. Say a particular area of cache 40MB wide was flushed,
and a 16 byte structure somewhere in RAM makes note of that.
Once more time has been spent swapping than it would take to read that
40MB back in, drop the note. If the area is read back in before then,
make note that too much disk cache flushing is happening and not enough
swapping is going on.
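Here is a small Python sketch of that bookkeeping, under stated assumptions: the class and method names are invented, the reload cost is estimated as size divided by a single assumed disk throughput, and swap time is simply charged to every outstanding note. It is only meant to show the drop-or-count logic, not how a kernel would actually store the notes.

```python
class FlushedCacheLog:
    """Remember flushed cache areas until swapping has cost more than a reload."""

    def __init__(self, disk_mb_per_s):
        self.disk_mb_per_s = disk_mb_per_s  # assumed read speed of the disk
        self.notes = {}  # area -> [reload_cost_s, swap_time_since_flush_s]
        self.premature_flushes = 0

    def cache_flushed(self, area, size_mb):
        # Note the flush along with how long a re-read would take.
        self.notes[area] = [size_mb / self.disk_mb_per_s, 0.0]

    def swap_io(self, seconds):
        # Charge swap time to every note; drop notes that have aged out.
        for area in list(self.notes):
            note = self.notes[area]
            note[1] += seconds
            if note[1] > note[0]:
                del self.notes[area]

    def cache_read(self, area):
        # A read of a still-noted area means it was flushed too eagerly.
        if self.notes.pop(area, None) is not None:
            self.premature_flushes += 1
            return True
        return False


log = FlushedCacheLog(disk_mb_per_s=40.0)
log.cache_flushed("a", 40)   # 40MB at 40MB/s: one second to reload
log.swap_io(0.5)
first = log.cache_read("a")  # read back before the note aged out
log.cache_flushed("b", 40)
log.swap_io(1.5)             # more swap time than a reload would cost
second = log.cache_read("b")
print(first, second, log.premature_flushes)
```

A rising premature_flushes count would be the signal to lean toward swapping and away from dropping cache.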
This is more complex than it sounds, though, because you also have to
consider reading things into cache. Eventually you have to invalidate
disk cache to make room for more cache, after all. That or swap out
even more. So yes, I understand this is hard.
Just thought I'd mention that the problem seems to be more complex than
it's credited for at the moment. (In my test case, simply locking 200MB
for disk cache would have been fine... much better than what actually
happened!)