Message-ID: <Pine.GSO.4.64.0805211115390.8451@westnet.com>
Date: Wed, 21 May 2008 13:34:56 -0400 (EDT)
From: Greg Smith <gsmith@...gsmith.com>
To: lkml <linux-kernel@...r.kernel.org>
Subject: PostgreSQL pgbench performance regression in 2.6.23+

PostgreSQL ships with a simple database benchmarking tool named pgbench,
in what's labeled the contrib section (in many distributions it's a
separate package from the main server/client ones). I see there's been
some work done already improving how the PostgreSQL server works under the
new scheduler (the "Poor PostgreSQL scaling on Linux 2.6.25-rc5" thread).
I wanted to provide you a different test case using pgbench that has taken
a sharp dive starting with 2.6.23, and the server improvement changes in
2.6.25 actually made this problem worse.

I think it will be easy for someone else to replicate my results and I'll
go over the exact procedure below. To start with a view of how bad the
regression is, here's a summary of the results on one system, an AMD X2
4600+ running at 2.4GHz, with a few interesting kernels. I threw in
results from Solaris 10 on this system as a nice independent reference
point. The numbers here are transactions/second (TPS) running a simple
read-only test over a 160MB data set, I took the median from 3 test runs:

Clients   2.6.9  2.6.22  2.6.24  2.6.25  Solaris
      1   11173   11052   10526   10700     9656
      2   18035   16352   14447   10370    14518
      3   19365   15414   17784    9403    14062
      4   18975   14290   16832    8882    14568
      5   18652   14211   16356    8527    15062
      6   17830   13291   16763    9473    15314
      8   15837   12374   15343    9093    15164
     10   14829   11218   10732    9057    14967
     15   14053   11116    7460    7113    13944
     20   13713   11412    7171    7017    13357
     30   13454   11191    7049    6896    12987
     40   13103   11062    7001    6820    12871
     50   12311   11255    6915    6797    12858

That's the CentOS 4 2.6.9 kernel there, while the rest are stock ones I
compiled with a minimum of fiddling from the defaults (just adding support
for my SATA RAID card). You can see a major drop with the recent kernels
at high client loads, and the changes in 2.6.25 seem to have really hurt
even the low client count ones.

The other recent hardware I have here, an Intel Q6600 based system, gives
even more maddening results. On successive benchmark runs, you can watch
it break down, but only intermittently, once you get just above 8 clients.
At 10 and 15 clients, when I run it a few times, I'll sometimes get
results in the good 25-30K TPS range, while other runs give the 10K slow
case. It's not a smooth drop-off like in the AMD case; the results from
10-15 clients are really unstable. I've attached some files with 5 quick
runs at each client load
so you can see what I'm talking about. On that system I was also able to
test 2.6.26-rc2 which doesn't look all that different from 2.6.25.

All these results are running everything on the server using the default
local sockets-based interface, which is relevant in the real world because
that's how a web app hosted on the same system will talk to the database.
If I switch to connecting to the database over TCP/IP and run the pgbench
client on another system, the extra latency drops the single client case
to ~3100TPS. But the high client load cases are great--about 26K TPS at
50 clients. That result is attached as q6600-remote-2.6.25.txt, the
remote client was running 2.6.20. Since recent PostgreSQL results were
also fine with sysbench as the benchmark driver, this suggests the problem
here is actually related to the pgbench client itself and how it gets
scheduled relative to the server backends, rather than being inherent to
the server.

Replicating the test results
----------------------------

On to replicating my results, which I hope works because I don't have too
much time to test potentially fixed kernels myself (I was supposed to be
working on the PostgreSQL code until this sidetracked me). I'll assume
you can get the basic database going, if anybody needs help with that let
me know. There is one server tunable that needs to be adjusted before you
can get useful PostgreSQL benchmarks from this test (and many others).
In the root of the database directory, there will be a file named
postgresql.conf. Edit that and change the setting for the shared_buffers
parameter to 256MB to mimic my test setup. You may need to bump up shmmax
(this is the one list where I'm happy I don't have to explain what that
means!). Restart the server and check the logs to make sure it came back
up; if shmmax is too low, the server will just tell you how big it needs
to be and refuse to start.

Now the basic procedure to run this test is:
-dropdb pgbench (if it's already there)
-createdb pgbench
-pgbench -i -s 10 pgbench (makes about a 160MB database)
-pgbench -S -c <clients> -t 10000 pgbench

The idea is that you'll have a large enough data set to not fit in L2
cache, but small enough that it all fits in PostgreSQL's dedicated memory
(shared_buffers) so that it never has to ask the kernel to read a block.
The "pgbench -i" initialization step will populate the server's memory and
while that's all written to disk, it should stay in memory afterwards as
well. That's why I use this as a general CPU/L2/memory test as viewed
from a PostgreSQL context, and as you can see from my results with this
problem it's pretty sensitive to whether your setup is optimal or not.

To make this easier to run against a range of client loads, I've attached
a script (selecttest.sh) that does the last two steps in the above.
That's what I used to generate all the results I've attached. If you've
got the database set up such that you can run the psql client and pgbench
is in your path, you should just be able to run that script and have it
give you a set of results in a couple of minutes. You can adjust which
client loads and how many times it runs each by editing the script.

Addendum: how pgbench works
----------------------------

pgbench works off "command scripts", which are a series of SQL commands
with some extra benchmarking features implemented as a really simple
programming language. For example, the SELECT-only test run above, what
you get when passing -S to pgbench, is implemented like this:

\set naccounts 100000 * :scale
\setrandom aid 1 :naccounts
SELECT abalance FROM accounts WHERE aid = :aid;

Here :scale is detected automatically by doing a count of a table in the
database.

The pgbench client runs as a single process. When pgbench starts, it
iterates over each client, parsing the script until it hits a line that
needs to be sent to the server. At that point, it issues that command as
an asynchronous request, then returns to the main loop. Once every client
is primed with a command, it enters a loop where it just waits for
responses from them.

The main loop has all the open client connections in an fd_set. Each time
a select() on that set says the server has responded to at least one of
the clients, it sweeps through all the clients and feeds the next script
line to any that are ready for one. This proceeds until the
target transaction count is reached.
This design is recognized as being only useful for smallish client loads.
The results start dropping off very hard even on a fast machine with >100
simulated clients, as the single pgbench process struggles to service
every connection that is ready on each sweep through the clients. This
makes pgbench particularly unsuitable for testing on
systems with a large number of CPUs. I find pgbench just can't keep up
with the useful number of clients possible somewhere between 8 and 16
cores. I'm hoping the PostgreSQL community can rewrite it in a more
efficient way before the next release comes out now that such hardware is
starting to show up more running this database. If that's the only way to
resolve the issue outlined in this message, that's not intolerable, but a
kernel fix would obviously be better.

I wanted to submit this here regardless because I'd really like for
current versions to not have a big regression just because they were using
a newer kernel, and it provides an interesting scheduler test case to add
to the mix. The fact that earlier Linux kernels and alternate ones like
Solaris give pretty consistent results here says this programming approach
isn't impossible for a kernel to support well; I just don't think this
specific type of load has been considered in the test cases for the new
scheduler yet.

--
* Greg Smith gsmith@...gsmith.com http://www.gregsmith.com Baltimore, MD
View attachment "selecttest.sh" of type "TEXT/PLAIN" (503 bytes)
View attachment "q6600-remote-2.6.25.txt" of type "TEXT/PLAIN" (711 bytes)
View attachment "q6600-results-2.6.25.txt" of type "TEXT/PLAIN" (1064 bytes)
View attachment "q6600-results-2.6.24.txt" of type "TEXT/PLAIN" (1069 bytes)
View attachment "q6600-results-2.6.22.txt" of type "TEXT/PLAIN" (1085 bytes)
View attachment "q6600-results-2.6.26-rc2.txt" of type "TEXT/PLAIN" (1084 bytes)