Date:	Fri, 20 Mar 2009 19:26:06 +0100
From:	Jos Houtman <jos@...es.nl>
To:	<linux-kernel@...r.kernel.org>
Subject: Page Cache writeback too slow,   SSD/noop scheduler/ext2

Hi,

We have hit a problem where the page-cache writeback algorithm is not
keeping up.
When memory gets low, this results in very irregular performance drops.

Our setup is as follows:
* 30 quad-core machines with 64 GB RAM each.
* These are single-purpose machines running MySQL.
* Kernel version: 2.6.28.7
* A dedicated SSD for the ext2 database partition.
* Noop I/O scheduler for the SSD.


The current hypothesis is as follows:
The wb_kupdate() function does not write out enough dirty pages, which
allows the number of dirty pages to grow to the dirty_background_ratio limit.
When memory is low, background_writeout() comes around and 'forcefully'
writes dirty pages to disk.
This forced write fills the disk queue and starves the reads that MySQL is
trying to do, basically killing performance for a few seconds.
The pattern repeats as soon as the freed memory fills up again.
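
To get a feel for how large such a forced burst could be, here is a rough
worked example in Python. The ratio of 5 percent is an assumed value for
illustration, not a setting reported in this mail, and treating all 64 GiB
as dirtyable overestimates the real threshold, since the kernel derives it
from free and reclaimable pages rather than from total RAM.

```python
def background_threshold_bytes(dirtyable_bytes: int, ratio_percent: int) -> int:
    """Dirty-page volume at which background writeout would kick in."""
    return dirtyable_bytes * ratio_percent // 100


GIB = 2**30

# Assumptions (not from the report): all 64 GiB counted as dirtyable memory,
# which is an upper bound, and a hypothetical dirty_background_ratio of 5.
threshold = background_threshold_bytes(64 * GIB, 5)
print(f"{threshold / GIB:.1f} GiB of dirty pages before background writeout")
```

Under those assumptions the backlog is on the order of a few GiB, which
would be consistent with a burst that keeps the disk queue full for
several seconds.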

Decreasing dirty_writeback_centisecs to 100 doesn't help.
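
For anyone wanting to compare settings, a minimal sketch for dumping the
writeback tunables could look like the following. The helper name
read_vm_tunables and its base parameter are made up for this example; the
file names are the standard knobs under /proc/sys/vm.

```python
from pathlib import Path

# Writeback-related knobs found under /proc/sys/vm.
TUNABLES = (
    "dirty_ratio",
    "dirty_background_ratio",
    "dirty_writeback_centisecs",
    "dirty_expire_centisecs",
)


def read_vm_tunables(base="/proc/sys/vm"):
    """Return {name: int} for each tunable file that exists under base."""
    values = {}
    for name in TUNABLES:
        path = Path(base) / name
        if path.exists():
            values[name] = int(path.read_text().strip())
    return values


if __name__ == "__main__":
    for name, value in read_vm_tunables().items():
        print(f"{name} = {value}")
```

The base directory is a parameter only so the function can be pointed at a
copy of the files collected from another machine.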

I don't know why this is, but I did some preliminary tracing using SystemTap,
and it seems that the majority of the time a wb_kupdate() call decides to do
nothing.

Doubling /sys/block/sdb/queue/nr_requests to 256 seems to help a bit: the
nr_dirty page count increases more slowly.
But I am unsure of the side effects and am afraid of worsening the
starvation problem for MySQL.
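
To record the queue settings alongside each measurement, something like
the sketch below could be used. The function names are hypothetical; the
code only assumes the usual sysfs layout, where the scheduler file lists
the available schedulers and marks the active one with square brackets.

```python
import re
from pathlib import Path


def parse_scheduler(text):
    """Return the active scheduler, which sysfs marks with [brackets]."""
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else None


def read_queue_settings(device, sys_block="/sys/block"):
    """Read nr_requests and the active scheduler for one block device."""
    queue = Path(sys_block) / device / "queue"
    return {
        "nr_requests": int((queue / "nr_requests").read_text().strip()),
        "scheduler": parse_scheduler((queue / "scheduler").read_text()),
    }
```

On the machines described here, read_queue_settings("sdb") would be the
call for the SSD.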


I am very much willing to work on this issue and see it fixed, but would
like to tap into the knowledge of people here.
So:
* Have more people seen this or similar issues?
* Is the hypothesis above a viable one?
* Any suggestions or pointers for further research, and for statistics I
should measure to improve the understanding of this problem?
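
As a starting point for the statistics, here is a minimal sketch that
samples nr_dirty and nr_writeback from /proc/vmstat at a fixed interval.
The function names, the one-second interval and the sample count are
arbitrary choices for this example, not something taken from our setup.

```python
import time


def parse_vmstat(text, keys=("nr_dirty", "nr_writeback")):
    """Extract selected counters from /proc/vmstat-style 'name value' lines."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in keys:
            counters[parts[0]] = int(parts[1])
    return counters


def sample(path="/proc/vmstat", interval=1.0, count=10):
    """Yield (elapsed_seconds, counters) tuples, one per interval."""
    start = time.time()
    for _ in range(count):
        with open(path) as f:
            snapshot = parse_vmstat(f.read())
        yield time.time() - start, snapshot
        time.sleep(interval)
```

Iterating over sample() and logging each tuple gives a time series of
dirty and writeback page counts (in pages, not bytes) that can be lined up
against the MySQL stalls.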



With regards,

Jos

