Date:	Wed, 25 Mar 2009 01:48:53 +1100
From:	Nick Piggin <nickpiggin@...oo.com.au>
To:	Jos Houtman <jos@...es.nl>
Cc:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: Page Cache writeback too slow, SSD/noop scheduler/ext2

On Monday 23 March 2009 03:53:29 Jos Houtman wrote:
> On 3/21/09 11:53 AM, "Andrew Morton" <akpm@...ux-foundation.org> wrote:
> > On Fri, 20 Mar 2009 19:26:06 +0100 Jos Houtman <jos@...es.nl> wrote:
> >> Hi,
> >>
> >> We have hit a problem where the page-cache writeback algorithm is not
> >> keeping up.
> >> When memory gets low, this results in very irregular performance
> >> drops.
> >>
> >> Our setup is as follows:
> >> 30 x quad-core machines with 64GB RAM.
> >> These are single-purpose machines running MySQL.
> >> Kernel version: 2.6.28.7
> >> A dedicated SSD for the ext2 database partition.
> >> Noop scheduler for the SSD.
> >>
> >>
> >> The current hypothesis is as follows:
> >> The wk_update function does not write enough dirty pages, which allows
> >> the number of dirty pages to grow to the dirty_background limit.
> >> When memory is low, __background_writeout() comes around and
> >> forcefully writes dirty pages to disk.
> >> This forced write fills the disk queue and starves the reads that MySQL
> >> is trying to do, basically killing performance for a few seconds. This
> >> pattern repeats as soon as the freed memory fills up again.
> >>
> >> Decreasing dirty_writeback_centisecs to 100 doesn't help.
> >>
> >> I don't know why this is, but I did some preliminary tracing using
> >> systemtap, and it seems that most of the time wk_update is called,
> >> it decides to do nothing.
> >>
> >> Doubling /sys/block/sdb/queue/nr_requests to 256 seems to help a bit:
> >> the nr_dirty count increases more slowly.
> >> But I am unsure of the side effects and am afraid of making the
> >> starvation problem for MySQL worse.
> >>
> >>
> >> I am very willing to work on this issue and see it fixed, but
> >> would like to tap into the knowledge of people here.
> >> So:
> >> * Have other people seen this or similar issues?
> >> * Is the hypothesis above a viable one?
> >> * Suggestions/pointers for further research and statistics I should
> >> measure to improve the understanding of this problem.
> >
> > I don't think that noop-iosched tries to do anything to prevent
> > writes-starve-reads.  Do you get better behaviour from any of the other
> > IO schedulers?
>
> I did a quick stress test, and cfq does not immediately seem to hurt
> performance, although some of my colleagues have tested this in the past
> with the opposite results (which is why we use noop).
>
> But regardless of the scheduler, the real problem is the writeback algorithm
> not keeping up.
> We can accumulate 600K dirty pages during the day, and only ~300K are
> flushed to disk during the night hours.
>
> A quick look at the writeback algorithm led me to expect wk_update()
> to flush ~1024 pages every 5 seconds, which is almost 3GB per hour.
> It obviously does not manage this in our setup.
>
> I don't believe the speed of the SSD is the problem: running sync
> manually takes only a few minutes to flush 800K dirty pages to disk.

kupdate surely should just keep trying to write back pages as long
as there are more old pages to clean and the queue isn't congested.
That seems to be the intention anyway: MAX_WRITEBACK_PAGES is just the
number to write back in a single call, whereas nr_to_write is set to
the number of dirty pages in the system.

On your system, more_io must not be getting set.
The logic in fs/fs-writeback.c might be busted.
