Message-ID: <195127.4908.qm@web32605.mail.mud.yahoo.com>
Date: Wed, 23 Jan 2008 06:46:49 -0800 (PST)
From: Martin Knoblauch <spamtrap@...bisoft.de>
To: linux-kernel@...r.kernel.org
Cc: mike.miller@...com, iss_storagedev@...com
Subject: Performance problems when writing large files on CCISS hardware
Please CC me on replies, as I am not subscribed.
Hi,
for a while now I have been having problems writing large files sequentially to EXT2 filesystems on CCISS-based boxes. In non-DIO mode, writing multiple files in parallel is extremely slow compared to writing a single file; when using DIO, the scaling is almost "perfect". The problem manifests itself in RHEL4 kernels (2.6.9-X) and in every mainline kernel up to 2.6.24-rc8.
The systems in question are HP DL380 G4 boxes with 2 CPUs, 8 GB of memory, a SmartArray 6i (CCISS) controller with BBWC, and 4x72GB@...rpm disks in a RAID5 configuration. The environment is 64-bit RHEL4.3.
The problem can be reproduced by running 1, 2 or 3 parallel "dd" processes, or "iozone" with 1, 2 or 3 threads; the commands are sketched further down, just before the numbers. Curiously, there was a period from 2.6.24-rc1 until 2.6.24-rc5 where the problem went away. It turned out that this was due to a "regression" that was "fixed" by the commit below. Unfortunately that fix is not good for my systems, but it might shed some light on the underlying problem:
> commit 81eabcbe0b991ddef5216f30ae91c4b226d54b6d
> Author: Mel Gorman <mel@....ul.ie>
> Date:   Mon Dec 17 16:20:05 2007 -0800
>
>     mm: fix page allocation for larger I/O segments
>
>     In some cases the IO subsystem is able to merge requests if the pages
>     are adjacent in physical memory. This was achieved in the allocator by
>     having expand() return pages in physically contiguous order in
>     situations were a large buddy was split. However, list-based
>     anti-fragmentation changed the order pages were returned in to avoid
>     searching in buffered_rmqueue() for a page of the appropriate migrate
>     type.
>
>     This patch restores behaviour of rmqueue_bulk() preserving the
>     physical order of pages returned by the allocator without incurring
>     increased search costs for anti-fragmentation.
>
>     Signed-off-by: Mel Gorman <mel@....ul.ie>
>     Cc: James Bottomley <James.Bottomley@...eleye.com>
>     Cc: Jens Axboe <jens.axboe@...cle.com>
>     Cc: Mark Lord <mlord@...ox.com>
>     Signed-off-by: Andrew Morton <akpm@...ux-foundation.org>
>     Signed-off-by: Linus Torvalds <torvalds@...ux-foundation.org>
>
> diff -urN linux-2.6.24-rc5/mm/page_alloc.c linux-2.6.24-rc6/mm/page_alloc.c
> --- linux-2.6.24-rc5/mm/page_alloc.c    2007-12-21 04:14:11.305633890 +0000
> +++ linux-2.6.24-rc6/mm/page_alloc.c    2007-12-21 04:14:17.746985697 +0000
> @@ -847,8 +847,19 @@
>          struct page *page = __rmqueue(zone, order, migratetype);
>          if (unlikely(page == NULL))
>              break;
> +
> +        /*
> +         * Split buddy pages returned by expand() are received here
> +         * in physical page order. The page is added to the callers and
> +         * list and the list head then moves forward. From the callers
> +         * perspective, the linked list is ordered by page number in
> +         * some conditions. This is useful for IO devices that can
> +         * merge IO requests if the physical pages are ordered
> +         * properly.
> +         */
>          list_add(&page->lru, list);
>          set_page_private(page, migratetype);
> +        list = &page->lru;
>      }
>      spin_unlock(&zone->lock);
>      return i;
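For what it's worth, my reading of the hunk above, as a tiny standalone sketch (not the kernel code itself): with a fixed list head, list_add() inserts each new page in front of the previous one, so the caller receives the batch in reverse physical order; advancing the head after every insert, which is what the added "list = &page->lru;" does, keeps the batch in the order expand() produced:

  #include <stdio.h>
  #include <stddef.h>

  /* Minimal stand-ins for the kernel's list_head/list_add(), just
   * enough to show the ordering effect of the hunk above. */
  struct list_head { struct list_head *next, *prev; };

  static void list_add(struct list_head *new, struct list_head *head)
  {
      new->prev = head;
      new->next = head->next;
      head->next->prev = new;
      head->next = new;
  }

  struct page { int pfn; struct list_head lru; };

  int main(void)
  {
      struct page pages[4] = { {0}, {1}, {2}, {3} }; /* "physical" order */
      struct list_head head = { &head, &head };
      struct list_head *list = &head;

      for (int i = 0; i < 4; i++) {
          list_add(&pages[i].lru, list);
          list = &pages[i].lru; /* the patch's extra line; without it
                                 * the batch reads 3 2 1 0 below */
      }

      for (struct list_head *p = head.next; p != &head; p = p->next) {
          struct page *pg =
              (struct page *)((char *)p - offsetof(struct page, lru));
          printf("%d ", pg->pfn); /* prints: 0 1 2 3 */
      }
      printf("\n");
      return 0;
  }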
Reverting this patch from 2.6.24-rc8 gives the good performance reported below (rc8*). So, apparently CCISS is very sensitive to the page ordering.
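For reference, the reproduction mentioned above looks roughly like this; the paths, sizes and exact iozone flags here are illustrative, not copies of my scripts:

  # 3 parallel buffered (non-DIO) writers, the 3x1GB case
  dd if=/dev/zero of=/mnt/test/f1 bs=1M count=1024 &
  dd if=/dev/zero of=/mnt/test/f2 bs=1M count=1024 &
  dd if=/dev/zero of=/mnt/test/f3 bs=1M count=1024 &
  wait; sync

  # iozone: sequential write test, 3 threads of 1 GB each;
  # adding -I (O_DIRECT) gives the DIO variants
  iozone -i 0 -s 1g -r 1m -t 3
  iozone -i 0 -s 1g -r 1m -t 3 -I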
Here are the numbers (MB/sec, including sync time). I compare 2.6.24-rc8 (rc8) against 2.6.24-rc8 with the above commit reverted (rc8*). Reported is the combined throughput for 1, 2 and 3 iozone threads; the DIO numbers are included for reference. Raw numbers are attached.
Test           rc8    rc8*
---------------------------
1x3GB           56     90
1x3GB-DIO       86     86
2x1.5GB          9.5   87
2x1.5GB-DIO     80     85
3x1GB           16.5   85
3x1GB-DIO       85     85
One can see that in mainline/rc8 every non-DIO number is smaller than both the corresponding DIO number and the non-DIO number from rc8*. The performance for 2 and 3 threads in mainline/rc8 is just bad.
Of course I have the option of reverting commit ....54b6d on my systems, but I think a more general solution would be better. If I can help track the real problem down, I am open to suggestions.
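For completeness, repeating the experiment on top of a git tree is a one-liner, using the full commit id quoted above:

  git revert 81eabcbe0b991ddef5216f30ae91c4b226d54b6d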
Cheers
Martin
------------------------------------------------------
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de
Attachments:
  cciss-rc8-bad.log (text/x-log, 10205 bytes)
  cciss-rc8-good.log (text/x-log, 10205 bytes)
  config-2.6.24-rc8 (application/octet-stream, 44328 bytes)