Message-ID: <4FEC9181.9060000@sandia.gov>
Date: Thu, 28 Jun 2012 11:16:49 -0600
From: "Jim Schutt" <jaschut@...dia.gov>
To: "Rik van Riel" <riel@...hat.com>
cc: linux-mm@...ck.org, akpm@...ux-foundation.org,
"Mel Gorman" <mel@....ul.ie>, kamezawa.hiroyu@...fujitsu.com,
minchan@...nel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH -mm] mm: have order > 0 compaction start off where it left off
On 06/27/2012 09:37 PM, Rik van Riel wrote:
> Order > 0 compaction stops when enough free pages of the correct
> page order have been coalesced. When doing subsequent higher order
> allocations, it is possible for compaction to be invoked many times.
>
> However, the compaction code always starts out looking for things to
> compact at the start of the zone, and for free pages to compact things
> to at the end of the zone.
>
> This can cause quadratic behaviour, with isolate_freepages starting
> at the end of the zone each time, even though previous invocations
> of the compaction code already filled up all free memory on that end
> of the zone.
>
> This can cause isolate_freepages to take enormous amounts of CPU
> with certain workloads on larger memory systems.
>
> The obvious solution is to have isolate_freepages remember where
> it left off last time, and continue at that point the next time
> it gets invoked for an order > 0 compaction. This could cause
> compaction to fail if cc->free_pfn and cc->migrate_pfn are close
> together initially; in that case we restart from the end of the
> zone and try once more.
>
> Forced full (order == -1) compactions are left alone.
>
> Reported-by: Jim Schutt <jaschut@...dia.gov>
> Signed-off-by: Rik van Riel <riel@...hat.com>
Tested-by: Jim Schutt <jaschut@...dia.gov>
Please let me know if you further refine this patch
and would like me to test it with my workload.
> ---
> CAUTION: due to the time of day, I have only COMPILE tested this code
>
> include/linux/mmzone.h | 4 ++++
> mm/compaction.c | 25 +++++++++++++++++++++++--
> mm/internal.h | 1 +
> mm/page_alloc.c | 4 ++++
> 4 files changed, 32 insertions(+), 2 deletions(-)
This patch is working great for me.
FWIW here's a typical vmstat report, after ~20 minutes of my Ceph load:
2012-06-28 10:59:16.887-06:00
vmstat -w 4 16
procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
r b swpd free buff cache si so bi bo in cs us sy id wa st
23 21 0 393128 480 36883448 0 0 8 49583 90 273 6 25 58 11 0
6 18 0 397892 480 36912832 0 0 281 2293321 203321 168790 11 43 22 24 0
17 23 0 394540 480 36921356 0 0 262 2227505 202744 163158 11 45 20 23 0
25 17 0 359404 480 36972884 0 0 205 2243941 201087 167874 11 42 23 24 0
21 20 0 367400 480 36934416 0 0 232 2310577 200666 156693 12 50 17 22 0
12 18 0 378048 480 36890624 0 0 232 2235455 196480 165692 11 44 22 24 0
17 18 0 372444 480 36874484 0 0 280 2185592 195885 168416 11 43 24 23 0
51 16 0 372760 480 36841148 0 0 245 2211135 195711 158012 11 46 23 20 0
23 17 0 375272 480 36847292 0 0 228 2323708 207079 164988 12 49 19 20 0
10 26 0 373540 480 36889240 0 0 341 2290586 201708 167954 11 46 19 23 0
44 14 0 303828 480 37020940 0 0 302 2180893 199958 168619 11 40 23 26 0
24 14 0 359320 480 36970272 0 0 345 2173978 197097 163760 11 47 22 20 0
32 19 0 355744 480 36917372 0 0 267 2276251 200123 167776 11 46 19 23 0
34 19 0 360824 480 36900032 0 0 259 2252057 200942 170912 11 43 21 25 0
13 17 0 361288 480 36919360 0 0 253 2149189 188426 170940 10 40 27 23 0
15 16 0 341828 480 36883988 0 0 317 2272817 205203 173732 11 48 19 21 0
Also FWIW, here's a typical "perf top" report with the patch applied:
PerfTop: 17575 irqs/sec kernel:80.4% exact: 0.0% [1000Hz cycles], (all, 24 CPUs)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ ___________________________ ________________________________________________________________________________________
27583.00 11.6% copy_user_generic_string /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
18387.00 7.8% __crc32c_le /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
17264.00 7.3% _raw_spin_lock_irqsave /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
13890.00 5.9% ceph_crc32c_le /usr/bin/ceph-osd
5952.00 2.5% __copy_user_nocache /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
4663.00 2.0% memmove /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
3141.00 1.3% _raw_spin_lock /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
2939.00 1.2% rb_prev /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
2933.00 1.2% clflush_cache_range /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
2586.00 1.1% __list_del_entry /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
2357.00 1.0% intel_idle /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
2168.00 0.9% __set_page_dirty_nobuffers /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
2110.00 0.9% get_pageblock_flags_group /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
2103.00 0.9% set_page_dirty /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
2090.00 0.9% futex_wake /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
1959.00 0.8% __memcpy /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
1696.00 0.7% generic_bin_search /lib/modules/3.5.0-rc4-00012-g3986cf7/kernel/fs/btrfs/btrfs.ko
1628.00 0.7% btree_set_page_dirty /lib/modules/3.5.0-rc4-00012-g3986cf7/kernel/fs/btrfs/btrfs.ko
1606.00 0.7% _raw_spin_unlock_irqrestore /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
1516.00 0.6% map_private_extent_buffer /lib/modules/3.5.0-rc4-00012-g3986cf7/kernel/fs/btrfs/btrfs.ko
1481.00 0.6% futex_requeue /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
1447.00 0.6% isolate_migratepages_range /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
1365.00 0.6% __schedule /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
1361.00 0.6% memcpy /lib64/libc-2.12.so
1263.00 0.5% trace_hardirqs_off /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
1238.00 0.5% tg_load_down /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
1220.00 0.5% move_freepages_block /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
1198.00 0.5% find_iova /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
1139.00 0.5% process_responses /lib/modules/3.5.0-rc4-00012-g3986cf7/kernel/drivers/net/ethernet/chelsio/cxgb4/cxgb4.ko
So far I've run a total of ~20 TB of data over fifty minutes
or so through 12 machines running this patch; no hint of
trouble, great performance.
Without this patch I would typically start having trouble
after just a few minutes of this load.
Thanks!
-- Jim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/