Message-ID: <84144f021001040605pefbc9f7h276b796bdf931353@mail.gmail.com>
Date: Mon, 4 Jan 2010 16:05:37 +0200
From: Pekka Enberg <penberg@...helsinki.fi>
To: Mel Gorman <mel@....ul.ie>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Corrado Zoccolo <czoccolo@...il.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org,
Christoph Lameter <cl@...ux-foundation.org>
Subject: Re: [PATCH] page allocator: Reduce fragmentation in buddy allocator
by adding buddies that are merging to the tail of the free lists
On Mon, Jan 4, 2010 at 3:55 PM, Mel Gorman <mel@....ul.ie> wrote:
> From: Corrado Zoccolo <czoccolo@...il.com>
>
> In order to reduce fragmentation, this patch classifies freed pages in
> two groups according to their probability of being part of a high order
> merge. Pages belonging to a compound whose next-highest buddy is free are
> more likely to be part of a high order merge in the near future, so they
> will be added at the tail of the freelist. The remaining pages are put at
> the front of the freelist.
>
> In this way, the pages that are more likely to cause a big merge are kept
> free longer. Consequently there is a tendency to aggregate the long-living
> allocations on a subset of the compounds, reducing the fragmentation.
>
> This heuristic was tested on three machines, x86, x86-64 and ppc64 with
> 3GB of RAM in each machine. The tests were kernbench, netperf, sysbench and
> STREAM for performance and a high-order stress test for huge page allocations.
>
> KernBench X86
> Elapsed mean 374.77 ( 0.00%) 375.10 (-0.09%)
> User mean 649.53 ( 0.00%) 650.44 (-0.14%)
> System mean 54.75 ( 0.00%) 54.18 ( 1.05%)
> CPU mean 187.75 ( 0.00%) 187.25 ( 0.27%)
>
> KernBench X86-64
> Elapsed mean 94.45 ( 0.00%) 94.01 ( 0.47%)
> User mean 323.27 ( 0.00%) 322.66 ( 0.19%)
> System mean 36.71 ( 0.00%) 36.50 ( 0.57%)
> CPU mean 380.75 ( 0.00%) 381.75 (-0.26%)
>
> KernBench PPC64
> Elapsed mean 173.45 ( 0.00%) 173.74 (-0.17%)
> User mean 587.99 ( 0.00%) 587.95 ( 0.01%)
> System mean 60.60 ( 0.00%) 60.57 ( 0.05%)
> CPU mean 373.50 ( 0.00%) 372.75 ( 0.20%)
>
> Nothing notable for kernbench.
>
> NetPerf UDP X86
> 64 42.68 ( 0.00%) 42.77 ( 0.21%)
> 128 85.62 ( 0.00%) 85.32 (-0.35%)
> 256 170.01 ( 0.00%) 168.76 (-0.74%)
> 1024 655.68 ( 0.00%) 652.33 (-0.51%)
> 2048 1262.39 ( 0.00%) 1248.61 (-1.10%)
> 3312 1958.41 ( 0.00%) 1944.61 (-0.71%)
> 4096 2345.63 ( 0.00%) 2318.83 (-1.16%)
> 8192 4132.90 ( 0.00%) 4089.50 (-1.06%)
> 16384 6770.88 ( 0.00%) 6642.05 (-1.94%)*
>
> NetPerf UDP X86-64
> 64 148.82 ( 0.00%) 154.92 ( 3.94%)
> 128 298.96 ( 0.00%) 312.95 ( 4.47%)
> 256 583.67 ( 0.00%) 626.39 ( 6.82%)
> 1024 2293.18 ( 0.00%) 2371.10 ( 3.29%)
> 2048 4274.16 ( 0.00%) 4396.83 ( 2.79%)
> 3312 6356.94 ( 0.00%) 6571.35 ( 3.26%)
> 4096 7422.68 ( 0.00%) 7635.42 ( 2.79%)*
> 8192 12114.81 ( 0.00%)* 12346.88 ( 1.88%)
> 16384 17022.28 ( 0.00%)* 17033.19 ( 0.06%)*
> 1.64% 2.73%
>
> NetPerf UDP PPC64
> 64 49.98 ( 0.00%) 50.25 ( 0.54%)
> 128 98.66 ( 0.00%) 100.95 ( 2.27%)
> 256 197.33 ( 0.00%) 191.03 (-3.30%)
> 1024 761.98 ( 0.00%) 785.07 ( 2.94%)
> 2048 1493.50 ( 0.00%) 1510.85 ( 1.15%)
> 3312 2303.95 ( 0.00%) 2271.72 (-1.42%)
> 4096 2774.56 ( 0.00%) 2773.06 (-0.05%)
> 8192 4918.31 ( 0.00%) 4793.59 (-2.60%)
> 16384 7497.98 ( 0.00%) 7749.52 ( 3.25%)
>
> The tests are run to have confidence limits within 1%. Results marked with
> a * did not meet that confidence, although in this case they are only
> outside by small amounts. Even with some results that were not confident,
> the netperf UDP results were generally positive.
>
> NetPerf TCP X86
> 64 652.25 ( 0.00%)* 648.12 (-0.64%)*
> 23.80% 22.82%
> 128 1229.98 ( 0.00%)* 1220.56 (-0.77%)*
> 21.03% 18.90%
> 256 2105.88 ( 0.00%) 1872.03 (-12.49%)*
> 1.00% 16.46%
> 1024 3476.46 ( 0.00%)* 3548.28 ( 2.02%)*
> 13.37% 11.39%
> 2048 4023.44 ( 0.00%)* 4231.45 ( 4.92%)*
> 9.76% 12.48%
> 3312 4348.88 ( 0.00%)* 4396.96 ( 1.09%)*
> 6.49% 8.75%
> 4096 4726.56 ( 0.00%)* 4877.71 ( 3.10%)*
> 9.85% 8.50%
> 8192 4732.28 ( 0.00%)* 5777.77 (18.10%)*
> 9.13% 13.04%
> 16384 5543.05 ( 0.00%)* 5906.24 ( 6.15%)*
> 7.73% 8.68%
>
> NETPERF TCP X86-64
> tcp-vanilla pgalloc-delay
> 64 1895.87 ( 0.00%)* 1775.07 (-6.81%)*
> 5.79% 4.78%
> 128 3571.03 ( 0.00%)* 3342.20 (-6.85%)*
> 3.68% 6.06%
> 256 5097.21 ( 0.00%)* 4859.43 (-4.89%)*
> 3.02% 2.10%
> 1024 8919.10 ( 0.00%)* 8892.49 (-0.30%)*
> 5.89% 6.55%
> 2048 10255.46 ( 0.00%)* 10449.39 ( 1.86%)*
> 7.08% 7.44%
> 3312 10839.90 ( 0.00%)* 10740.15 (-0.93%)*
> 6.87% 7.33%
> 4096 10814.84 ( 0.00%)* 10766.97 (-0.44%)*
> 6.86% 8.18%
> 8192 11606.89 ( 0.00%)* 11189.28 (-3.73%)*
> 7.49% 5.55%
> 16384 12554.88 ( 0.00%)* 12361.22 (-1.57%)*
> 7.36% 6.49%
>
> NETPERF TCP PPC64
> tcp-vanilla pgalloc-delay
> 64 594.17 ( 0.00%) 596.04 ( 0.31%)*
> 1.00% 2.29%
> 128 1064.87 ( 0.00%)* 1074.77 ( 0.92%)*
> 1.30% 1.40%
> 256 1852.46 ( 0.00%)* 1856.95 ( 0.24%)
> 1.25% 1.00%
> 1024 3839.46 ( 0.00%)* 3813.05 (-0.69%)
> 1.02% 1.00%
> 2048 4885.04 ( 0.00%)* 4881.97 (-0.06%)*
> 1.15% 1.04%
> 3312 5506.90 ( 0.00%) 5459.72 (-0.86%)
> 4096 6449.19 ( 0.00%) 6345.46 (-1.63%)
> 8192 7501.17 ( 0.00%) 7508.79 ( 0.10%)
> 16384 9618.65 ( 0.00%) 9490.10 (-1.35%)
>
> There was a distinct lack of confidence in the X86* figures so I included
> what the deviation was where the results were not confident. Many of the
> results, whether gains or losses, were within the standard deviation so no
> solid conclusion can be reached on performance impact. Looking at the
> figures, only the X86-64 ones look suspicious with a few losses that were
> outside the noise. However, the results were so unstable that without
> knowing why they vary so much, a solid conclusion cannot be reached.
>
> SYSBENCH X86
> sysbench-vanilla pgalloc-delay
> 1 7722.85 ( 0.00%) 7756.79 ( 0.44%)
> 2 14901.11 ( 0.00%) 13683.44 (-8.90%)
> 3 15171.71 ( 0.00%) 14888.25 (-1.90%)
> 4 14966.98 ( 0.00%) 15029.67 ( 0.42%)
> 5 14370.47 ( 0.00%) 14865.00 ( 3.33%)
> 6 14870.33 ( 0.00%) 14845.57 (-0.17%)
> 7 14429.45 ( 0.00%) 14520.85 ( 0.63%)
> 8 14354.35 ( 0.00%) 14362.31 ( 0.06%)
>
> SYSBENCH X86-64
> 1 17448.70 ( 0.00%) 17484.41 ( 0.20%)
> 2 34276.39 ( 0.00%) 34251.00 (-0.07%)
> 3 50805.25 ( 0.00%) 50854.80 ( 0.10%)
> 4 66667.10 ( 0.00%) 66174.69 (-0.74%)
> 5 66003.91 ( 0.00%) 65685.25 (-0.49%)
> 6 64981.90 ( 0.00%) 65125.60 ( 0.22%)
> 7 64933.16 ( 0.00%) 64379.23 (-0.86%)
> 8 63353.30 ( 0.00%) 63281.22 (-0.11%)
> 9 63511.84 ( 0.00%) 63570.37 ( 0.09%)
> 10 62708.27 ( 0.00%) 63166.25 ( 0.73%)
> 11 62092.81 ( 0.00%) 61787.75 (-0.49%)
> 12 61330.11 ( 0.00%) 61036.34 (-0.48%)
> 13 61438.37 ( 0.00%) 61994.47 ( 0.90%)
> 14 62304.48 ( 0.00%) 62064.90 (-0.39%)
> 15 63296.48 ( 0.00%) 62875.16 (-0.67%)
> 16 63951.76 ( 0.00%) 63769.09 (-0.29%)
>
> SYSBENCH PPC64
> sysbench-vanilla pgalloc-delay
> 1 7645.08 ( 0.00%) 7467.43 (-2.38%)
> 2 14856.67 ( 0.00%) 14558.73 (-2.05%)
> 3 21952.31 ( 0.00%) 21683.64 (-1.24%)
> 4 27946.09 ( 0.00%) 28623.29 ( 2.37%)
> 5 28045.11 ( 0.00%) 28143.69 ( 0.35%)
> 6 27477.10 ( 0.00%) 27337.45 (-0.51%)
> 7 26489.17 ( 0.00%) 26590.06 ( 0.38%)
> 8 26642.91 ( 0.00%) 25274.33 (-5.41%)
> 9 25137.27 ( 0.00%) 24810.06 (-1.32%)
> 10 24451.99 ( 0.00%) 24275.85 (-0.73%)
> 11 23262.20 ( 0.00%) 23674.88 ( 1.74%)
> 12 24234.81 ( 0.00%) 23640.89 (-2.51%)
> 13 24577.75 ( 0.00%) 24433.50 (-0.59%)
> 14 25640.19 ( 0.00%) 25116.52 (-2.08%)
> 15 26188.84 ( 0.00%) 26181.36 (-0.03%)
> 16 26782.37 ( 0.00%) 26255.99 (-2.00%)
>
> Again, there is little to conclude here. While there are a few losses,
> the results vary by +/- 8% in some cases. These are the results of most
> concern as there are some large losses, but it's also within the variance
> typically seen between kernel releases.
>
> The STREAM results varied so little and are so verbose that I didn't
> include them here.
>
> The final test stressed how many huge pages can be allocated. The
> absolute number of huge pages allocated is the same with or without the
> patch. However, the "unusable free space index", which is a measure of
> external fragmentation, was slightly lower (lower is better) throughout the
> lifetime of the system. I also measured the latency of how long it took
> to successfully allocate a huge page. The latency was slightly lower, and
> on X86 and PPC64 more huge pages were allocated almost immediately from
> the free lists. The improvement is slight but there.
>
> [mel@....ul.ie: Tested, reworked for fewer branches]
> Signed-off-by: Mel Gorman <mel@....ul.ie>
> ---
> mm/page_alloc.c | 27 ++++++++++++++++++++++++---
> 1 files changed, 24 insertions(+), 3 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 2bc2ac6..fe7017e 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -451,6 +451,7 @@ static inline void __free_one_page(struct page *page,
> int migratetype)
> {
> unsigned long page_idx;
> + unsigned long combined_idx;
>
> if (unlikely(PageCompound(page)))
> if (unlikely(destroy_compound_page(page, order)))
> @@ -464,7 +465,6 @@ static inline void __free_one_page(struct page *page,
> VM_BUG_ON(bad_range(zone, page));
>
> while (order < MAX_ORDER-1) {
> - unsigned long combined_idx;
> struct page *buddy;
>
> buddy = __page_find_buddy(page, page_idx, order);
> @@ -481,8 +481,29 @@ static inline void __free_one_page(struct page *page,
> order++;
> }
> set_page_order(page, order);
> - list_add(&page->lru,
> - &zone->free_area[order].free_list[migratetype]);
> +
> + /*
> + * If this is not the largest possible page, check if the buddy
> + * of the next-highest order is free. If it is, it's possible
> + * that pages are being freed that will coalesce soon. In case
> + * that is happening, add the free page to the tail of the list
> + * so it's less likely to be used soon and more likely to be merged
> + * as a higher order page
> + */
> + if (order < MAX_ORDER-1) {
> + struct page *higher_page, *higher_buddy;
> + combined_idx = __find_combined_index(page_idx, order);
> + higher_page = page + combined_idx - page_idx;
> + higher_buddy = __page_find_buddy(higher_page, combined_idx, order + 1);
> + if (page_is_buddy(higher_page, higher_buddy, order + 1)) {
> + list_add_tail(&page->lru,
> + &zone->free_area[order].free_list[migratetype]);
> + goto out;
> + }
> + }
> +
> + list_add(&page->lru, &zone->free_area[order].free_list[migratetype]);
> +out:
> zone->free_area[order].nr_free++;
> }
FWIW,
Reviewed-by: Pekka Enberg <penberg@...helsinki.fi>