Message-ID: <20161128114543.h5e3e7tbmq56eis6@techsingularity.net>
Date:   Mon, 28 Nov 2016 11:45:43 +0000
From:   Mel Gorman <mgorman@...hsingularity.net>
To:     Vlastimil Babka <vbabka@...e.cz>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Christoph Lameter <cl@...ux.com>,
        Michal Hocko <mhocko@...e.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Linux-MM <linux-mm@...ck.org>,
        Linux-Kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] mm: page_alloc: High-order per-cpu page allocator v3

On Mon, Nov 28, 2016 at 12:00:41PM +0100, Vlastimil Babka wrote:
> On 11/27/2016 02:19 PM, Mel Gorman wrote:
> > 
> > 2-socket modern machine
> >                                 4.9.0-rc5             4.9.0-rc5
> >                                   vanilla             hopcpu-v3
> > Hmean    send-64         178.38 (  0.00%)      256.74 ( 43.93%)
> > Hmean    send-128        351.49 (  0.00%)      507.52 ( 44.39%)
> > Hmean    send-256        671.23 (  0.00%)     1004.19 ( 49.60%)
> > Hmean    send-1024      2663.60 (  0.00%)     3910.42 ( 46.81%)
> > Hmean    send-2048      5126.53 (  0.00%)     7562.13 ( 47.51%)
> > Hmean    send-3312      7949.99 (  0.00%)    11565.98 ( 45.48%)
> > Hmean    send-4096      9433.56 (  0.00%)    12929.67 ( 37.06%)
> > Hmean    send-8192     15940.64 (  0.00%)    21587.63 ( 35.43%)
> > Hmean    send-16384    26699.54 (  0.00%)    32013.79 ( 19.90%)
> > Hmean    recv-64         178.38 (  0.00%)      256.72 ( 43.92%)
> > Hmean    recv-128        351.49 (  0.00%)      507.47 ( 44.38%)
> > Hmean    recv-256        671.20 (  0.00%)     1003.95 ( 49.57%)
> > Hmean    recv-1024      2663.45 (  0.00%)     3909.70 ( 46.79%)
> > Hmean    recv-2048      5126.26 (  0.00%)     7560.67 ( 47.49%)
> > Hmean    recv-3312      7949.50 (  0.00%)    11564.63 ( 45.48%)
> > Hmean    recv-4096      9433.04 (  0.00%)    12927.48 ( 37.04%)
> > Hmean    recv-8192     15939.64 (  0.00%)    21584.59 ( 35.41%)
> > Hmean    recv-16384    26698.44 (  0.00%)    32009.77 ( 19.89%)
> > 
> > 1-socket 6 year old machine
> >                                 4.9.0-rc5             4.9.0-rc5
> >                                   vanilla             hopcpu-v3
> > Hmean    send-64          87.47 (  0.00%)      127.14 ( 45.36%)
> > Hmean    send-128        174.36 (  0.00%)      256.42 ( 47.06%)
> > Hmean    send-256        347.52 (  0.00%)      509.41 ( 46.59%)
> > Hmean    send-1024      1363.03 (  0.00%)     1991.54 ( 46.11%)
> > Hmean    send-2048      2632.68 (  0.00%)     3759.51 ( 42.80%)
> > Hmean    send-3312      4123.19 (  0.00%)     5873.28 ( 42.45%)
> > Hmean    send-4096      5056.48 (  0.00%)     7072.81 ( 39.88%)
> > Hmean    send-8192      8784.22 (  0.00%)    12143.92 ( 38.25%)
> > Hmean    send-16384    15081.60 (  0.00%)    19812.71 ( 31.37%)
> > Hmean    recv-64          86.19 (  0.00%)      126.59 ( 46.87%)
> > Hmean    recv-128        173.93 (  0.00%)      255.21 ( 46.73%)
> > Hmean    recv-256        346.19 (  0.00%)      506.72 ( 46.37%)
> > Hmean    recv-1024      1358.28 (  0.00%)     1980.03 ( 45.77%)
> > Hmean    recv-2048      2623.45 (  0.00%)     3729.35 ( 42.15%)
> > Hmean    recv-3312      4108.63 (  0.00%)     5831.47 ( 41.93%)
> > Hmean    recv-4096      5037.25 (  0.00%)     7021.59 ( 39.39%)
> > Hmean    recv-8192      8762.32 (  0.00%)    12072.44 ( 37.78%)
> > Hmean    recv-16384    15042.36 (  0.00%)    19690.14 ( 30.90%)
> 
> That looks much better than the "v1" RFC posting. Was it just because you
> stopped doing the "on the first iteration, use migratetype as index"
> behaviour and instead initialize pindex to UINT_MAX, so that it hits so
> much more quickly, or was there something more subtle that I missed?
> There was no changelog between "v1" and "v2".
> 

The array is sized correctly, which avoids one useless check. The order-0
lists are always drained first, so in some rare cases only the fast paths
are used. There was also a subtle correction in detecting when all of one
list should be drained. In combination, these changes happened to boost
performance a lot on the two machines I reported on. While six other
machines were tested, not all of them saw such a dramatic boost, and if
the machines are rebooted and retested each time, the high performance is
not always consistent; it all depends on how often the fast paths are used.
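
For illustration only, here is a rough userspace sketch of the round-robin
drain idea being discussed. It is not the mm/page_alloc.c code; the list
layout and the names NR_PCP_LISTS, PCP_MAX_ORDER, pcp_lists[] and
drain_round_robin() are made up for the example. It only shows why starting
the cursor (pindex) at UINT_MAX removes any first-iteration special case and
lets the low-index (here, order-0) lists be visited first:

/*
 * Rough userspace sketch of a round-robin per-cpu list drain.
 * NOT the mm/page_alloc.c code; names and sizes are illustrative.
 */
#include <stdio.h>
#include <limits.h>

#define MIGRATE_PCPTYPES 3	/* assumed: unmovable, movable, reclaimable */
#define PCP_MAX_ORDER    3	/* assumed: orders 0..3 kept on pcp lists */
#define NR_PCP_LISTS     (MIGRATE_PCPTYPES * (PCP_MAX_ORDER + 1))

/* pages on each per-cpu list; assume order-0 lists occupy the low indices */
static unsigned int pcp_lists[NR_PCP_LISTS];

/* cursor persists across calls; UINT_MAX means "start from list 0" */
static unsigned int pindex = UINT_MAX;

/* caller must not ask for more pages than the lists hold in total */
static void drain_round_robin(unsigned int to_drain)
{
	while (to_drain) {
		/*
		 * Because pindex starts at UINT_MAX, the very first pass
		 * wraps to 0 here, so no "use migratetype as index on the
		 * first iteration" special case is needed and the order-0
		 * lists (low indices) are visited first.
		 */
		pindex = (pindex + 1) % NR_PCP_LISTS;

		if (!pcp_lists[pindex])
			continue;	/* nothing on this list, try the next */

		unsigned int batch = pcp_lists[pindex] < to_drain ?
				     pcp_lists[pindex] : to_drain;
		pcp_lists[pindex] -= batch;
		to_drain -= batch;
		printf("drained %u page(s) from list %u\n", batch, pindex);
	}
}

int main(void)
{
	pcp_lists[0] = 4;	/* some order-0 pages */
	pcp_lists[5] = 2;	/* a couple of higher-order pages */
	drain_round_robin(5);
	return 0;
}

Run as-is, it empties the order-0 list before touching the higher-order
one, which is the ordering described above.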

> > Signed-off-by: Mel Gorman <mgorman@...hsingularity.net>
> 
> Acked-by: Vlastimil Babka <vbabka@...e.cz>
> 

Thanks.

-- 
Mel Gorman
SUSE Labs
