Message-ID: <20121130114145.GD20087@suse.de>
Date:	Fri, 30 Nov 2012 11:41:45 +0000
From:	Mel Gorman <mgorman@...e.de>
To:	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Ingo Molnar <mingo@...nel.org>
Cc:	Rik van Riel <riel@...hat.com>,
	Johannes Weiner <hannes@...xchg.org>,
	Hugh Dickins <hughd@...gle.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Paul Turner <pjt@...gle.com>, Hillf Danton <dhillf@...il.com>,
	Lee Schermerhorn <Lee.Schermerhorn@...com>,
	Alex Shi <lkml.alex@...il.com>,
	Srikar Dronamraju <srikar@...ux.vnet.ibm.com>,
	Aneesh Kumar <aneesh.kumar@...ux.vnet.ibm.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Linux-MM <linux-mm@...ck.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Results for balancenuma v8, autonuma-v28fast and numacore-20121126

This is another insanely long mail. Short summary: based on the results
of what is in tip/master right now, I think if we're going to merge
anything for v3.8 it should be the "Automatic NUMA Balancing V8". It does
reasonably well for many of the workloads and AFAIK there is no reason why
numacore or autonuma could not be rebased on top with a view to merging
proper scheduling and placement policies in 3.9. That way we would have
a comparison between a do-nothing kernel, the most basic of migration
policies and something more complex, with some sort of logical progression.

This time I added the NAS Parallel Benchmark running with MPI and OpenMP
to see how they fared. From the series "Automatic NUMA Balancing V8",
the kernels tested were

stats-v6r15	Patches 1-10. TLB optimisations, migration stats. This
		is based on the V6 release but the patches have not
		changed since.
balancenuma-v8r6 Patches 1-46. Full series

The other two kernels were

numacore-20121126 is a pull of tip/master on November 26th, 2012. It ends
	up being a 3.7-rc6 based kernel

autonuma-v28fast This is a rebased version of Andrea's autonuma-v28fast
	branch with Hugh's THP migration patch on top. Hopefully Andrea
	and Hugh will not mind but I took the liberty of publishing the
	result as the mm-autonuma-v28fastr4-mels-rebase branch in
	git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux-balancenuma.git

I'm treating stats-v6r15 as the baseline as it has the same TLB optimisations
shared between balancenuma and numacore. This may not be fair to autonuma
depending on how it avoids flushing the TLB.

All of these tests were run unattended via MMTests. Any errors in the
methodology would be applied evenly to all kernels tested. There were
monitors running but *not* profiling. The heaviest monitor reads numa_maps
every 10 seconds; numa_maps is read only once per address space and the
result is reused by all threads. This will affect peaks because the
monitors contend on some of the same locks the PTE scanner uses, for
example.
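
For reference, the numa_maps monitor behaves roughly like the sketch
below. This is not the MMTests monitor itself -- the pid walk and output
format are assumptions, only the 10 second interval matches what was
described above:

#!/usr/bin/env python
# Minimal sketch of a numa_maps monitor in the spirit of the one
# described above. NOT the MMTests implementation: the pid walk and
# output format are assumptions; only the 10 second interval matches.
# Reading /proc/<pid>/numa_maps takes some of the same locks the NUMA
# PTE scanner needs, which is why running it can affect peaks.
import glob
import sys
import time

INTERVAL = 10  # seconds

def read_numa_maps(path):
    try:
        with open(path) as f:
            return f.read()
    except (IOError, OSError):
        return None  # process exited between the glob and the read

while True:
    stamp = time.time()
    # /proc/<pid>/ entries cover thread group leaders only, so this is
    # one read per address space, shared by all threads in the process.
    for path in glob.glob("/proc/[0-9]*/numa_maps"):
        data = read_numa_maps(path)
        if data is not None:
            pid = path.split("/")[2]
            sys.stdout.write("time %d pid %s\n%s\n" % (stamp, pid, data))
    sys.stdout.flush()
    time.sleep(INTERVAL)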

AUTONUMA BENCH
                                      3.7.0-rc7             3.7.0-rc6             3.7.0-rc7             3.7.0-rc7
                                    stats-v6r15     numacore-20121126    autonuma-v28fastr4       balancenuma-v8r6
User    NUMA01               66979.15 (  0.00%)    24590.05 ( 63.29%)    30815.06 ( 53.99%)    56701.65 ( 15.34%)
User    NUMA01_THEADLOCAL    61248.25 (  0.00%)    18607.40 ( 69.62%)    17124.49 ( 72.04%)    17344.99 ( 71.68%)
User    NUMA02                6645.34 (  0.00%)     2116.64 ( 68.15%)     2209.76 ( 66.75%)     2073.78 ( 68.79%)
User    NUMA02_SMT            2925.65 (  0.00%)      989.22 ( 66.19%)     1020.53 ( 65.12%)     1000.81 ( 65.79%)
System  NUMA01                  45.46 (  0.00%)     1038.13 (-2183.61%)      195.90 (-330.93%)      289.11 (-535.97%)
System  NUMA01_THEADLOCAL       46.15 (  0.00%)      556.78 (-1106.46%)       72.36 (-56.79%)      112.87 (-144.57%)
System  NUMA02                   1.66 (  0.00%)       25.38 (-1428.92%)        7.49 (-351.20%)        9.71 (-484.94%)
System  NUMA02_SMT               0.92 (  0.00%)       10.70 (-1063.04%)        2.41 (-161.96%)        3.40 (-269.57%)
Elapsed NUMA01                1513.72 (  0.00%)      571.78 ( 62.23%)      795.56 ( 47.44%)     1292.04 ( 14.64%)
Elapsed NUMA01_THEADLOCAL     1390.72 (  0.00%)      420.02 ( 69.80%)      380.84 ( 72.62%)      379.59 ( 72.71%)
Elapsed NUMA02                 167.65 (  0.00%)       50.52 ( 69.87%)       53.22 ( 68.26%)       49.17 ( 70.67%)
Elapsed NUMA02_SMT             164.38 (  0.00%)       48.26 ( 70.64%)       48.10 ( 70.74%)       46.91 ( 71.46%)
CPU     NUMA01                4427.00 (  0.00%)     4482.00 ( -1.24%)     3897.00 ( 11.97%)     4410.00 (  0.38%)
CPU     NUMA01_THEADLOCAL     4407.00 (  0.00%)     4562.00 ( -3.52%)     4515.00 ( -2.45%)     4599.00 ( -4.36%)
CPU     NUMA02                3964.00 (  0.00%)     4239.00 ( -6.94%)     4165.00 ( -5.07%)     4236.00 ( -6.86%)
CPU     NUMA02_SMT            1780.00 (  0.00%)     2071.00 (-16.35%)     2126.00 (-19.44%)     2140.00 (-20.22%)

numacore is the best at running the adverse numa01 workload. autonuma does
respectably but balancenuma does not cope with this case. It improves on the
baseline but it does not know how to interleave for this type of workload.

For the other workloads that are friendlier to NUMA, the three trees do
not differ by massive amounts. There are no multiple runs because they
take too long, so there is a possibility the results are within the noise.

Where they differ is in system CPU usage. In all cases, numacore uses more
system CPU. It is likely compensating for this overhead with better
placement. Even with the higher overhead it ends up with a tie on
everything except the adverse workload. Take NUMA01_THREADLOCAL as an
example -- numacore uses roughly 3-4 times more system CPU than autonuma
or balancenuma. autonuma's cost could be hidden in kernel threads but
that's not true for balancenuma.

MMTests Statistics: duration
           3.7.0-rc7   3.7.0-rc6   3.7.0-rc7   3.7.0-rc7
         stats-v6r15numacore-20121126autonuma-v28fastr4balancenuma-v8r6
User       137805.34    46310.68    51177.02    77128.10
System         94.81     1631.75      278.81      415.74
Elapsed      3245.05     1101.08     1287.83     1776.42

The differences in overall elapsed time come down to how well numa01 is
handled. There are large differences in the system CPU usage of the
different trees. numacore is using over twice the amount of CPU of either
autonuma or balancenuma.


MMTests Statistics: vmstat
                             3.7.0-rc7   3.7.0-rc6   3.7.0-rc7   3.7.0-rc7
                           stats-v6r15numacore-20121126autonuma-v28fastr4balancenuma-v8r6
Page Ins                         42892       42804       42988       42616
Page Outs                        31156       12352       13980       19192
Swap Ins                             0           0           0           0
Swap Outs                            0           0           0           0
Direct pages scanned                 0           0           0           0
Kswapd pages scanned                 0           0           0           0
Kswapd pages reclaimed               0           0           0           0
Direct pages reclaimed               0           0           0           0
Kswapd efficiency                 100%        100%        100%        100%
Kswapd velocity                  0.000       0.000       0.000       0.000
Direct efficiency                 100%        100%        100%        100%
Direct velocity                  0.000       0.000       0.000       0.000
Percentage direct scans             0%          0%          0%          0%
Page writes by reclaim               0           0           0           0
Page writes file                     0           0           0           0
Page writes anon                     0           0           0           0
Page reclaim immediate               0           0           0           0
Page rescued immediate               0           0           0           0
Slabs scanned                        0           0           0           0
Direct inode steals                  0           0           0           0
Kswapd inode steals                  0           0           0           0
Kswapd skipped wait                  0           0           0           0
THP fault alloc                  16022       13747       19639       17857
THP collapse alloc                   9           4          51           3
THP splits                           2           1           7           6
THP fault fallback                   0           0           0           0
THP collapse fail                    0           0           0           0
Compaction stalls                    0           0           0           0
Compaction success                   0           0           0           0
Compaction failures                  0           0           0           0
Page migrate success                 0           0           0    10303098
Page migrate failure                 0           0           0           0
Compaction pages isolated            0           0           0           0
Compaction migrate scanned           0           0           0           0
Compaction free scanned              0           0           0           0
Compaction cost                      0           0           0       10694
NUMA PTE updates                     0           0           0   147254249
NUMA hint faults                     0           0           0      688568
NUMA hint local faults               0           0           0      542906
NUMA pages migrated                  0           0           0    10303098
AutoNUMA cost                        0           0           0        4669

Not much to usefully interpret here other than noting we generally avoid
splitting THP. For balancenuma, note what the scan adaption does to the
number of PTE updates and the number of faults incurred. A policy may
not necessarily like this. It depends on its requirements but if it wants
higher PTE scan rates it will have to compensate for it.
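
To put the fault and PTE update figures in context, a quick sketch using
the balancenuma column above (values copied straight from the table):

# Ratios pulled from the balancenuma column of the vmstat table above,
# to put the scan adaption comment in numbers (values copied verbatim).
pte_updates  = 147254249   # NUMA PTE updates
hint_faults  = 688568      # NUMA hint faults
local_faults = 542906      # NUMA hint local faults

print("hint faults per PTE update: %.4f" % (hint_faults / float(pte_updates)))
print("local share of hint faults: %.1f%%" % (100.0 * local_faults / hint_faults))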

Next is specjbb. There are 4 separate configurations:

multiple JVMs, THP
multiple JVMs, no THP
single JVM, THP
single JVM, no THP

SPECJBB: Multiple JVMs (one per node, 4 nodes), THP is enabled
                      3.7.0-rc7             3.7.0-rc6             3.7.0-rc7             3.7.0-rc7
                    stats-v6r15     numacore-20121126    autonuma-v28fastr4       balancenuma-v8r6
Mean   1      31600.00 (  0.00%)     27467.75 (-13.08%)     31006.75 ( -1.88%)     31360.25 ( -0.76%)
Mean   2      62937.75 (  0.00%)     55240.00 (-12.23%)     65086.25 (  3.41%)     61924.00 ( -1.61%)
Mean   3      91147.25 (  0.00%)     81735.50 (-10.33%)     95839.00 (  5.15%)     90739.00 ( -0.45%)
Mean   4     114616.50 (  0.00%)     94354.75 (-17.68%)    124129.50 (  8.30%)    116105.25 (  1.30%)
Mean   5     136264.25 (  0.00%)    107829.25 (-20.87%)    150632.00 ( 10.54%)    139659.25 (  2.49%)
Mean   6     152161.75 (  0.00%)    123039.75 (-19.14%)    175110.25 ( 15.08%)    157911.25 (  3.78%)
Mean   7     150385.25 (  0.00%)    137133.00 ( -8.81%)    180693.25 ( 20.15%)    160335.50 (  6.62%)
Mean   8     146897.75 (  0.00%)     94324.75 (-35.79%)    184689.00 ( 25.73%)    159786.50 (  8.77%)
Mean   9     141853.25 (  0.00%)    103640.75 (-26.94%)    183592.75 ( 29.42%)    153544.25 (  8.24%)
Mean   10    145524.00 (  0.00%)    113260.25 (-22.17%)    179482.75 ( 23.34%)    145893.50 (  0.25%)
Mean   11    129652.25 (  0.00%)     98646.75 (-23.91%)    174891.50 ( 34.89%)    138897.75 (  7.13%)
Mean   12    123313.25 (  0.00%)    124340.75 (  0.83%)    168959.25 ( 37.02%)    138027.00 ( 11.93%)
Mean   13    122442.75 (  0.00%)    107168.25 (-12.47%)    164761.50 ( 34.56%)    135222.50 ( 10.44%)
Mean   14    120407.50 (  0.00%)    107057.00 (-11.09%)    163350.50 ( 35.66%)    132712.25 ( 10.22%)
Mean   15    118236.50 (  0.00%)    106874.00 ( -9.61%)    160638.75 ( 35.86%)    129598.75 (  9.61%)
Mean   16    115439.00 (  0.00%)    128464.75 ( 11.28%)    158838.00 ( 37.59%)    122542.50 (  6.15%)
Mean   17    111400.25 (  0.00%)    127869.50 ( 14.78%)    157191.25 ( 41.10%)    129454.50 ( 16.21%)
Mean   18    114168.50 (  0.00%)    121763.00 (  6.65%)    154828.75 ( 35.61%)    125674.25 ( 10.08%)
Mean   19    112622.25 (  0.00%)    114235.50 (  1.43%)    154380.25 ( 37.08%)    122692.00 (  8.94%)
Mean   20    109717.75 (  0.00%)    109561.50 ( -0.14%)    153291.75 ( 39.71%)    122799.25 ( 11.92%)
Mean   21    106640.00 (  0.00%)    103904.75 ( -2.56%)    151053.75 ( 41.65%)    118169.50 ( 10.81%)
Mean   22    105173.00 (  0.00%)    107866.00 (  2.56%)    149248.75 ( 41.91%)    120062.00 ( 14.16%)
Mean   23    104009.50 (  0.00%)     84539.25 (-18.72%)    147848.25 ( 42.15%)    119518.25 ( 14.91%)
Mean   24    102713.75 (  0.00%)     85635.25 (-16.63%)    145843.25 ( 41.99%)    120339.75 ( 17.16%)
Stddev 1       1366.60 (  0.00%)      1135.04 ( 16.94%)      1619.94 (-18.54%)      1370.51 ( -0.29%)
Stddev 2        918.86 (  0.00%)      3552.45 (-286.61%)      1024.58 (-11.51%)       813.06 ( 11.51%)
Stddev 3       1066.85 (  0.00%)       881.39 ( 17.38%)      1176.32 (-10.26%)      1356.60 (-27.16%)
Stddev 4       1493.03 (  0.00%)      5298.20 (-254.86%)      1587.00 ( -6.29%)      1271.82 ( 14.82%)
Stddev 5        877.10 (  0.00%)      7526.59 (-758.13%)      1298.12 (-48.00%)      1030.81 (-17.53%)
Stddev 6       2351.71 (  0.00%)     16420.61 (-598.24%)      1122.37 ( 52.27%)      1276.07 ( 45.74%)
Stddev 7       1259.53 (  0.00%)     11596.65 (-820.71%)      1777.67 (-41.14%)      3225.46 (-156.08%)
Stddev 8       2912.35 (  0.00%)     18376.73 (-530.99%)      2428.53 ( 16.61%)      2997.79 ( -2.93%)
Stddev 9       6512.12 (  0.00%)      3668.11 ( 43.67%)      3311.86 ( 49.14%)      5116.28 ( 21.43%)
Stddev 10      6096.83 (  0.00%)      6969.09 (-14.31%)      6918.63 (-13.48%)      4623.63 ( 24.16%)
Stddev 11      9487.80 (  0.00%)      8337.58 ( 12.12%)     10122.20 ( -6.69%)      4651.18 ( 50.98%)
Stddev 12      8235.94 (  0.00%)     12325.53 (-49.66%)     13754.33 (-67.00%)      3002.66 ( 63.54%)
Stddev 13      8345.11 (  0.00%)     12512.09 (-49.93%)     15335.24 (-83.76%)      2206.88 ( 73.55%)
Stddev 14      8752.13 (  0.00%)      1689.34 ( 80.70%)     15529.14 (-77.43%)      6095.85 ( 30.35%)
Stddev 15      7611.56 (  0.00%)      3735.24 ( 50.93%)     16501.90 (-116.80%)      4713.94 ( 38.07%)
Stddev 16      8223.93 (  0.00%)      3621.59 ( 55.96%)     16426.27 (-99.74%)      5322.68 ( 35.28%)
Stddev 17      8829.49 (  0.00%)       100.89 ( 98.86%)     16633.79 (-88.39%)      3884.20 ( 56.01%)
Stddev 18      7053.69 (  0.00%)      1390.26 ( 80.29%)     18474.77 (-161.92%)      4296.24 ( 39.09%)
Stddev 19      6775.02 (  0.00%)      1335.05 ( 80.29%)     18046.60 (-166.37%)      3698.15 ( 45.41%)
Stddev 20      7481.59 (  0.00%)      4460.51 ( 40.38%)     17890.82 (-139.13%)      3406.39 ( 54.47%)
Stddev 21      8100.05 (  0.00%)      2934.02 ( 63.78%)     19041.29 (-135.08%)      2966.54 ( 63.38%)
Stddev 22      6507.61 (  0.00%)      3128.61 ( 51.92%)     17399.30 (-167.37%)      4242.58 ( 34.81%)
Stddev 23      6113.03 (  0.00%)      4226.82 ( 30.86%)     18573.42 (-203.83%)      5575.06 (  8.80%)
Stddev 24      5128.26 (  0.00%)      1695.29 ( 66.94%)     18824.94 (-267.08%)      4011.27 ( 21.78%)
TPut   1     126400.00 (  0.00%)    109871.00 (-13.08%)    124027.00 ( -1.88%)    125441.00 ( -0.76%)
TPut   2     251751.00 (  0.00%)    220960.00 (-12.23%)    260345.00 (  3.41%)    247696.00 ( -1.61%)
TPut   3     364589.00 (  0.00%)    326942.00 (-10.33%)    383356.00 (  5.15%)    362956.00 ( -0.45%)
TPut   4     458466.00 (  0.00%)    377419.00 (-17.68%)    496518.00 (  8.30%)    464421.00 (  1.30%)
TPut   5     545057.00 (  0.00%)    431317.00 (-20.87%)    602528.00 ( 10.54%)    558637.00 (  2.49%)
TPut   6     608647.00 (  0.00%)    492159.00 (-19.14%)    700441.00 ( 15.08%)    631645.00 (  3.78%)
TPut   7     601541.00 (  0.00%)    548532.00 ( -8.81%)    722773.00 ( 20.15%)    641342.00 (  6.62%)
TPut   8     587591.00 (  0.00%)    377299.00 (-35.79%)    738756.00 ( 25.73%)    639146.00 (  8.77%)
TPut   9     567413.00 (  0.00%)    414563.00 (-26.94%)    734371.00 ( 29.42%)    614177.00 (  8.24%)
TPut   10    582096.00 (  0.00%)    453041.00 (-22.17%)    717931.00 ( 23.34%)    583574.00 (  0.25%)
TPut   11    518609.00 (  0.00%)    394587.00 (-23.91%)    699566.00 ( 34.89%)    555591.00 (  7.13%)
TPut   12    493253.00 (  0.00%)    497363.00 (  0.83%)    675837.00 ( 37.02%)    552108.00 ( 11.93%)
TPut   13    489771.00 (  0.00%)    428673.00 (-12.47%)    659046.00 ( 34.56%)    540890.00 ( 10.44%)
TPut   14    481630.00 (  0.00%)    428228.00 (-11.09%)    653402.00 ( 35.66%)    530849.00 ( 10.22%)
TPut   15    472946.00 (  0.00%)    427496.00 ( -9.61%)    642555.00 ( 35.86%)    518395.00 (  9.61%)
TPut   16    461756.00 (  0.00%)    513859.00 ( 11.28%)    635352.00 ( 37.59%)    490170.00 (  6.15%)
TPut   17    445601.00 (  0.00%)    511478.00 ( 14.78%)    628765.00 ( 41.10%)    517818.00 ( 16.21%)
TPut   18    456674.00 (  0.00%)    487052.00 (  6.65%)    619315.00 ( 35.61%)    502697.00 ( 10.08%)
TPut   19    450489.00 (  0.00%)    456942.00 (  1.43%)    617521.00 ( 37.08%)    490768.00 (  8.94%)
TPut   20    438871.00 (  0.00%)    438246.00 ( -0.14%)    613167.00 ( 39.71%)    491197.00 ( 11.92%)
TPut   21    426560.00 (  0.00%)    415619.00 ( -2.56%)    604215.00 ( 41.65%)    472678.00 ( 10.81%)
TPut   22    420692.00 (  0.00%)    431464.00 (  2.56%)    596995.00 ( 41.91%)    480248.00 ( 14.16%)
TPut   23    416038.00 (  0.00%)    338157.00 (-18.72%)    591393.00 ( 42.15%)    478073.00 ( 14.91%)
TPut   24    410855.00 (  0.00%)    342541.00 (-16.63%)    583373.00 ( 41.99%)    481359.00 ( 17.16%)

numacore is not handling the multiple JVM case well, with numerous
regressions for lower numbers of threads. It is a bit better around the
expected peak of 12 warehouses per JVM for this configuration. There are
also large variances in throughput between the different JVMs, but note
again that this improves as the number of warehouses increases.

autonuma generally does very well in terms of throughput but the variance
between JVMs is massive.

balancenuma does reasonably well and improves upon the baseline kernel. It
shows regressions for small numbers of warehouses which were not evident in
V6, so it is known to vary a bit. However, as the number of warehouses
increases, it
shows some performance improvement and the variances are not too bad. It's
far short of what autonuma achieved but it's respectable.

SPECJBB PEAKS
                                   3.7.0-rc7                  3.7.0-rc6                  3.7.0-rc7                  3.7.0-rc7
                                 stats-v6r15          numacore-20121126         autonuma-v28fastr4            balancenuma-v8r6
 Expctd Warehouse            12.00 (  0.00%)            12.00 (  0.00%)            12.00 (  0.00%)            12.00 (  0.00%)
 Expctd Peak Bops        493253.00 (  0.00%)        497363.00 (  0.83%)        675837.00 ( 37.02%)        552108.00 ( 11.93%)
 Actual Warehouse             6.00 (  0.00%)             7.00 ( 16.67%)             8.00 ( 33.33%)             7.00 ( 16.67%)
 Actual Peak Bops        608647.00 (  0.00%)        548532.00 ( -9.88%)        738756.00 ( 21.38%)        641342.00 (  5.37%)
 SpecJBB Bops            451164.00 (  0.00%)        439778.00 ( -2.52%)        624688.00 ( 38.46%)        503634.00 ( 11.63%)
 SpecJBB Bops/JVM        112791.00 (  0.00%)        109945.00 ( -2.52%)        156172.00 ( 38.46%)        125909.00 ( 11.63%)

Note the peak numbers for numacore. The peak performance regresses 9.88%
from the baseline kernel. In a previous 3.7-rc6 comparison it showed an
improvement in the specjbb score of 0.52% at the peak. This is not a fair
comparison any more because of the large differences in kernels but it's
still the case that the specjbb score looks better than the actual peak
throughput because of how the specjbb score is calculated.
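
As an aside, the "SpecJBB Bops" figures in these reports appear to be the
mean of per-warehouse throughput from the expected peak (N warehouses) out
to 2N warehouses, with unmeasured warehouse counts contributing zero. That
is an inference from reproducing the tables rather than something lifted
from the MMTests reporting code, but the sketch below recovers the baseline
figure from the TPut column above:

# Sketch of how the "SpecJBB Bops" figure appears to be derived (an
# assumption that reproduces the table, not taken from MMTests source).
tput = {12: 493253, 13: 489771, 14: 481630, 15: 472946, 16: 461756,
        17: 445601, 18: 456674, 19: 450489, 20: 438871, 21: 426560,
        22: 420692, 23: 416038, 24: 410855}   # stats-v6r15 TPut column
expected_peak = 12
warehouses = range(expected_peak, 2 * expected_peak + 1)
score = sum(tput.get(w, 0) for w in warehouses) / float(len(warehouses))
print("SpecJBB Bops     : %.0f" % score)        # ~451164, as in the table
print("SpecJBB Bops/JVM : %.0f" % (score / 4))  # 4 JVMs, one per node

Because only the upper half of the warehouse range feeds the score, a
kernel can post a decent specjbb score while still regressing badly at
low warehouse counts.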

autonuma sees a 21.38% performance gain at its peak and a 38.46% gain in
its specjbb score.

balancenuma does reasonably well with a 5.37% gain at its peak and 11.63%
on its overall specjbb score. Not as good as autonuma, but respectable.

MMTests Statistics: duration
           3.7.0-rc7   3.7.0-rc6   3.7.0-rc7   3.7.0-rc7
         stats-v6r15numacore-20121126autonuma-v28fastr4balancenuma-v8r6
User       177410.90   171382.97   177112.15   177078.17
System        175.57     5976.48      219.87      514.57
Elapsed      4035.05     4037.94     4037.14     4030.78

Note the system CPU usage. numacore is using 11 times more system CPU
than balancenuma is and 27 times more than autonuma (usual disclaimer
about threads).


MMTests Statistics: vmstat
                             3.7.0-rc7   3.7.0-rc6   3.7.0-rc7   3.7.0-rc7
                           stats-v6r15numacore-20121126autonuma-v28fastr4balancenuma-v8r6
Page Ins                         38092       37968       37632       66512
Page Outs                        50240       52836       48468       64196
Swap Ins                             0           0           0           0
Swap Outs                            0           0           0           0
Direct pages scanned                 0           0           0           0
Kswapd pages scanned                 0           0           0           0
Kswapd pages reclaimed               0           0           0           0
Direct pages reclaimed               0           0           0           0
Kswapd efficiency                 100%        100%        100%        100%
Kswapd velocity                  0.000       0.000       0.000       0.000
Direct efficiency                 100%        100%        100%        100%
Direct velocity                  0.000       0.000       0.000       0.000
Percentage direct scans             0%          0%          0%          0%
Page writes by reclaim               0           0           0           0
Page writes file                     0           0           0           0
Page writes anon                     0           0           0           0
Page reclaim immediate               0           0           0           0
Page rescued immediate               0           0           0           0
Slabs scanned                        0           0           0           0
Direct inode steals                  0           0           0           0
Kswapd inode steals                  0           0           0           0
Kswapd skipped wait                  0           0           0           0
THP fault alloc                  65717       49223       56929       67137
THP collapse alloc                 125          55         462         122
THP splits                         370         211         383         367
THP fault fallback                   0           0           0           0
THP collapse fail                    0           0           0           0
Compaction stalls                    0           0           0           0
Compaction success                   0           0           0           0
Compaction failures                  0           0           0           0
Page migrate success                 0           0           0    51459156
Page migrate failure                 0           0           0           0
Compaction pages isolated            0           0           0           0
Compaction migrate scanned           0           0           0           0
Compaction free scanned              0           0           0           0
Compaction cost                      0           0           0       53414
NUMA PTE updates                     0           0           0   415931339
NUMA hint faults                     0           0           0     3089027
NUMA hint local faults               0           0           0      936873
NUMA pages migrated                  0           0           0    51459156
AutoNUMA cost                        0           0           0       19334

The main takeaway here is that there were THP allocations and all the
trees split THPs at very roughly the same rate overall. Migration stats
are not available for numacore or autonuma, but the stats for balancenuma
show that it is migrating at a rate of 49MB/sec on average. This is far
higher than I'd like and a proper policy on top should be able to help
get that down.
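
The 49MB/sec figure can be reproduced from the tables above (a quick
sketch; it assumes 4KB base pages, with THP migrations accounted as base
pages in "Page migrate success", and uses the balancenuma elapsed time
from the duration table):

# Reproducing the ~49MB/sec average migration rate quoted above.
pages_migrated = 51459156          # Page migrate success, balancenuma
elapsed        = 4030.78           # seconds, balancenuma elapsed time
mb_per_sec = pages_migrated * 4096 / elapsed / 2**20
print("average migration rate: %.1f MB/sec" % mb_per_sec)   # ~49.9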

SPECJBB: Multiple JVMs (one per node, 4 nodes), THP is disabled

                      3.7.0-rc7             3.7.0-rc6             3.7.0-rc7             3.7.0-rc7
                    stats-v6r15     numacore-20121126    autonuma-v28fastr4       balancenuma-v8r6
Mean   1      25460.75 (  0.00%)     19041.25 (-25.21%)     25538.50 (  0.31%)     25889.25 (  1.68%)
Mean   2      53520.75 (  0.00%)     36285.25 (-32.20%)     56045.00 (  4.72%)     52424.00 ( -2.05%)
Mean   3      77555.00 (  0.00%)     53221.25 (-31.38%)     83147.25 (  7.21%)     76898.75 ( -0.85%)
Mean   4     100030.00 (  0.00%)     65234.00 (-34.79%)    108965.25 (  8.93%)     98110.75 ( -1.92%)
Mean   5     120309.25 (  0.00%)     76315.25 (-36.57%)    132176.00 (  9.86%)    119555.75 ( -0.63%)
Mean   6     136112.50 (  0.00%)     89173.00 (-34.49%)    150532.75 ( 10.59%)    136993.00 (  0.65%)
Mean   7     135358.75 (  0.00%)     93026.00 (-31.27%)    159185.00 ( 17.60%)    138854.25 (  2.58%)
Mean   8     134319.50 (  0.00%)     97704.50 (-27.26%)    162122.25 ( 20.70%)    138954.25 (  3.45%)
Mean   9     132189.75 (  0.00%)     97305.75 (-26.39%)    161477.25 ( 22.16%)    135756.75 (  2.70%)
Mean   10    128023.25 (  0.00%)     86914.50 (-32.11%)    159014.25 ( 24.21%)    130314.75 (  1.79%)
Mean   11    119226.75 (  0.00%)     95627.25 (-19.79%)    155241.50 ( 30.21%)    123851.00 (  3.88%)
Mean   12    111769.50 (  0.00%)     88829.00 (-20.52%)    150002.75 ( 34.21%)    115657.25 (  3.48%)
Mean   13    110908.25 (  0.00%)    105153.00 ( -5.19%)    146769.75 ( 32.33%)    113916.00 (  2.71%)
Mean   14    109063.25 (  0.00%)    103905.50 ( -4.73%)    144350.50 ( 32.35%)    116530.75 (  6.85%)
Mean   15    105400.50 (  0.00%)    102274.25 ( -2.97%)    141991.50 ( 34.72%)    116928.50 ( 10.94%)
Mean   16    106195.50 (  0.00%)    100147.00 ( -5.70%)    141436.25 ( 33.18%)    114429.25 (  7.75%)
Mean   17    102077.00 (  0.00%)     98444.50 ( -3.56%)    139735.25 ( 36.89%)    113637.00 ( 11.32%)
Mean   18    101157.00 (  0.00%)     96963.25 ( -4.15%)    137867.50 ( 36.29%)    113728.75 ( 12.43%)
Mean   19     99892.75 (  0.00%)     95881.00 ( -4.02%)    135465.25 ( 35.61%)    112367.50 ( 12.49%)
Mean   20    100012.50 (  0.00%)     93851.50 ( -6.16%)    134840.25 ( 34.82%)    112712.25 ( 12.70%)
Mean   21     97157.25 (  0.00%)     92788.25 ( -4.50%)    133454.25 ( 37.36%)    107491.50 ( 10.64%)
Mean   22     97807.25 (  0.00%)     90831.25 ( -7.13%)    130811.00 ( 33.74%)    108284.00 ( 10.71%)
Mean   23     94287.00 (  0.00%)     88404.50 ( -6.24%)    129693.00 ( 37.55%)    106024.25 ( 12.45%)
Mean   24     94142.00 (  0.00%)     86549.00 ( -8.07%)    127417.25 ( 35.35%)    103483.00 (  9.92%)
Stddev 1        873.15 (  0.00%)       819.01 (  6.20%)       805.93 (  7.70%)       982.04 (-12.47%)
Stddev 2        828.04 (  0.00%)       151.51 ( 81.70%)       641.04 ( 22.58%)       504.12 ( 39.12%)
Stddev 3        824.92 (  0.00%)      3708.80 (-349.60%)      1092.76 (-32.47%)      2024.69 (-145.44%)
Stddev 4        607.86 (  0.00%)      1768.43 (-190.93%)      1422.30 (-133.99%)      1298.14 (-113.56%)
Stddev 5        836.75 (  0.00%)      1048.83 (-25.34%)      1656.67 (-97.99%)      2600.99 (-210.84%)
Stddev 6        641.16 (  0.00%)      1010.82 (-57.66%)       990.71 (-54.52%)      1832.47 (-185.81%)
Stddev 7       4556.68 (  0.00%)      2374.23 ( 47.90%)      1395.66 ( 69.37%)      3149.28 ( 30.89%)
Stddev 8       3770.88 (  0.00%)      5926.66 (-57.17%)      1017.86 ( 73.01%)      3213.00 ( 14.79%)
Stddev 9       2396.64 (  0.00%)      2946.42 (-22.94%)      1131.78 ( 52.78%)      5125.85 (-113.88%)
Stddev 10      2535.66 (  0.00%)      2827.47 (-11.51%)      2330.35 (  8.10%)      2662.72 ( -5.01%)
Stddev 11      2858.16 (  0.00%)      4522.90 (-58.25%)      5970.58 (-108.90%)      3843.01 (-34.46%)
Stddev 12      4084.30 (  0.00%)      2782.83 ( 31.87%)      9008.52 (-120.56%)      1062.12 ( 74.00%)
Stddev 13      3079.56 (  0.00%)      1107.30 ( 64.04%)      9118.81 (-196.11%)      3075.82 (  0.12%)
Stddev 14      2886.35 (  0.00%)      1497.39 ( 48.12%)      9084.67 (-214.75%)      3209.97 (-11.21%)
Stddev 15      3302.30 (  0.00%)      1942.68 ( 41.17%)     10684.80 (-223.56%)      1094.48 ( 66.86%)
Stddev 16      3868.79 (  0.00%)      2024.71 ( 47.67%)     10202.01 (-163.70%)      1389.86 ( 64.08%)
Stddev 17      3318.20 (  0.00%)      1031.66 ( 68.91%)     10295.90 (-210.29%)      1334.94 ( 59.77%)
Stddev 18      3926.91 (  0.00%)       976.39 ( 75.14%)     11497.98 (-192.80%)       914.90 ( 76.70%)
Stddev 19      3169.02 (  0.00%)       668.74 ( 78.90%)     10951.67 (-245.59%)      2192.84 ( 30.80%)
Stddev 20      3343.84 (  0.00%)       727.51 ( 78.24%)     10974.75 (-228.21%)       991.99 ( 70.33%)
Stddev 21      3253.04 (  0.00%)      1212.03 ( 62.74%)     11682.29 (-259.12%)       802.70 ( 75.32%)
Stddev 22      3320.18 (  0.00%)      1017.95 ( 69.34%)     11224.85 (-238.08%)       536.20 ( 83.85%)
Stddev 23      3160.77 (  0.00%)      1544.09 ( 51.15%)     11611.88 (-267.37%)      1076.64 ( 65.94%)
Stddev 24      3079.01 (  0.00%)       739.34 ( 75.99%)     13124.55 (-326.26%)      1311.96 ( 57.39%)
TPut   1     101843.00 (  0.00%)     76165.00 (-25.21%)    102154.00 (  0.31%)    103557.00 (  1.68%)
TPut   2     214083.00 (  0.00%)    145141.00 (-32.20%)    224180.00 (  4.72%)    209696.00 ( -2.05%)
TPut   3     310220.00 (  0.00%)    212885.00 (-31.38%)    332589.00 (  7.21%)    307595.00 ( -0.85%)
TPut   4     400120.00 (  0.00%)    260936.00 (-34.79%)    435861.00 (  8.93%)    392443.00 ( -1.92%)
TPut   5     481237.00 (  0.00%)    305261.00 (-36.57%)    528704.00 (  9.86%)    478223.00 ( -0.63%)
TPut   6     544450.00 (  0.00%)    356692.00 (-34.49%)    602131.00 ( 10.59%)    547972.00 (  0.65%)
TPut   7     541435.00 (  0.00%)    372104.00 (-31.27%)    636740.00 ( 17.60%)    555417.00 (  2.58%)
TPut   8     537278.00 (  0.00%)    390818.00 (-27.26%)    648489.00 ( 20.70%)    555817.00 (  3.45%)
TPut   9     528759.00 (  0.00%)    389223.00 (-26.39%)    645909.00 ( 22.16%)    543027.00 (  2.70%)
TPut   10    512093.00 (  0.00%)    347658.00 (-32.11%)    636057.00 ( 24.21%)    521259.00 (  1.79%)
TPut   11    476907.00 (  0.00%)    382509.00 (-19.79%)    620966.00 ( 30.21%)    495404.00 (  3.88%)
TPut   12    447078.00 (  0.00%)    355316.00 (-20.52%)    600011.00 ( 34.21%)    462629.00 (  3.48%)
TPut   13    443633.00 (  0.00%)    420612.00 ( -5.19%)    587079.00 ( 32.33%)    455664.00 (  2.71%)
TPut   14    436253.00 (  0.00%)    415622.00 ( -4.73%)    577402.00 ( 32.35%)    466123.00 (  6.85%)
TPut   15    421602.00 (  0.00%)    409097.00 ( -2.97%)    567966.00 ( 34.72%)    467714.00 ( 10.94%)
TPut   16    424782.00 (  0.00%)    400588.00 ( -5.70%)    565745.00 ( 33.18%)    457717.00 (  7.75%)
TPut   17    408308.00 (  0.00%)    393778.00 ( -3.56%)    558941.00 ( 36.89%)    454548.00 ( 11.32%)
TPut   18    404628.00 (  0.00%)    387853.00 ( -4.15%)    551470.00 ( 36.29%)    454915.00 ( 12.43%)
TPut   19    399571.00 (  0.00%)    383524.00 ( -4.02%)    541861.00 ( 35.61%)    449470.00 ( 12.49%)
TPut   20    400050.00 (  0.00%)    375406.00 ( -6.16%)    539361.00 ( 34.82%)    450849.00 ( 12.70%)
TPut   21    388629.00 (  0.00%)    371153.00 ( -4.50%)    533817.00 ( 37.36%)    429966.00 ( 10.64%)
TPut   22    391229.00 (  0.00%)    363325.00 ( -7.13%)    523244.00 ( 33.74%)    433136.00 ( 10.71%)
TPut   23    377148.00 (  0.00%)    353618.00 ( -6.24%)    518772.00 ( 37.55%)    424097.00 ( 12.45%)
TPut   24    376568.00 (  0.00%)    346196.00 ( -8.07%)    509669.00 ( 35.35%)    413932.00 (  9.92%)

numacore regresses without THP in the multiple JVM configuration,
particularly for lower numbers of warehouses. Note that once again it
improves as the number of warehouses increases. SpecJBB reports are based
on peaks, so these regressions will be missed if only the peak figures
are quoted in other benchmark reports.

autonuma again performs very well although its variance between JVMs
is nuts.

Without THP, balancenuma shows small regressions for small numbers of
warehouses but recovers to show decent performance gains. Note that the
gains vary between warehouses because it's completely at the mercy of the
default scheduler decisions which are getting no hints about NUMA placement.

SPECJBB PEAKS
                                   3.7.0-rc7                  3.7.0-rc6                  3.7.0-rc7                  3.7.0-rc7
                                 stats-v6r15          numacore-20121126         autonuma-v28fastr4            balancenuma-v8r6
 Expctd Warehouse            12.00 (  0.00%)            12.00 (  0.00%)            12.00 (  0.00%)            12.00 (  0.00%)
 Expctd Peak Bops        447078.00 (  0.00%)        355316.00 (-20.52%)        600011.00 ( 34.21%)        462629.00 (  3.48%)
 Actual Warehouse             6.00 (  0.00%)            13.00 (116.67%)             8.00 ( 33.33%)             8.00 ( 33.33%)
 Actual Peak Bops        544450.00 (  0.00%)        420612.00 (-22.75%)        648489.00 ( 19.11%)        555817.00 (  2.09%)
 SpecJBB Bops            409191.00 (  0.00%)        382775.00 ( -6.46%)        551949.00 ( 34.89%)        447750.00 (  9.42%)
 SpecJBB Bops/JVM        102298.00 (  0.00%)         95694.00 ( -6.46%)        137987.00 ( 34.89%)        111938.00 (  9.42%)

numacore regresses from the peak by 22.75% and the specjbb overall score is down 6.46%.

autonuma does well with a 19.11% gain on the peak and 34.89% overall.

balancenuma does reasonably well -- 2.09% gain at the peak and 9.42%
gain overall.

MMTests Statistics: duration
           3.7.0-rc7   3.7.0-rc6   3.7.0-rc7   3.7.0-rc7
         stats-v6r15numacore-20121126autonuma-v28fastr4balancenuma-v8r6
User       177276.00   146602.11   176834.75   175649.50
System         91.09    27863.11      283.25     1455.39
Elapsed      4030.76     4042.32     4038.79     4038.06

numacore's system CPU usage is extremely high.

autonuma's is ok (usual disclaimer about kernel threads).

balancenuma's is higher than I'd like. I want to describe it as "not crazy"
but it probably is to everybody else.

MMTests Statistics: vmstat
                             3.7.0-rc7   3.7.0-rc6   3.7.0-rc7   3.7.0-rc7
                           stats-v6r15numacore-20121126autonuma-v28fastr4balancenuma-v8r6
Page Ins                         37836       37744       38072       37192
Page Outs                        49440       51944       49024       51384
Swap Ins                             0           0           0           0
Swap Outs                            0           0           0           0
Direct pages scanned                 0           0           0           0
Kswapd pages scanned                 0           0           0           0
Kswapd pages reclaimed               0           0           0           0
Direct pages reclaimed               0           0           0           0
Kswapd efficiency                 100%        100%        100%        100%
Kswapd velocity                  0.000       0.000       0.000       0.000
Direct efficiency                 100%        100%        100%        100%
Direct velocity                  0.000       0.000       0.000       0.000
Percentage direct scans             0%          0%          0%          0%
Page writes by reclaim               0           0           0           0
Page writes file                     0           0           0           0
Page writes anon                     0           0           0           0
Page reclaim immediate               0           0           0           0
Page rescued immediate               0           0           0           0
Slabs scanned                        0           0           0           0
Direct inode steals                  0           0           0           0
Kswapd inode steals                  0           0           0           0
Kswapd skipped wait                  0           0           0           0
THP fault alloc                      2           1           1           3
THP collapse alloc                   2           0          20           0
THP splits                           0           0           0           0
THP fault fallback                   0           0           0           0
THP collapse fail                    0           0           0           0
Compaction stalls                    0           0           0           0
Compaction success                   0           0           0           0
Compaction failures                  0           0           0           0
Page migrate success                 0           0           0    37212252
Page migrate failure                 0           0           0           0
Compaction pages isolated            0           0           0           0
Compaction migrate scanned           0           0           0           0
Compaction free scanned              0           0           0           0
Compaction cost                      0           0           0       38626
NUMA PTE updates                     0           0           0   290219318
NUMA hint faults                     0           0           0   267929465
NUMA hint local faults               0           0           0    69757534
NUMA pages migrated                  0           0           0    37212252
AutoNUMA cost                        0           0           0     1342385

The first take-away is the lack of THP activity.

Here the stats balancenuma reports are useful because we're dealing only
with base pages. balancenuma migrates 36MB/second, which is really high,
particularly when you bear in mind that with copying that's 72MB/sec of
data transferred. From earlier test results we know the scan rate adaption
helps keep this figure down and that the average migration rate is
something we should keep an eye on.
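
The same arithmetic reproduces the 36MB/sec and 72MB/sec figures (a quick
sketch; the doubling is because each migrated page is read on the source
node and written on the destination node):

# Base-page migration rate for the no-THP multiple JVM run, plus the
# equivalent memory traffic once the copy itself is counted.
pages_migrated = 37212252          # Page migrate success, balancenuma
elapsed        = 4038.06           # seconds, balancenuma elapsed time
migrated_mb_per_sec = pages_migrated * 4096 / elapsed / 2**20
print("migration rate : %.1f MB/sec" % migrated_mb_per_sec)        # ~36
print("copy traffic   : %.1f MB/sec" % (2 * migrated_mb_per_sec))  # ~72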

From here, we're onto the single JVM configuration. I suspect
this is tested much more commonly but note that it behaves very
differently to the multi JVM configuration as explained by Andrea
(http://choon.net/forum/read.php?21,1599976,page=4).

SPECJBB: Single JVM, THP is enabled
                    3.7.0-rc7             3.7.0-rc6             3.7.0-rc7             3.7.0-rc7
                  stats-v6r15     numacore-20121126    autonuma-v28fastr4       balancenuma-v8r6
TPut 1      25219.00 (  0.00%)     24994.00 ( -0.89%)     23003.00 ( -8.79%)     26876.00 (  6.57%)
TPut 2      56218.00 (  0.00%)     52603.00 ( -6.43%)     52412.00 ( -6.77%)     55372.00 ( -1.50%)
TPut 3      87560.00 (  0.00%)     78545.00 (-10.30%)     82769.00 ( -5.47%)     87351.00 ( -0.24%)
TPut 4     114877.00 (  0.00%)    110117.00 ( -4.14%)    109057.00 ( -5.07%)    116584.00 (  1.49%)
TPut 5     145249.00 (  0.00%)    126704.00 (-12.77%)    136402.00 ( -6.09%)    144194.00 ( -0.73%)
TPut 6     169591.00 (  0.00%)    147129.00 (-13.24%)    153711.00 ( -9.36%)    170627.00 (  0.61%)
TPut 7     194429.00 (  0.00%)    171652.00 (-11.71%)    185094.00 ( -4.80%)    197385.00 (  1.52%)
TPut 8     218492.00 (  0.00%)    167754.00 (-23.22%)    212731.00 ( -2.64%)    225145.00 (  3.04%)
TPut 9     242090.00 (  0.00%)    200709.00 (-17.09%)    233781.00 ( -3.43%)    250624.00 (  3.53%)
TPut 10    254513.00 (  0.00%)    236769.00 ( -6.97%)    256599.00 (  0.82%)    275834.00 (  8.38%)
TPut 11    283694.00 (  0.00%)    227999.00 (-19.63%)    281189.00 ( -0.88%)    300696.00 (  5.99%)
TPut 12    306679.00 (  0.00%)    263599.00 (-14.05%)    307239.00 (  0.18%)    325723.00 (  6.21%)
TPut 13    317050.00 (  0.00%)    281988.00 (-11.06%)    320474.00 (  1.08%)    346733.00 (  9.36%)
TPut 14    281122.00 (  0.00%)    306206.00 (  8.92%)    348007.00 ( 23.79%)    363974.00 ( 29.47%)
TPut 15    344584.00 (  0.00%)    327784.00 ( -4.88%)    370530.00 (  7.53%)    390804.00 ( 13.41%)
TPut 16    355251.00 (  0.00%)    325626.00 ( -8.34%)    388602.00 (  9.39%)    412690.00 ( 16.17%)
TPut 17    358785.00 (  0.00%)    372911.00 (  3.94%)    406725.00 ( 13.36%)    431710.00 ( 20.33%)
TPut 18    362037.00 (  0.00%)    358876.00 ( -0.87%)    423311.00 ( 16.92%)    447506.00 ( 23.61%)
TPut 19    366526.00 (  0.00%)    397926.00 (  8.57%)    434692.00 ( 18.60%)    454669.00 ( 24.05%)
TPut 20    365125.00 (  0.00%)    387871.00 (  6.23%)    441119.00 ( 20.81%)    475213.00 ( 30.15%)
TPut 21    367221.00 (  0.00%)    446595.00 ( 21.61%)    473582.00 ( 28.96%)    483085.00 ( 31.55%)
TPut 22    352732.00 (  0.00%)    436862.00 ( 23.85%)    479616.00 ( 35.97%)    494976.00 ( 40.33%)
TPut 23    358840.00 (  0.00%)    464554.00 ( 29.46%)    484157.00 ( 34.92%)    507236.00 ( 41.35%)
TPut 24    355426.00 (  0.00%)    474432.00 ( 33.48%)    477851.00 ( 34.44%)    503864.00 ( 41.76%)
TPut 25    354178.00 (  0.00%)    456845.00 ( 28.99%)    476411.00 ( 34.51%)    505628.00 ( 42.76%)
TPut 26    352844.00 (  0.00%)    477178.00 ( 35.24%)    474925.00 ( 34.60%)    496278.00 ( 40.65%)
TPut 27    351616.00 (  0.00%)    461061.00 ( 31.13%)    461218.00 ( 31.17%)    507777.00 ( 44.41%)
TPut 28    342442.00 (  0.00%)    458497.00 ( 33.89%)    442311.00 ( 29.16%)    495797.00 ( 44.78%)
TPut 29    330633.00 (  0.00%)    492795.00 ( 49.05%)    444804.00 ( 34.53%)    512545.00 ( 55.02%)
TPut 30    330202.00 (  0.00%)    503148.00 ( 52.38%)    428283.00 ( 29.70%)    494677.00 ( 49.81%)
TPut 31    318975.00 (  0.00%)    488421.00 ( 53.12%)    445121.00 ( 39.55%)    498506.00 ( 56.28%)
TPut 32    321422.00 (  0.00%)    469743.00 ( 46.15%)    437403.00 ( 36.08%)    490464.00 ( 52.59%)
TPut 33    322341.00 (  0.00%)    465564.00 ( 44.43%)    422936.00 ( 31.21%)    485365.00 ( 50.58%)
TPut 34    306767.00 (  0.00%)    462386.00 ( 50.73%)    407367.00 ( 32.79%)    467848.00 ( 52.51%)
TPut 35    304995.00 (  0.00%)    476963.00 ( 56.38%)    407555.00 ( 33.63%)    471954.00 ( 54.74%)
TPut 36    296795.00 (  0.00%)    455814.00 ( 53.58%)    403723.00 ( 36.03%)    467543.00 ( 57.53%)
TPut 37    295131.00 (  0.00%)    414467.00 ( 40.43%)    367104.00 ( 24.39%)    453145.00 ( 53.54%)
TPut 38    285609.00 (  0.00%)    418189.00 ( 46.42%)    357852.00 ( 25.29%)    436387.00 ( 52.79%)
TPut 39    288418.00 (  0.00%)    432818.00 ( 50.07%)    345127.00 ( 19.66%)    424866.00 ( 47.31%)
TPut 40    284779.00 (  0.00%)    416627.00 ( 46.30%)    330080.00 ( 15.91%)    429043.00 ( 50.66%)
TPut 41    275224.00 (  0.00%)    406106.00 ( 47.55%)    332766.00 ( 20.91%)    412042.00 ( 49.71%)
TPut 42    272301.00 (  0.00%)    387449.00 ( 42.29%)    330321.00 ( 21.31%)    409263.00 ( 50.30%)
TPut 43    261075.00 (  0.00%)    369755.00 ( 41.63%)    322081.00 ( 23.37%)    416906.00 ( 59.69%)
TPut 44    259570.00 (  0.00%)    383102.00 ( 47.59%)    310141.00 ( 19.48%)    401482.00 ( 54.67%)
TPut 45    268308.00 (  0.00%)    370866.00 ( 38.22%)    309946.00 ( 15.52%)    397084.00 ( 48.00%)
TPut 46    251641.00 (  0.00%)    371264.00 ( 47.54%)    308248.00 ( 22.50%)    367053.00 ( 45.86%)
TPut 47    248566.00 (  0.00%)    381703.00 ( 53.56%)    296089.00 ( 19.12%)    362150.00 ( 45.70%)
TPut 48    256403.00 (  0.00%)    392542.00 ( 53.10%)    302787.00 ( 18.09%)    368646.00 ( 43.78%)
TPut 49    252248.00 (  0.00%)    377276.00 ( 49.57%)    330756.00 ( 31.12%)    385558.00 ( 52.85%)
TPut 50    247856.00 (  0.00%)    351684.00 ( 41.89%)    344068.00 ( 38.82%)    373454.00 ( 50.67%)
TPut 51    251900.00 (  0.00%)    332813.00 ( 32.12%)    332706.00 ( 32.08%)    385786.00 ( 53.15%)
TPut 52    255247.00 (  0.00%)    373908.00 ( 46.49%)    338580.00 ( 32.65%)    357138.00 ( 39.92%)
TPut 53    254376.00 (  0.00%)    354872.00 ( 39.51%)    366606.00 ( 44.12%)    367391.00 ( 44.43%)
TPut 54    239804.00 (  0.00%)    375675.00 ( 56.66%)    347626.00 ( 44.96%)    387538.00 ( 61.61%)
TPut 55    243339.00 (  0.00%)    411901.00 ( 69.27%)    345700.00 ( 42.07%)    379513.00 ( 55.96%)
TPut 56    253604.00 (  0.00%)    379291.00 ( 49.56%)    366087.00 ( 44.35%)    367165.00 ( 44.78%)
TPut 57    238212.00 (  0.00%)    376023.00 ( 57.85%)    347698.00 ( 45.96%)    346641.00 ( 45.52%)
TPut 58    246397.00 (  0.00%)    399372.00 ( 62.08%)    372138.00 ( 51.03%)    377817.00 ( 53.34%)
TPut 59    244926.00 (  0.00%)    389607.00 ( 59.07%)    367619.00 ( 50.09%)    373928.00 ( 52.67%)
TPut 60    247249.00 (  0.00%)    382694.00 ( 54.78%)    339032.00 ( 37.12%)    377435.00 ( 52.65%)
TPut 61    249833.00 (  0.00%)    383316.00 ( 53.43%)    340934.00 ( 36.46%)    345885.00 ( 38.45%)
TPut 62    247309.00 (  0.00%)    390815.00 ( 58.03%)    345727.00 ( 39.80%)    359426.00 ( 45.33%)
TPut 63    246530.00 (  0.00%)    390800.00 ( 58.52%)    369327.00 ( 49.81%)    351243.00 ( 42.47%)
TPut 64    238954.00 (  0.00%)    404036.00 ( 69.09%)    359388.00 ( 50.40%)    354036.00 ( 48.16%)
TPut 65    245095.00 (  0.00%)    398807.00 ( 62.72%)    341462.00 ( 39.32%)    336288.00 ( 37.21%)
TPut 66    250698.00 (  0.00%)    387445.00 ( 54.55%)    352065.00 ( 40.43%)    374670.00 ( 49.45%)
TPut 67    235819.00 (  0.00%)    385050.00 ( 63.28%)    337617.00 ( 43.17%)    365777.00 ( 55.11%)
TPut 68    233949.00 (  0.00%)    372286.00 ( 59.13%)    365514.00 ( 56.24%)    344230.00 ( 47.14%)
TPut 69    229172.00 (  0.00%)    370092.00 ( 61.49%)    370106.00 ( 61.50%)    364038.00 ( 58.85%)
TPut 70    237174.00 (  0.00%)    375051.00 ( 58.13%)    366155.00 ( 54.38%)    351673.00 ( 48.28%)
TPut 71    235153.00 (  0.00%)    375629.00 ( 59.74%)    365557.00 ( 55.45%)    328308.00 ( 39.61%)
TPut 72    235747.00 (  0.00%)    356140.00 ( 51.07%)    378508.00 ( 60.56%)    334254.00 ( 41.79%)

numacore does not perform well here for low numbers of warehouses but
rapidly improves, and by warehouse 18 it is more or less level with the
mainline kernel. After
that it improves quite dramatically. Note that specjbb reports on peak scores so
with THP enabled and a single JVM, numacore scores extremely well.

autonuma also regressed for lower numbers of warehouses in this run
although it is not clear why. In 3.7-rc6, the same patch showed very small
gains for lower numbers of warehouses. As with numacore it improves for
larger numbers of warehouses, starting to improve from warehouse 12 as
opposed to 18 for numacore.

balancenuma regressed a little initially but improves sooner and shows
respectable performance gains similar to numacore and autonuma for larger
numbers of warehouses.

SPECJBB PEAKS
                                   3.7.0-rc7                  3.7.0-rc6                  3.7.0-rc7                  3.7.0-rc7
                                 stats-v6r15          numacore-20121126         autonuma-v28fastr4            balancenuma-v8r6
 Expctd Warehouse            48.00 (  0.00%)            48.00 (  0.00%)            48.00 (  0.00%)            48.00 (  0.00%)
 Expctd Peak Bops        256403.00 (  0.00%)        392542.00 ( 53.10%)        302787.00 ( 18.09%)        368646.00 ( 43.78%)
 Actual Warehouse            21.00 (  0.00%)            30.00 ( 42.86%)            23.00 (  9.52%)            29.00 ( 38.10%)
 Actual Peak Bops        367221.00 (  0.00%)        503148.00 ( 37.02%)        484157.00 ( 31.84%)        512545.00 ( 39.57%)
 SpecJBB Bops            124837.00 (  0.00%)        193615.00 ( 55.09%)        179465.00 ( 43.76%)        184854.00 ( 48.08%)
 SpecJBB Bops/JVM        124837.00 (  0.00%)        193615.00 ( 55.09%)        179465.00 ( 43.76%)        184854.00 ( 48.08%)

Here you can see that numacore scales to a higher number of warehouses
and sees a 37.02% performance gain at the peak and a 55.09% gain on the
specjbb score. The peaks are great but not the results for smaller numbers
of warehouses. As specjbb scores are based on the peak, be mindful of this.

autonuma sees a 31.84% performance gain at the peak and a 43.76%
performance gain on the specjbb score.

balancenuma gets a 39.57% performance gain at the peak and a 48.08%
gain on the specjbb score.

For larger numbers of warehouses, all three trees do extremely well.

MMTests Statistics: duration
           3.7.0-rc7   3.7.0-rc6   3.7.0-rc7   3.7.0-rc7
         stats-v6r15numacore-20121126autonuma-v28fastr4balancenuma-v8r6
User       317746.38   311465.45   316147.49   315667.42
System         99.42     3043.75      355.53      459.73
Elapsed      7433.93     7436.53     7435.53     7433.49

Same comments apply about the system CPU usage. numacore's is extremely
high, using 6 times more CPU than balancenuma.

MMTests Statistics: vmstat
                             3.7.0-rc7   3.7.0-rc6   3.7.0-rc7   3.7.0-rc7
                           stats-v6r15numacore-20121126autonuma-v28fastr4balancenuma-v8r6
Page Ins                         37060       36916       37072       33400
Page Outs                        59220       63380       57804       54436
Swap Ins                             0           0           0           0
Swap Outs                            0           0           0           0
Direct pages scanned                 0           0           0           0
Kswapd pages scanned                 0           0           0           0
Kswapd pages reclaimed               0           0           0           0
Direct pages reclaimed               0           0           0           0
Kswapd efficiency                 100%        100%        100%        100%
Kswapd velocity                  0.000       0.000       0.000       0.000
Direct efficiency                 100%        100%        100%        100%
Direct velocity                  0.000       0.000       0.000       0.000
Percentage direct scans             0%          0%          0%          0%
Page writes by reclaim               0           0           0           0
Page writes file                     0           0           0           0
Page writes anon                     0           0           0           0
Page reclaim immediate               0           0           0           0
Page rescued immediate               0           0           0           0
Slabs scanned                        0           0           0           0
Direct inode steals                  0           0           0           0
Kswapd inode steals                  0           0           0           0
Kswapd skipped wait                  0           0           0           0
THP fault alloc                  53004       43971       51386       50126
THP collapse alloc                  67           1         192          58
THP splits                          82          39         107          77
THP fault fallback                   0           0           0           0
THP collapse fail                    0           0           0           0
Compaction stalls                    0           0           0           0
Compaction success                   0           0           0           0
Compaction failures                  0           0           0           0
Page migrate success                 0           0           0    47488580
Page migrate failure                 0           0           0           0
Compaction pages isolated            0           0           0           0
Compaction migrate scanned           0           0           0           0
Compaction free scanned              0           0           0           0
Compaction cost                      0           0           0       49293
NUMA PTE updates                     0           0           0   359807386
NUMA hint faults                     0           0           0     2024295
NUMA hint local faults               0           0           0      693439
NUMA pages migrated                  0           0           0    47488580
AutoNUMA cost                        0           0           0       13542

THP is in use. balancenuma migrated more than I'd like, at an average
of 24MB/sec.


SPECJBB: Single JVM, THP is disabled

                    3.7.0-rc7             3.7.0-rc6             3.7.0-rc7             3.7.0-rc7
                  stats-v6r15     numacore-20121126    autonuma-v28fastr4       balancenuma-v8r6
TPut 1      19264.00 (  0.00%)     17423.00 ( -9.56%)     18895.00 ( -1.92%)     19925.00 (  3.43%)
TPut 2      45341.00 (  0.00%)     38727.00 (-14.59%)     46448.00 (  2.44%)     47567.00 (  4.91%)
TPut 3      69495.00 (  0.00%)     58775.00 (-15.43%)     69639.00 (  0.21%)     72462.00 (  4.27%)
TPut 4      93336.00 (  0.00%)     71864.00 (-23.01%)     95667.00 (  2.50%)     97095.00 (  4.03%)
TPut 5     113997.00 (  0.00%)     98727.00 (-13.40%)    123262.00 (  8.13%)    121667.00 (  6.73%)
TPut 6     135278.00 (  0.00%)    111789.00 (-17.36%)    143619.00 (  6.17%)    144664.00 (  6.94%)
TPut 7     158037.00 (  0.00%)    119202.00 (-24.57%)    168299.00 (  6.49%)    169072.00 (  6.98%)
TPut 8     180282.00 (  0.00%)    124026.00 (-31.20%)    189608.00 (  5.17%)    186262.00 (  3.32%)
TPut 9     203033.00 (  0.00%)    128233.00 (-36.84%)    211492.00 (  4.17%)    207573.00 (  2.24%)
TPut 10    221732.00 (  0.00%)    139290.00 (-37.18%)    230843.00 (  4.11%)    232814.00 (  5.00%)
TPut 11    242479.00 (  0.00%)    127751.00 (-47.31%)    255217.00 (  5.25%)    255212.00 (  5.25%)
TPut 12    257236.00 (  0.00%)    149851.00 (-41.75%)    272681.00 (  6.00%)    259541.00 (  0.90%)
TPut 13    281727.00 (  0.00%)    163583.00 (-41.94%)    287647.00 (  2.10%)    299305.00 (  6.24%)
TPut 14    303538.00 (  0.00%)    142471.00 (-53.06%)    312506.00 (  2.95%)    316094.00 (  4.14%)
TPut 15    322025.00 (  0.00%)    127744.00 (-60.33%)    312595.00 ( -2.93%)    279241.00 (-13.29%)
TPut 16    336713.00 (  0.00%)    123808.00 (-63.23%)    335452.00 ( -0.37%)    307668.00 ( -8.63%)
TPut 17    356063.00 (  0.00%)    111864.00 (-68.58%)    225754.00 (-36.60%)    355818.00 ( -0.07%)
TPut 18    371661.00 (  0.00%)    147370.00 (-60.35%)    360233.00 ( -3.07%)    372634.00 (  0.26%)
TPut 19    379312.00 (  0.00%)    123923.00 (-67.33%)    387282.00 (  2.10%)    361767.00 ( -4.63%)
TPut 20    401692.00 (  0.00%)    138242.00 (-65.59%)    404094.00 (  0.60%)    423420.00 (  5.41%)
TPut 21    414513.00 (  0.00%)    130297.00 (-68.57%)    407778.00 ( -1.62%)    391592.00 ( -5.53%)
TPut 22    428844.00 (  0.00%)    137265.00 (-67.99%)    417451.00 ( -2.66%)    405080.00 ( -5.54%)
TPut 23    438020.00 (  0.00%)    142830.00 (-67.39%)    429879.00 ( -1.86%)    408552.00 ( -6.73%)
TPut 24    448953.00 (  0.00%)    134555.00 (-70.03%)    438014.00 ( -2.44%)    437712.00 ( -2.50%)
TPut 25    435304.00 (  0.00%)    139353.00 (-67.99%)    421593.00 ( -3.15%)    434468.00 ( -0.19%)
TPut 26    440650.00 (  0.00%)    138950.00 (-68.47%)    431110.00 ( -2.16%)    470865.00 (  6.86%)
TPut 27    450883.00 (  0.00%)    122023.00 (-72.94%)    363860.00 (-19.30%)    454628.00 (  0.83%)
TPut 28    443898.00 (  0.00%)    147767.00 (-66.71%)    432948.00 ( -2.47%)    435056.00 ( -1.99%)
TPut 29    441452.00 (  0.00%)    146533.00 (-66.81%)    424264.00 ( -3.89%)    428605.00 ( -2.91%)
TPut 30    441326.00 (  0.00%)    151533.00 (-65.66%)    422050.00 ( -4.37%)    460991.00 (  4.46%)
TPut 31    439690.00 (  0.00%)    153500.00 (-65.09%)    414679.00 ( -5.69%)    434294.00 ( -1.23%)
TPut 32    429590.00 (  0.00%)    157455.00 (-63.35%)    419414.00 ( -2.37%)    428349.00 ( -0.29%)
TPut 33    417133.00 (  0.00%)    144792.00 (-65.29%)    416503.00 ( -0.15%)    417916.00 (  0.19%)
TPut 34    420403.00 (  0.00%)    145986.00 (-65.27%)    405824.00 ( -3.47%)    433001.00 (  3.00%)
TPut 35    416891.00 (  0.00%)    147549.00 (-64.61%)    403946.00 ( -3.11%)    442290.00 (  6.09%)
TPut 36    408666.00 (  0.00%)    148456.00 (-63.67%)    407079.00 ( -0.39%)    394163.00 ( -3.55%)
TPut 37    404101.00 (  0.00%)    155440.00 (-61.53%)    388615.00 ( -3.83%)    402274.00 ( -0.45%)
TPut 38    388909.00 (  0.00%)    160695.00 (-58.68%)    394499.00 (  1.44%)    427483.00 (  9.92%)
TPut 39    383162.00 (  0.00%)    152452.00 (-60.21%)    375101.00 ( -2.10%)    390608.00 (  1.94%)
TPut 40    370984.00 (  0.00%)    165686.00 (-55.34%)    374385.00 (  0.92%)    377252.00 (  1.69%)
TPut 41    370755.00 (  0.00%)    164312.00 (-55.68%)    370951.00 (  0.05%)    375261.00 (  1.22%)
TPut 42    356921.00 (  0.00%)    168220.00 (-52.87%)    365286.00 (  2.34%)    361267.00 (  1.22%)
TPut 43    346752.00 (  0.00%)    164975.00 (-52.42%)    348567.00 (  0.52%)    402065.00 ( 15.95%)
TPut 44    333574.00 (  0.00%)    155288.00 (-53.45%)    346565.00 (  3.89%)    359868.00 (  7.88%)
TPut 45    330858.00 (  0.00%)    158725.00 (-52.03%)    359029.00 (  8.51%)    355606.00 (  7.48%)
TPut 46    324668.00 (  0.00%)    163932.00 (-49.51%)    351591.00 (  8.29%)    375223.00 ( 15.57%)
TPut 47    317691.00 (  0.00%)    154329.00 (-51.42%)    353301.00 ( 11.21%)    355017.00 ( 11.75%)
TPut 48    323505.00 (  0.00%)    159024.00 (-50.84%)    344156.00 (  6.38%)    372821.00 ( 15.24%)
TPut 49    323870.00 (  0.00%)    142198.00 (-56.09%)    349592.00 (  7.94%)    370188.00 ( 14.30%)
TPut 50    332865.00 (  0.00%)    133112.00 (-60.01%)    355565.00 (  6.82%)    366131.00 (  9.99%)
TPut 51    325322.00 (  0.00%)    139628.00 (-57.08%)    355764.00 (  9.36%)    354747.00 (  9.04%)
TPut 52    326365.00 (  0.00%)    144885.00 (-55.61%)    364997.00 ( 11.84%)    358001.00 (  9.69%)
TPut 53    312548.00 (  0.00%)    167534.00 (-46.40%)    370090.00 ( 18.41%)    360848.00 ( 15.45%)
TPut 54    324755.00 (  0.00%)    170174.00 (-47.60%)    373291.00 ( 14.95%)    362261.00 ( 11.55%)
TPut 55    317938.00 (  0.00%)    177956.00 (-44.03%)    375091.00 ( 17.98%)    344495.00 (  8.35%)
TPut 56    326050.00 (  0.00%)    178906.00 (-45.13%)    375465.00 ( 15.16%)    369663.00 ( 13.38%)
TPut 57    302538.00 (  0.00%)    176488.00 (-41.66%)    372899.00 ( 23.26%)    366090.00 ( 21.01%)
TPut 58    314612.00 (  0.00%)    175755.00 (-44.14%)    385492.00 ( 22.53%)    354818.00 ( 12.78%)
TPut 59    312258.00 (  0.00%)    170366.00 (-45.44%)    383785.00 ( 22.91%)    373003.00 ( 19.45%)
TPut 60    317391.00 (  0.00%)    171247.00 (-46.05%)    379551.00 ( 19.58%)    365024.00 ( 15.01%)
TPut 61    289702.00 (  0.00%)    171227.00 (-40.90%)    373473.00 ( 28.92%)    368090.00 ( 27.06%)
TPut 62    314272.00 (  0.00%)    170611.00 (-45.71%)    369686.00 ( 17.63%)    367854.00 ( 17.05%)
TPut 63    318831.00 (  0.00%)    170379.00 (-46.56%)    367372.00 ( 15.22%)    372475.00 ( 16.83%)
TPut 64    304071.00 (  0.00%)    167930.00 (-44.77%)    368247.00 ( 21.11%)    370133.00 ( 21.73%)
TPut 65    294689.00 (  0.00%)    170535.00 (-42.13%)    361717.00 ( 22.75%)    363054.00 ( 23.20%)
TPut 66    309932.00 (  0.00%)    168917.00 (-45.50%)    356749.00 ( 15.11%)    351800.00 ( 13.51%)
TPut 67    309109.00 (  0.00%)    168709.00 (-45.42%)    366841.00 ( 18.68%)    366473.00 ( 18.56%)
TPut 68    307969.00 (  0.00%)    167717.00 (-45.54%)    345216.00 ( 12.09%)    372904.00 ( 21.08%)
TPut 69    315208.00 (  0.00%)    165794.00 (-47.40%)    367136.00 ( 16.47%)    354816.00 ( 12.57%)
TPut 70    310438.00 (  0.00%)    166529.00 (-46.36%)    364421.00 ( 17.39%)    362567.00 ( 16.79%)
TPut 71    304885.00 (  0.00%)    165862.00 (-45.60%)    357377.00 ( 17.22%)    355774.00 ( 16.69%)
TPut 72    304734.00 (  0.00%)    165487.00 (-45.69%)    331900.00 (  8.91%)    348366.00 ( 14.32%)

Without THP, numacore suffers really badly. In an earlier run against
3.7-rc6, autonuma and balancenuma also did not do great, but autonuma did
quite well this time with the same patch, so something significant may have
changed between 3.7-rc6 and 3.7-rc7. balancenuma also did reasonably well
this time where it showed flat performance the last time. It has changed,
but mostly in how it treats THP, which should not have affected this result.
Tip was based on 3.7-rc6 this time but maybe it'll benefit from the same
mystery change in 3.7-rc7 when it's tested.

So, while balancenuma did well here, it's worth noting that if it continually
migrates then its scan rate does not drop and it incurs a higher system
CPU cost. That did not happen here but it is worth bearing in mind.
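
To illustrate the concern, below is a minimal user-space sketch of the kind
of scan-period back-off balancenuma depends on. It is not the kernel code and
the limits (SCAN_PERIOD_MIN_MS/SCAN_PERIOD_MAX_MS) are made-up values; the
point is only that the scan period relaxes while faults stay local but snaps
back to the floor while migrations keep happening, which is why a
continually-migrating workload keeps paying the scanning and system CPU cost.

/*
 * Sketch only, NOT the balancenuma implementation: relax the scan period
 * while faults are local, pull it back to the minimum while pages are
 * still being migrated.
 */
#include <stdio.h>

#define SCAN_PERIOD_MIN_MS	100	/* hypothetical floor */
#define SCAN_PERIOD_MAX_MS	10000	/* hypothetical ceiling */

static unsigned int next_scan_period(unsigned int period_ms,
				     unsigned long local_faults,
				     unsigned long migrations)
{
	/* Pages are still being migrated: keep scanning aggressively. */
	if (migrations)
		return SCAN_PERIOD_MIN_MS;

	/* Everything is local: back off exponentially up to the ceiling. */
	if (local_faults && period_ms < SCAN_PERIOD_MAX_MS)
		period_ms *= 2;
	if (period_ms > SCAN_PERIOD_MAX_MS)
		period_ms = SCAN_PERIOD_MAX_MS;

	return period_ms;
}

int main(void)
{
	unsigned int period = SCAN_PERIOD_MIN_MS;
	int i;

	/* A converged workload relaxes the scan period quickly... */
	for (i = 0; i < 5; i++) {
		period = next_scan_period(period, 1000, 0);
		printf("converged pass %d: period %u ms\n", i, period);
	}

	/* ...but a continually migrating one stays at the floor. */
	period = next_scan_period(period, 1000, 500);
	printf("still migrating:  period %u ms\n", period);

	return 0;
}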

SPECJBB PEAKS
                                   3.7.0-rc7                  3.7.0-rc6                  3.7.0-rc7                  3.7.0-rc7
                                 stats-v6r15          numacore-20121126         autonuma-v28fastr4            balancenuma-v8r6
 Expctd Warehouse            48.00 (  0.00%)            48.00 (  0.00%)            48.00 (  0.00%)            48.00 (  0.00%)
 Expctd Peak Bops        323505.00 (  0.00%)        159024.00 (-50.84%)        344156.00 (  6.38%)        372821.00 ( 15.24%)
 Actual Warehouse            27.00 (  0.00%)            56.00 (107.41%)            24.00 (-11.11%)            26.00 ( -3.70%)
 Actual Peak Bops        450883.00 (  0.00%)        178906.00 (-60.32%)        438014.00 ( -2.85%)        470865.00 (  4.43%)
 SpecJBB Bops            160079.00 (  0.00%)         84224.00 (-47.39%)        186038.00 ( 16.22%)        185151.00 ( 15.66%)
 SpecJBB Bops/JVM        160079.00 (  0.00%)         84224.00 (-47.39%)        186038.00 ( 16.22%)        185151.00 ( 15.66%)

numacore regressed 60.32% at the peak and took a 47.39% loss on its specjbb
score.

autonuma regressed 2.85% at its peak but gained 16.22% on its overall
specjbb score.

balancenuma gained 4.43% at its peak and 15.66% on its overall score.
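
For reference, the peak percentages are relative to the stats-v6r15 actual
peak, e.g. for numacore

  (178906 - 450883) / 450883 * 100 ~= -60.32%

and the same calculation gives -2.85% for autonuma and +4.43% for balancenuma.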


MMTests Statistics: duration
           3.7.0-rc7   3.7.0-rc6   3.7.0-rc7   3.7.0-rc7
         stats-v6r15numacore-20121126autonuma-v28fastr4balancenuma-v8r6
User       317176.63   168175.82   308607.83   308503.96
System         60.85   119763.49     3974.78     1879.45
Elapsed      7434.09     7451.39     7437.49     7437.41

numacore's system CPU usage is excessive at nearly 2000 times that of the
baseline (119763 seconds against 61).

autonuma's is high here as well, and that's even with the kernel threads.

balancenuma's is also higher than I'd like but it's the best of the three
trees.

MMTests Statistics: vmstat
                             3.7.0-rc7   3.7.0-rc6   3.7.0-rc7   3.7.0-rc7
                           stats-v6r15numacore-20121126autonuma-v28fastr4balancenuma-v8r6
Page Ins                         62572       36844       37132       37100
Page Outs                        60448       62928       58464       59028
Swap Ins                             0           0           0           0
Swap Outs                            0           0           0           0
Direct pages scanned                 0           0           0           0
Kswapd pages scanned                 0           0           0           0
Kswapd pages reclaimed               0           0           0           0
Direct pages reclaimed               0           0           0           0
Kswapd efficiency                 100%        100%        100%        100%
Kswapd velocity                  0.000       0.000       0.000       0.000
Direct efficiency                 100%        100%        100%        100%
Direct velocity                  0.000       0.000       0.000       0.000
Percentage direct scans             0%          0%          0%          0%
Page writes by reclaim               0           0           0           0
Page writes file                     0           0           0           0
Page writes anon                     0           0           0           0
Page reclaim immediate               0           0           0           0
Page rescued immediate               0           0           0           0
Slabs scanned                        0           0           0           0
Direct inode steals                  0           0           0           0
Kswapd inode steals                  0           0           0           0
Kswapd skipped wait                  0           0           0           0
THP fault alloc                      3           3           3           3
THP collapse alloc                   0           0          12           0
THP splits                           0           0           0           0
THP fault fallback                   0           0           0           0
THP collapse fail                    0           0           0           0
Compaction stalls                    0           0           0           0
Compaction success                   0           0           0           0
Compaction failures                  0           0           0           0
Page migrate success                 0           0           0    25255063
Page migrate failure                 0           0           0           0
Compaction pages isolated            0           0           0           0
Compaction migrate scanned           0           0           0           0
Compaction free scanned              0           0           0           0
Compaction cost                      0           0           0       26214
NUMA PTE updates                     0           0           0   206844060
NUMA hint faults                     0           0           0   201377412
NUMA hint local faults               0           0           0    51864509
NUMA pages migrated                  0           0           0    25255063
AutoNUMA cost                        0           0           0     1008814

THP is not in use. Migrations for balancenuma were at 13MB/sec which is better
than has been seen before but should still be lower.
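
Assuming 4K base pages (THP was not in use here), that figure is consistent
with the vmstat table above:

  25255063 pages migrated * 4096 bytes / 7437 seconds elapsed ~= 13MB/sec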


Next I ran NPB (http://www.nas.nasa.gov/publications/npb.htm) as an
example of a workload of interest to HPC. I made little or no attempt to
be clever here. Defaults were used instead of trying to tune to achieve
peak performance. I used the Class C problem set size as Class D was being
pushed to swap on my machine. This means that the benchmark is not using that
much memory but it will be using a lot of the CPUs so it is still useful.

For MPI, it is mostly process based and, when running in local mode, it was
using large files in /tmp/ to communicate. So it is using shared memory, but
not System V shmem.
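
As an aside, that style of communication amounts to a shared file-backed
mapping rather than System V shmem (shmget/shmat). A minimal illustration of
the mechanism in general, not Open MPI's actual code, with the path and size
made up for the example:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	const size_t sz = 1 << 20;	/* 1MB, arbitrary */
	int fd = open("/tmp/shmem-example", O_RDWR | O_CREAT, 0600);
	char *buf;

	if (fd < 0 || ftruncate(fd, sz) < 0) {
		perror("setup");
		return 1;
	}

	/* MAP_SHARED on a regular file: visible to every process mapping it. */
	buf = mmap(NULL, sz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	strcpy(buf, "hello from one rank");

	munmap(buf, sz);
	close(fd);
	return 0;
}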

OpenMP is thread based.

I analysed neither set of workloads closely. It was just a blind punt.

NAS MPI
                      3.7.0-rc7             3.7.0-rc6             3.7.0-rc7             3.7.0-rc7
                    stats-v6r15     numacore-20121126    autonuma-v28fastr4       balancenuma-v8r6
Time cg.C       59.92 (  0.00%)       56.59 (  5.56%)       58.66 (  2.10%)       53.58 ( 10.58%)
Time ep.C       18.07 (  0.00%)       18.96 ( -4.93%)       18.12 ( -0.28%)       18.86 ( -4.37%)
Time ft.C       51.57 (  0.00%)       53.67 ( -4.07%)       53.60 ( -3.94%)       51.81 ( -0.47%)
Time is.C        2.85 (  0.00%)        4.19 (-47.02%)        3.26 (-14.39%)        3.34 (-17.19%)
Time lu.C      160.07 (  0.00%)      142.26 ( 11.13%)      138.43 ( 13.52%)      139.71 ( 12.72%)
Time mg.C       24.46 (  0.00%)       23.57 (  3.64%)       24.71 ( -1.02%)       22.73 (  7.07%)

Everyone regressed on is.C and ep.C, which are very short-lived. mg.C is
also very short-lived and showed a mix of gains and losses. Of what's left,
and including mg.C for completeness

cg.C	balancenuma best but not by that great a margin
ft.C	balancenuma "best" by a small margin and is close to mainline
lu.C	autonuma    best by a small margin
mg.C    balancenuma best by a small margin

The differences between the trees are not massive and may be within the
noise. The fact is that the tests are too short-lived to be really useful.
It's a pity that class D is not usable on this machine because it starts
using swap. I'll investigate whether something can be done about that.

MMTests Statistics: duration
           3.7.0-rc7   3.7.0-rc6   3.7.0-rc7   3.7.0-rc7
         stats-v6r15numacore-20121126autonuma-v28fastr4balancenuma-v8r6
User         8279.08     7415.87     7564.98     7427.82
System       2309.04     2608.66     2432.62     2306.59
Elapsed       366.62      350.35      349.25      341.20

numacore is a bit high on the system CPU usage side (2609 seconds against
2309 for the baseline) but not as excessive as it can be.

MMTests Statistics: vmstat
                             3.7.0-rc7   3.7.0-rc6   3.7.0-rc7   3.7.0-rc7
                           stats-v6r15numacore-20121126autonuma-v28fastr4balancenuma-v8r6
Page Ins                         33256       36576       36448       36508
Page Outs                       732304      832596      745144      590296
Swap Ins                             0           0           0           0
Swap Outs                            0           0           0           0
Direct pages scanned                 0           0           0           0
Kswapd pages scanned                 0           0           0           0
Kswapd pages reclaimed               0           0           0           0
Direct pages reclaimed               0           0           0           0
Kswapd efficiency                 100%        100%        100%        100%
Kswapd velocity                  0.000       0.000       0.000       0.000
Direct efficiency                 100%        100%        100%        100%
Direct velocity                  0.000       0.000       0.000       0.000
Percentage direct scans             0%          0%          0%          0%
Page writes by reclaim               0           0           0           0
Page writes file                     0           0           0           0
Page writes anon                     0           0           0           0
Page reclaim immediate               0           0           0           0
Page rescued immediate               0           0           0           0
Slabs scanned                        0           0           0           0
Direct inode steals                  0           0           0           0
Kswapd inode steals                  0           0           0           0
Kswapd skipped wait                  0           0           0           0
THP fault alloc                   7532        7524        7526        7530
THP collapse alloc                  19           0         100          21
THP splits                           0           0           8           1
THP fault fallback                   0           0           0           0
THP collapse fail                    0           0           0           0
Compaction stalls                    0           0           0           0
Compaction success                   0           0           0           0
Compaction failures                  0           0           0           0
Page migrate success                 0           0           0     1954996
Page migrate failure                 0           0           0           0
Compaction pages isolated            0           0           0           0
Compaction migrate scanned           0           0           0           0
Compaction free scanned              0           0           0           0
Compaction cost                      0           0           0        2029
NUMA PTE updates                     0           0           0   106542884
NUMA hint faults                     0           0           0     2634360
NUMA hint local faults               0           0           0     2385326
NUMA pages migrated                  0           0           0     1954996
AutoNUMA cost                        0           0           0       13954

THP was in use but otherwise it's hard to conclude anything useful. Each
workload is very different so we cannot draw reasonable conclusions from
the amount of data migrated.

NAS OMP

                      3.7.0-rc7             3.7.0-rc6             3.7.0-rc7             3.7.0-rc7
                    stats-v6r15     numacore-20121126    autonuma-v28fastr4       balancenuma-v8r6
Time bt.C      167.76 (  0.00%)      189.34 (-12.86%)      166.28 (  0.88%)      169.68 ( -1.14%)
Time cg.C       44.52 (  0.00%)       61.84 (-38.90%)       52.11 (-17.05%)       46.71 ( -4.92%)
Time ep.C       12.66 (  0.00%)       15.41 (-21.72%)       12.35 (  2.45%)       12.21 (  3.55%)
Time ft.C       32.55 (  0.00%)       37.77 (-16.04%)       35.21 ( -8.17%)       32.85 ( -0.92%)
Time is.C        1.69 (  0.00%)        2.28 (-34.91%)        1.95 (-15.38%)        1.68 (  0.59%)
Time lu.C       88.12 (  0.00%)      135.42 (-53.68%)      120.73 (-37.01%)       91.07 ( -3.35%)
Time mg.C       26.62 (  0.00%)       33.15 (-24.53%)       29.07 ( -9.20%)       28.08 ( -5.48%)
Time sp.C      783.74 (  0.00%)      450.35 ( 42.54%)      384.51 ( 50.94%)      413.22 ( 47.28%)
Time ua.C      201.91 (  0.00%)      173.32 ( 14.16%)      187.70 (  7.04%)      172.80 ( 14.42%)

Note that OpenMP runs more tests. At some time in the past, the equivalent
tests were not compiling for OpenMPI and the MMTests script does not even
try to run them. I'll recheck whether this is still the case or if it can
be fixed.

numacore and autonuma did really badly on lu.C; it is worth looking at what
that benchmark is doing. balancenuma looks like it did ok but I am cautious
about it and would prefer if it was run more than once.

Otherwise, numacore regressed a number of the remaining tests but
saw large gains for sp and ua.

autonuma fares much better but there are large regressions there too.

balancenuma did ok. Generally though, this series of benchmarks has raised
a few challenges that will need to be answered.

MMTests Statistics: duration
           3.7.0-rc7   3.7.0-rc6   3.7.0-rc7   3.7.0-rc7
         stats-v6r15numacore-20121126autonuma-v28fastr4balancenuma-v8r6
User        60286.11    46017.38    41803.90    42021.18
System         68.02     1430.31      118.75      166.79
Elapsed      1495.34     1236.03     1131.33     1103.99

numacore's system CPU usage is comparatively very high again (1430 seconds
against 68 for the baseline).

MMTests Statistics: vmstat
                             3.7.0-rc7   3.7.0-rc6   3.7.0-rc7   3.7.0-rc7
                           stats-v6r15numacore-20121126autonuma-v28fastr4balancenuma-v8r6
Page Ins                         37544       37288       37428       37404
Page Outs                        19240       17908       17244       17600
Swap Ins                             0           0           0           0
Swap Outs                            0           0           0           0
Direct pages scanned                 0           0           0           0
Kswapd pages scanned                 0           0           0           0
Kswapd pages reclaimed               0           0           0           0
Direct pages reclaimed               0           0           0           0
Kswapd efficiency                 100%        100%        100%        100%
Kswapd velocity                  0.000       0.000       0.000       0.000
Direct efficiency                 100%        100%        100%        100%
Direct velocity                  0.000       0.000       0.000       0.000
Percentage direct scans             0%          0%          0%          0%
Page writes by reclaim               0           0           0           0
Page writes file                     0           0           0           0
Page writes anon                     0           0           0           0
Page reclaim immediate               0           0           0           0
Page rescued immediate               0           0           0           0
Slabs scanned                        0           0           0           0
Direct inode steals                  0           0           0           0
Kswapd inode steals                  0           0           0           0
Kswapd skipped wait                  0           0           0           0
THP fault alloc                  15700       15798       15495       15696
THP collapse alloc                  13           2          98           8
THP splits                           0           0           2           1
THP fault fallback                   0           0           0           0
THP collapse fail                    0           0           0           0
Compaction stalls                    0           0           0           0
Compaction success                   0           0           0           0
Compaction failures                  0           0           0           0
Page migrate success                 0           0           0     2814591
Page migrate failure                 0           0           0           0
Compaction pages isolated            0           0           0           0
Compaction migrate scanned           0           0           0           0
Compaction free scanned              0           0           0           0
Compaction cost                      0           0           0        2921
NUMA PTE updates                     0           0           0    49389870
NUMA hint faults                     0           0           0     1575920
NUMA hint local faults               0           0           0      961230
NUMA pages migrated                  0           0           0     2814591
AutoNUMA cost                        0           0           0        8278

THP is in use but as each workload is very different we cannot really draw
sensible conclusions from the other stats.

Finally, the following are just rudimentary tests to check some basics. I'm
not going into heavy detail this time because the figures look very similar
to the previous report.

kernbench	- numacore    -2.50%
		  autonuma    -0.49%
		  balancenuma -0.60%

aim9		- everyone ok
hackbench-pipes	- same as before. numacore, balancenuma ok. autonuma regressed heavily
hackbench-socket- same
pft		- same as before. numacore and balancenuma ok; autonuma has high
		  system CPU usage. Similar story for the fault rates: numacore
		  and balancenuma ok, autonuma regresses heavily

There you have it. Some good results, some great, some bad results, some
disastrous. Of course this is for only one machine and other machines
might report differently.

numacore does very well with THP enabled on a single JVM for specjbb
and does very well for an adverse workload in autonumabench. However,
in other benchmarks it can regress heavily and its system CPU usage can
be excessive. I'm still of the opinion that it should be rebased on top
of balancenuma and evaluated against it.

autonuma does very well in a number of configurations but there are too
many people unhappy with how it integrates with the core kernel. It would
also be nice if the placement policies part could be rebased on top of
balancenuma where it could get a fair like-for-like comparison with numacore.

balancenuma did pretty well overall. It generally was an improvement on
the baseline kernel but there are cases where it could really benefit
from a placement policy on top that could place the memory and quickly
reduce the PTE scan rates and number of migrations. I think it's the best
starting point we have available right now.

Comments?

-- 
Mel Gorman
SUSE Labs.
