[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20121115100805.GS8218@suse.de>
Date: Thu, 15 Nov 2012 10:08:05 +0000
From: Mel Gorman <mgorman@...e.de>
To: Ingo Molnar <mingo@...nel.org>
Cc: Peter Zijlstra <a.p.zijlstra@...llo.nl>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
Paul Turner <pjt@...gle.com>,
Lee Schermerhorn <Lee.Schermerhorn@...com>,
Christoph Lameter <cl@...ux.com>,
Rik van Riel <riel@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Andrea Arcangeli <aarcange@...hat.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: Benchmark results: "Enhanced NUMA scheduling with adaptive
affinity"
On Mon, Nov 12, 2012 at 07:48:33PM +0100, Ingo Molnar wrote:
>
> * Peter Zijlstra <a.p.zijlstra@...llo.nl> wrote:
>
> > Hi,
> >
> > This series implements an improved version of NUMA scheduling,
> > based on the review and testing feedback we got.
> >
> > [...]
> >
> > This new scheduler code is then able to group tasks that are
> > "memory related" via their memory access patterns together: in
> > the NUMA context moving them on the same node if possible, and
> > spreading them amongst nodes if they use private memory.
>
> Here are some preliminary performance figures, comparing the
> vanilla kernel against the CONFIG_SCHED_NUMA=y kernel.
>
> Java SPEC benchmark, running on a 4 node, 64 GB, 32-way server
> system (higher numbers are better):
>
Ok, I used a 4-node, 64G, 48-way server system. We have different CPUs
but the same number of nodes. In case it makes a difference each of my
machines nodes are the same size.
> v3.7-vanilla: run #1: 475630
> run #2: 538271
> run #3: 533888
> run #4: 431525
> ----------------------------------
> avg: 494828 transactions/sec
>
> v3.7-NUMA: run #1: 626692
> run #2: 622069
> run #3: 630335
> run #4: 629817
> ----------------------------------
> avg: 627228 transactions/sec [ +26.7% ]
>
> Beyond the +26.7% performance improvement in throughput, the
> standard deviation of the results is much lower as well with
> NUMA scheduling enabled, by about an order of magnitude.
>
> [ That is probably so because memory and task placement is more
> balanced with NUMA scheduling enabled - while with the vanilla
> kernel initial placement of the working set determines the
> final performance figure. ]
>
I did not see the same results. I used 3.7-rc4 as a baseline as it's what
I'm developing against currently. For your patches I pulled tip/sched/core
and then applied the patches you posted to the mailing list on top. It
means my tree looks different to yours but it was necessary if I was going
to do a like-with-like comparison. I also rebased Andrea'a autonuma28fast
branch from his git tree onto 3.7-rc4 (some mess, but nothing very serious).
As before, I'm cutting this report short
SPECJBB BOPS
3.7.0 3.7.0 3.7.0
rc4-stats-v2r34 rc4-schednuma-v2r3 rc4-autonuma-v28fast
Mean 1 25034.25 ( 0.00%) 20598.50 (-17.72%) 25192.25 ( 0.63%)
Mean 2 53176.00 ( 0.00%) 43906.50 (-17.43%) 55508.25 ( 4.39%)
Mean 3 77350.50 ( 0.00%) 60342.75 (-21.99%) 82122.50 ( 6.17%)
Mean 4 99919.50 ( 0.00%) 80781.75 (-19.15%) 107233.25 ( 7.32%)
Mean 5 119797.00 ( 0.00%) 97870.00 (-18.30%) 131016.00 ( 9.37%)
Mean 6 135858.00 ( 0.00%) 123912.50 ( -8.79%) 152444.75 ( 12.21%)
Mean 7 136074.00 ( 0.00%) 126574.25 ( -6.98%) 157372.75 ( 15.65%)
Mean 8 132426.25 ( 0.00%) 121766.00 ( -8.05%) 161655.25 ( 22.07%)
Mean 9 129432.75 ( 0.00%) 114224.25 (-11.75%) 160530.50 ( 24.03%)
Mean 10 118399.75 ( 0.00%) 109040.50 ( -7.90%) 158692.00 ( 34.03%)
Mean 11 119604.00 ( 0.00%) 105566.50 (-11.74%) 154462.00 ( 29.14%)
Mean 12 112742.25 ( 0.00%) 101728.75 ( -9.77%) 149546.00 ( 32.64%)
Mean 13 109480.75 ( 0.00%) 103737.50 ( -5.25%) 144929.25 ( 32.38%)
Mean 14 109724.00 ( 0.00%) 103516.00 ( -5.66%) 143804.50 ( 31.06%)
Mean 15 109111.75 ( 0.00%) 100817.00 ( -7.60%) 141878.00 ( 30.03%)
Mean 16 105385.75 ( 0.00%) 99327.25 ( -5.75%) 140156.75 ( 32.99%)
Mean 17 101903.50 ( 0.00%) 96464.50 ( -5.34%) 138402.00 ( 35.82%)
Mean 18 103632.50 ( 0.00%) 95632.50 ( -7.72%) 137781.50 ( 32.95%)
Stddev 1 1195.76 ( 0.00%) 358.07 ( 70.06%) 861.97 ( 27.91%)
Stddev 2 883.39 ( 0.00%) 1203.29 (-36.21%) 855.08 ( 3.20%)
Stddev 3 997.25 ( 0.00%) 3755.67 (-276.60%) 545.50 ( 45.30%)
Stddev 4 1115.16 ( 0.00%) 6390.65 (-473.07%) 1183.49 ( -6.13%)
Stddev 5 1367.09 ( 0.00%) 9710.70 (-610.32%) 1022.09 ( 25.24%)
Stddev 6 1125.22 ( 0.00%) 1097.83 ( 2.43%) 1013.52 ( 9.93%)
Stddev 7 3211.72 ( 0.00%) 1533.62 ( 52.25%) 512.61 ( 84.04%)
Stddev 8 4194.96 ( 0.00%) 1518.26 ( 63.81%) 493.64 ( 88.23%)
Stddev 9 6175.10 ( 0.00%) 2648.75 ( 57.11%) 2109.83 ( 65.83%)
Stddev 10 4754.87 ( 0.00%) 1941.47 ( 59.17%) 2948.98 ( 37.98%)
Stddev 11 2706.18 ( 0.00%) 1247.95 ( 53.89%) 5907.16 (-118.28%)
Stddev 12 3607.76 ( 0.00%) 663.63 ( 81.61%) 9063.28 (-151.22%)
Stddev 13 2771.67 ( 0.00%) 1447.87 ( 47.76%) 8716.51 (-214.49%)
Stddev 14 2522.18 ( 0.00%) 1510.28 ( 40.12%) 9286.98 (-268.21%)
Stddev 15 2711.16 ( 0.00%) 1719.54 ( 36.58%) 9895.88 (-265.01%)
Stddev 16 2797.21 ( 0.00%) 983.63 ( 64.84%) 9302.92 (-232.58%)
Stddev 17 4019.85 ( 0.00%) 1927.25 ( 52.06%) 9998.34 (-148.72%)
Stddev 18 3332.20 ( 0.00%) 1401.68 ( 57.94%) 12056.08 (-261.80%)
TPut 1 100137.00 ( 0.00%) 82394.00 (-17.72%) 100769.00 ( 0.63%)
TPut 2 212704.00 ( 0.00%) 175626.00 (-17.43%) 222033.00 ( 4.39%)
TPut 3 309402.00 ( 0.00%) 241371.00 (-21.99%) 328490.00 ( 6.17%)
TPut 4 399678.00 ( 0.00%) 323127.00 (-19.15%) 428933.00 ( 7.32%)
TPut 5 479188.00 ( 0.00%) 391480.00 (-18.30%) 524064.00 ( 9.37%)
TPut 6 543432.00 ( 0.00%) 495650.00 ( -8.79%) 609779.00 ( 12.21%)
TPut 7 544296.00 ( 0.00%) 506297.00 ( -6.98%) 629491.00 ( 15.65%)
TPut 8 529705.00 ( 0.00%) 487064.00 ( -8.05%) 646621.00 ( 22.07%)
TPut 9 517731.00 ( 0.00%) 456897.00 (-11.75%) 642122.00 ( 24.03%)
TPut 10 473599.00 ( 0.00%) 436162.00 ( -7.90%) 634768.00 ( 34.03%)
TPut 11 478416.00 ( 0.00%) 422266.00 (-11.74%) 617848.00 ( 29.14%)
TPut 12 450969.00 ( 0.00%) 406915.00 ( -9.77%) 598184.00 ( 32.64%)
TPut 13 437923.00 ( 0.00%) 414950.00 ( -5.25%) 579717.00 ( 32.38%)
TPut 14 438896.00 ( 0.00%) 414064.00 ( -5.66%) 575218.00 ( 31.06%)
TPut 15 436447.00 ( 0.00%) 403268.00 ( -7.60%) 567512.00 ( 30.03%)
TPut 16 421543.00 ( 0.00%) 397309.00 ( -5.75%) 560627.00 ( 32.99%)
TPut 17 407614.00 ( 0.00%) 385858.00 ( -5.34%) 553608.00 ( 35.82%)
TPut 18 414530.00 ( 0.00%) 382530.00 ( -7.72%) 551126.00 ( 32.95%)
It is important to know how this was configured. I was running one JVM
per node and the JVMs were sized that they should fit in the node. This
is a semi-ideal configuration because it could also be hard-bound for
best performance on the vanilla kernel. You did not say if you ran with
a single JVM or multiple JVMs and it's important.
The mean values are based on the individual throughput figures reported
by each JVM. schednuma regresses against mainline quite badly. For low
numbers of warehouses it also deviates more but it's much steadier for
higher numbers of warehouses. In terms of overall throughput though,
it's worse.
autonuma deviates a *lot* with massive variances between the JVMs.
However, the average and total throughput is very high.
SPECJBB PEAKS
3.7.0 3.7.0 3.7.0
rc4-stats-v2r34 rc4-schednuma-v2r3 rc4-autonuma-v28fast
Expctd Warehouse 12.00 ( 0.00%) 12.00 ( 0.00%) 12.00 ( 0.00%)
Expctd Peak Bops 450969.00 ( 0.00%) 406915.00 ( -9.77%) 598184.00 ( 32.64%)
Actual Warehouse 7.00 ( 0.00%) 7.00 ( 0.00%) 8.00 ( 14.29%)
Actual Peak Bops 544296.00 ( 0.00%) 506297.00 ( -6.98%) 646621.00 ( 18.80%)
There is no major difference in terms of scalability. They peak at
around the 7 warehouse mark. autonuma peaked at 8 but you can see from
the figures that it was not by a whole lot. autonumas actual peak
operations was very high (18% gain) where schednuma regressed by close
to 7%.
MMTests Statistics: duration
3.7.0 3.7.0 3.7.0
rc4-stats-v2r34rc4-schednuma-v2r3rc4-autonuma-v28fast
User 101949.84 86817.79 101748.80
System 66.05 13094.99 191.40
Elapsed 2456.35 2459.16 2451.96
system CPU time is high for schednuma. autonuma reports low system CPU
usage but as it is using kernel threads for much of its work, it cannot
be considered reliable as it would not be captured here.
> I've also tested Andrea's 'autonumabench' benchmark suite
> against vanilla and the NUMA kernel, because Mel reported that
> the CONFIG_SCHED_NUMA=y code regressed. It does not regress
> anymore:
>
> #
> # NUMA01
> #
> perf stat --null --repeat 3 ./numa01
>
> v3.7-vanilla: 340.3 seconds ( +/- 0.31% )
> v3.7-NUMA: 216.9 seconds [ +56% ] ( +/- 8.32% )
> -------------------------------------
> v3.7-HARD_BIND: 166.6 seconds
>
> Here the new NUMA code is faster than vanilla by 56% - that is
> because with the vanilla kernel all memory is allocated on
> node0, overloading that node's memory bandwidth.
>
> [ Standard deviation on the vanilla kernel is low, because the
> autonuma test causes close to the worst-case placement for the
> vanilla kernel - and there's not much space to deviate away
> from the worst-case. Despite that, stddev in the NUMA seems a
> tad high, suggesting further room for improvement. ]
>
For machines with more than 2 nodes, numa01 is an adverse workload.
> #
> # NUMA01_THREAD_ALLOC
> #
> perf stat --null --repeat 3 ./numa01_THREAD_ALLOC
>
> v3.7-vanilla: 425.1 seconds ( +/- 1.04% )
> v3.7-NUMA: 118.7 seconds [ +250% ] ( +/- 0.49% )
> -------------------------------------
> v3.7-HARD_BIND: 200.56 seconds
>
> Here the NUMA kernel was able to go beyond the (naive)
> hard-binding result and achieved 3.5x the performance of the
> vanilla kernel, with a low stddev.
>
> #
> # NUMA02
> #
> perf stat --null --repeat 3 ./numa02
>
> v3.7-vanilla: 56.1 seconds ( +/- 0.72% )
> v3.7-NUMA: 17.0 seconds [ +230% ] ( +/- 0.18% )
> -------------------------------------
> v3.7-HARD_BIND: 14.9 seconds
>
> Here the NUMA kernel runs the test much (3.3x) faster than the
> vanilla kernel. The workload is able to converge very quickly
> and approximate the hard-binding ideal number very closely. If
> runtime was a bit longer it would approximate it even closer.
>
> Standard deviation is also 3 times lower than vanilla,
> suggesting stable NUMA convergence.
>
> #
> # NUMA02_SMT
> #
> perf stat --null --repeat 3 ./numa02_SMT
> v3.7-vanilla: 56.1 seconds ( +- 0.42% )
> v3.7-NUMA: 17.3 seconds [ +220% ] ( +- 0.88% )
> -------------------------------------
> v3.7-HARD_BIND: 14.6 seconds
>
> In this test too the NUMA kernel outperforms the vanilla kernel,
> by a factor of 3.2x. It comes very close to the ideal
> hard-binding convergence result. Standard deviation is a bit
> high.
>
With this benchark, I'm generally seeing very good results in terms of
elapsed time.
AUTONUMA BENCH
3.7.0 3.7.0 3.7.0
rc4-stats-v2r34 rc4-schednuma-v2r3 rc4-autonuma-v28fast
User NUMA01 67351.66 ( 0.00%) 47146.57 ( 30.00%) 30273.64 ( 55.05%)
User NUMA01_THEADLOCAL 54788.28 ( 0.00%) 17198.99 ( 68.61%) 17039.73 ( 68.90%)
User NUMA02 7179.87 ( 0.00%) 2096.07 ( 70.81%) 2099.85 ( 70.75%)
User NUMA02_SMT 3028.11 ( 0.00%) 998.22 ( 67.03%) 1052.97 ( 65.23%)
System NUMA01 45.68 ( 0.00%) 3531.04 (-7629.95%) 423.91 (-828.00%)
System NUMA01_THEADLOCAL 40.92 ( 0.00%) 926.72 (-2164.71%) 188.15 (-359.80%)
System NUMA02 1.72 ( 0.00%) 23.64 (-1274.42%) 27.37 (-1491.28%)
System NUMA02_SMT 0.92 ( 0.00%) 8.18 (-789.13%) 18.43 (-1903.26%)
Elapsed NUMA01 1514.61 ( 0.00%) 1122.78 ( 25.87%) 722.66 ( 52.29%)
Elapsed NUMA01_THEADLOCAL 1264.08 ( 0.00%) 393.79 ( 68.85%) 391.48 ( 69.03%)
Elapsed NUMA02 181.88 ( 0.00%) 49.44 ( 72.82%) 61.55 ( 66.16%)
Elapsed NUMA02_SMT 168.41 ( 0.00%) 47.49 ( 71.80%) 54.72 ( 67.51%)
CPU NUMA01 4449.00 ( 0.00%) 4513.00 ( -1.44%) 4247.00 ( 4.54%)
CPU NUMA01_THEADLOCAL 4337.00 ( 0.00%) 4602.00 ( -6.11%) 4400.00 ( -1.45%)
CPU NUMA02 3948.00 ( 0.00%) 4287.00 ( -8.59%) 3455.00 ( 12.49%)
CPU NUMA02_SMT 1798.00 ( 0.00%) 2118.00 (-17.80%) 1957.00 ( -8.84%)
On NUMA01, I'm seeing a large gain for schednuma. The test was not run
multiple times so I do not know how much it deviates by on each run.
However, the system CPU usage was again very high.
NUMA01_THEADLOCAL figures were comparable with autonuma. The system CPU
usage was high. As before, autonumas looks low but with the kernel
threads we cannot be sure.
schednuma was a clear winner on NUMA02 and NUMA02_SMT.
So for the synthetic benchmarks, schednuma looks good in terms of
elapsed time. On specjbb though, it is not looking great and this may be
due to differences in how we configured the JVMs.
I would have some comparison data with my own stuff but unfortunately
the machine crashed when running tests with schednuma. That said, I
expect the figures to be bad if they had run. With V2, the CPU-follows
placement policy is broken as is PMD handling. In my current tree I'm
expecting the system CPU usage to be also high but I won't know for sure
until later today.
The machine was meant to test all this overnight but unfortunately when
running a kernel build benchmark on the schednuma patches the machine
hung while downloading the tarball with this
[ 73.863226] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 73.871062] IP: [<ffffffff8146feaa>] skb_gro_receive+0xaa/0x590
[ 73.876983] PGD 0
[ 73.878998] Oops: 0002 [#1] PREEMPT SMP
[ 73.882938] Modules linked in: af_packet mperf kvm_intel coretemp kvm crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd sr_mod lrw cdrom aes_x86_64 ses pcspkr xts i7core_edac ata_piix enclosure lpc_ich dcdbas sg gf128mul mfd_core bnx2 edac_core wmi acpi_power_meter button serio_raw joydev microcode autofs4 processor thermal_sys scsi_dh_rdac scsi_dh_hp_sw scsi_dh_alua scsi_dh_emc scsi_dh ata_generic megaraid_sas pata_atiixp [last unloaded: oprofile]
[ 73.924659] CPU 0
[ 73.926493] Pid: 0, comm: swapper/0 Not tainted 3.7.0-rc4-schednuma-v2r3 #1 Dell Inc. PowerEdge R810/0TT6JF
[ 73.936380] RIP: 0010:[<ffffffff8146feaa>] [<ffffffff8146feaa>] skb_gro_receive+0xaa/0x590
[ 73.944714] RSP: 0018:ffff88047f803b50 EFLAGS: 00010282
[ 73.950004] RAX: 0000000000000000 RBX: ffff88046c2bdbc0 RCX: 0000000000000900
[ 73.957113] RDX: 00000000000005a8 RSI: ffff88046c2bdbc0 RDI: ffff88046eadb800
[ 73.964221] RBP: ffff88047f803bb0 R08: 00000000000005dc R09: ffff88046ddeccc0
[ 73.971328] R10: ffff88086d795d78 R11: 0000000000000001 R12: ffff880462b282c0
[ 73.978436] R13: 0000000000000034 R14: 00000000000005a8 R15: ffff88046eadbec0
[ 73.985543] FS: 0000000000000000(0000) GS:ffff88047f800000(0000) knlGS:0000000000000000
[ 73.993602] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 73.999326] CR2: 0000000000000000 CR3: 0000000001a0c000 CR4: 00000000000007f0
[ 74.006435] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 74.013543] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 74.020651] Process swapper/0 (pid: 0, threadinfo ffffffff81a00000, task ffffffff81a14420)
[ 74.028883] Stack:
[ 74.030885] 0000000000000060 ffff880462b282c0 ffff88086d795d78 ffffffff000005dc
[ 74.038300] ffff88046e5f46c0 000000606a275ec0 0000000000000000 ffff88046c2bdbc0
[ 74.045715] 00000000000005a8 ffff88086d795d78 00000000000005a8 000000006c001080
[ 74.053131] Call Trace:
[ 74.055567] <IRQ>
[ 74.057486] [<ffffffff814b9573>] tcp_gro_receive+0x213/0x2b0
[ 74.063419] [<ffffffff814cff49>] tcp4_gro_receive+0x99/0x110
[ 74.069150] [<ffffffff814e096d>] inet_gro_receive+0x1cd/0x200
[ 74.074965] [<ffffffff8147b30a>] dev_gro_receive+0x1ba/0x2b0
[ 74.080691] [<ffffffff8147b6e3>] napi_gro_receive+0xe3/0x130
[ 74.086426] [<ffffffffa009fda8>] bnx2_rx_int+0x3e8/0xf10 [bnx2]
[ 74.092416] [<ffffffffa00a0cbd>] bnx2_poll_work+0x3ed/0x450 [bnx2]
[ 74.098666] [<ffffffffa00a0d5e>] bnx2_poll_msix+0x3e/0xc0 [bnx2]
[ 74.104739] [<ffffffff8147b969>] net_rx_action+0x159/0x290
[ 74.110298] [<ffffffff8104d148>] __do_softirq+0xc8/0x250
[ 74.115682] [<ffffffff8107bf9e>] ? sched_clock_idle_wakeup_event+0x1e/0x20
[ 74.122625] [<ffffffff81577c9c>] call_softirq+0x1c/0x30
[ 74.127922] [<ffffffff8100470d>] do_softirq+0x6d/0xa0
[ 74.133041] [<ffffffff8104d44d>] irq_exit+0xad/0xc0
[ 74.137996] [<ffffffff8107779d>] scheduler_ipi+0x5d/0x110
[ 74.143469] [<ffffffff8102b7a4>] ? native_apic_msr_eoi_write+0x14/0x20
[ 74.150060] [<ffffffff810257d5>] smp_reschedule_interrupt+0x25/0x30
[ 74.156394] [<ffffffff8157785d>] reschedule_interrupt+0x6d/0x80
[ 74.162376] <EOI>
[ 74.164295] [<ffffffff81316798>] ? intel_idle+0xe8/0x150
[ 74.169875] [<ffffffff81316779>] ? intel_idle+0xc9/0x150
[ 74.175259] [<ffffffff8143de99>] cpuidle_enter+0x19/0x20
[ 74.180642] [<ffffffff8143e522>] cpuidle_idle_call+0xa2/0x340
[ 74.186458] [<ffffffff8100baca>] cpu_idle+0x7a/0xf0
[ 74.191410] [<ffffffff8154b44b>] rest_init+0x7b/0x80
[ 74.196447] [<ffffffff81ac3be2>] start_kernel+0x38f/0x39c
[ 74.201913] [<ffffffff81ac3652>] ? repair_env_string+0x5e/0x5e
[ 74.207815] [<ffffffff81ac3335>] x86_64_start_reservations+0x131/0x135
[ 74.214407] [<ffffffff81ac3439>] x86_64_start_kernel+0x100/0x10f
[ 74.220475] Code: 8b e8 00 00 00 0f 87 86 00 00 00 8b 53 68 8b 43 6c 44 29 ea 39 d0 89 53 68 0f 87 c7 04 00 00 4c 01 ab e0 00 00 00 49 8b 44 24 08 <48> 89 18 49 89 5c 24 08 0f b6 43 7c a8 10 0f 85 ac 04 00 00 83
[ 74.240051] RIP [<ffffffff8146feaa>] skb_gro_receive+0xaa/0x590
[ 74.246046] RSP <ffff88047f803b50>
[ 74.249518] CR2: 0000000000000000
[ 74.252821] ---[ end trace 97cb529523f52c9b ]---
[ 74.258895] Kernel panic - not syncing: Fatal exception in interrupt
-- 0:console -- time-stamp -- Nov/15/12 3:09:06 --
I've no idea if it is directly related to your patches and I didn't try
to reproduce it yet.
> generation tool: 'perf bench numa' (I'll post it later in a
> separate reply).
>
> Via 'perf bench numa' we can generate arbitrary process and
> thread layouts, with arbitrary memory sharing arrangements
> between them.
>
> Here are various comparisons to the vanilla kernel (higher
> numbers are better):
>
> #
> # 4 processes with 4 threads per process, sharing 4x 1GB of
> # process-wide memory:
> #
> # perf bench numa mem -l 100 -zZ0 -p 4 -t 4 -P 1024 -T 0
> #
> v3.7-vanilla: 14.8 GB/sec
> v3.7-NUMA: 32.9 GB/sec [ +122.3% ]
>
> 2.2 times faster.
>
> #
> # 4 processes with 4 threads per process, sharing 4x 1GB of
> # process-wide memory:
> #
> # perf bench numa mem -l 100 -zZ0 -p 4 -t 4 -P 0 -T 1024
> #
>
> v3.7-vanilla: 17.0 GB/sec
> v3.7-NUMA: 36.3 GB/sec [ +113.5% ]
>
> 2.1 times faster.
>
That is really cool.
> So it's a nice improvement all around. With this version the
> regressions that Mel Gorman reported a week ago appear to be
> fixed as well.
>
Unfortunately I cannot concur. I'm still seeing high system CPU usage in
places and the specjbb figures are rather unfortunate.
--
Mel Gorman
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists