Message-ID: <20200915165807.kpp7uhiw7l3loofu@ca-dmjordan1.us.oracle.com>
Date: Tue, 15 Sep 2020 12:58:07 -0400
From: Daniel Jordan <daniel.m.jordan@...cle.com>
To: Hugh Dickins <hughd@...gle.com>
Cc: Alex Shi <alex.shi@...ux.alibaba.com>,
Andrew Morton <akpm@...ux-foundation.org>,
mgorman@...hsingularity.net, tj@...nel.org,
khlebnikov@...dex-team.ru, daniel.m.jordan@...cle.com,
willy@...radead.org, hannes@...xchg.org, lkp@...el.com,
linux-mm@...ck.org, linux-kernel@...r.kernel.org,
cgroups@...r.kernel.org, shakeelb@...gle.com,
iamjoonsoo.kim@....com, richard.weiyang@...il.com,
kirill@...temov.name, alexander.duyck@...il.com,
rong.a.chen@...el.com, mhocko@...e.com, vdavydov.dev@...il.com,
shy828301@...il.com, vbabka@...e.cz, minchan@...nel.org, cai@....pw
Subject: Re: [PATCH v18 00/32] per memcg lru_lock: reviews
On Tue, Sep 15, 2020 at 01:21:56AM -0700, Hugh Dickins wrote:
> On Sun, 13 Sep 2020, Alex Shi wrote:
> > Uh, I updated the testing with some new results here:
> > https://lkml.org/lkml/2020/8/26/212
>
> Right, I missed that, that's better, thanks. Any other test results?
Alex, you were doing some will-it-scale runs earlier. Are you planning to do
more of those? Otherwise I can add them in.
This is what I have so far.
sysbench oltp read-only
-----------------------
The goal was to run a real-world benchmark, or at least something closer to one
than vm-scalability, with the memory controller enabled but unused, to check
for regressions.
I chose sysbench because it was relatively straightforward to run, but I'm open
to ideas for other high-level benchmarks that might be more sensitive to this
series.
CoeffVar shows the test was pretty noisy overall. It's nice to see there's no
significant difference between the kernels for low thread counts (1-12), but
I'm not sure what to make of the 18 and 20 thread cases. At 20 threads, the
CPUs of the node that the test was confined to were saturated and the variance
is especially high. I'm tempted to write the 18 and 20 thread cases off as
noise.
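For reference, CoeffVar can be roughly reproduced from the Amean and Stddev rows of the "sysbench Time" table further down. The following is a minimal Python sketch, assuming CoeffVar is the standard deviation expressed as a percentage of the arithmetic mean; `coeff_var` is a name chosen here for illustration, not an mmtests function. The results differ slightly from the table because the table's inputs are already rounded.

```python
def coeff_var(mean, stddev):
    """Coefficient of variation: stddev as a percentage of the mean."""
    return 100.0 * stddev / mean

# (Amean, Stddev) pairs copied from the "sysbench Time" table, 5.9-rc2 column.
time_baseline = {
    1: (11.00, 3.33),
    12: (3.29, 0.35),
    18: (4.20, 0.38),
    20: (6.02, 2.93),
}

for threads, (mean, stddev) in time_baseline.items():
    print(f"{threads:>2} threads: CoeffVar ~ {coeff_var(mean, stddev):.1f}%")
```

Under that assumption, the 1-thread and 20-thread cases are the noisiest on the baseline kernel, which is consistent with the caution above about reading too much into the 20-thread result.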
- 2-socket * 10-core * 2-hyperthread Broadwell server
- test bound to node 1 to lower variance
- 251G memory, divided evenly between the nodes (memory size of test shrunk to
  accommodate confining to one node)
- 12 iterations per thread count per kernel
- THP enabled
export OLTP_CACHESIZE=$((MEMTOTAL_BYTES/4))
export OLTP_SHAREDBUFFERS=$((MEMTOTAL_BYTES/8))
export OLTP_PAGESIZES="default"
export SYSBENCH_DRIVER=postgres
export SYSBENCH_MAX_TRANSACTIONS=auto
export SYSBENCH_READONLY=yes
export SYSBENCH_MAX_THREADS=$((NUMCPUS / 2))
export SYSBENCH_ITERATIONS=12
export SYSBENCH_WORKLOAD_SIZE=$((MEMTOTAL_BYTES*3/8))
export SYSBENCH_CACHE_COLD=no
export DATABASE_INIT_ONCE=yes
export MMTESTS_NUMA_POLICY=fullbind_single_instance_node
numactl --cpunodebind=1 --membind=1 <mmtests_cmdline>
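To make the arithmetic in the config above concrete, here is a small Python sketch. It assumes NUMCPUS is the total number of logical CPUs on the machine (2 * 10 * 2 = 40), which would be consistent with the 20-thread maximum in the results. The `memtotal_bytes` value is a placeholder only, since the message does not state the shrunk memory size that was actually used.

```python
# Topology taken from the machine description in the message.
sockets, cores_per_socket, threads_per_core = 2, 10, 2
numcpus = sockets * cores_per_socket * threads_per_core  # assumed meaning of NUMCPUS
max_threads = numcpus // 2  # mirrors SYSBENCH_MAX_THREADS=$((NUMCPUS / 2))

GIB = 1 << 30
memtotal_bytes = 128 * GIB  # hypothetical placeholder, not the value used in the test

# Same integer arithmetic as the shell expressions in the config.
sizes = {
    "OLTP_CACHESIZE": memtotal_bytes // 4,
    "OLTP_SHAREDBUFFERS": memtotal_bytes // 8,
    "SYSBENCH_WORKLOAD_SIZE": memtotal_bytes * 3 // 8,
}

print(f"SYSBENCH_MAX_THREADS = {max_threads}")
for name, size in sizes.items():
    print(f"{name} = {size / GIB:.0f} GiB")
```

Whatever the real MEMTOTAL_BYTES was, the ratios hold: the workload is 3/8 of it, larger than both the cache size (1/4) and the shared buffers (1/8).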
sysbench Transactions per second
                          5.9-rc2      5.9-rc2-lru-v18
Min        1    593.23 (   0.00%)    583.37 (  -1.66%)
Min        4   1897.34 (   0.00%)   1871.77 (  -1.35%)
Min        7   2471.14 (   0.00%)   2449.77 (  -0.86%)
Min       12   2680.00 (   0.00%)   2853.25 (   6.46%)
Min       18   2183.82 (   0.00%)   1191.43 ( -45.44%)
Min       20    924.96 (   0.00%)    526.66 ( -43.06%)
Hmean      1    912.08 (   0.00%)    904.24 (  -0.86%)
Hmean      4   2057.11 (   0.00%)   2044.69 (  -0.60%)
Hmean      7   2817.59 (   0.00%)   2812.80 (  -0.17%)
Hmean     12   3201.05 (   0.00%)   3171.09 (  -0.94%)
Hmean     18   2529.10 (   0.00%)   2009.99 * -20.53%*
Hmean     20   1742.29 (   0.00%)   1127.77 * -35.27%*
Stddev     1    219.21 (   0.00%)    220.92 (  -0.78%)
Stddev     4     94.94 (   0.00%)     84.34 (  11.17%)
Stddev     7    189.42 (   0.00%)    167.58 (  11.53%)
Stddev    12    372.13 (   0.00%)    199.40 (  46.42%)
Stddev    18    248.42 (   0.00%)    574.66 (-131.32%)
Stddev    20    757.69 (   0.00%)    666.87 (  11.99%)
CoeffVar   1     22.54 (   0.00%)     22.86 (  -1.42%)
CoeffVar   4      4.61 (   0.00%)      4.12 (  10.60%)
CoeffVar   7      6.69 (   0.00%)      5.94 (  11.30%)
CoeffVar  12     11.49 (   0.00%)      6.27 (  45.46%)
CoeffVar  18      9.74 (   0.00%)     26.22 (-169.23%)
CoeffVar  20     36.32 (   0.00%)     47.18 ( -29.89%)
Max        1   1117.45 (   0.00%)   1107.33 (  -0.91%)
Max        4   2184.92 (   0.00%)   2136.65 (  -2.21%)
Max        7   3086.81 (   0.00%)   3049.52 (  -1.21%)
Max       12   4020.07 (   0.00%)   3580.95 ( -10.92%)
Max       18   3032.30 (   0.00%)   2810.85 (  -7.30%)
Max       20   2891.27 (   0.00%)   2675.80 (  -7.45%)
BHmean-50  1   1098.77 (   0.00%)   1093.58 (  -0.47%)
BHmean-50  4   2139.76 (   0.00%)   2107.13 (  -1.52%)
BHmean-50  7   2972.18 (   0.00%)   2953.94 (  -0.61%)
BHmean-50 12   3494.73 (   0.00%)   3311.33 (  -5.25%)
BHmean-50 18   2729.70 (   0.00%)   2606.32 (  -4.52%)
BHmean-50 20   2668.72 (   0.00%)   1779.87 ( -33.31%)
BHmean-95  1    958.94 (   0.00%)    951.84 (  -0.74%)
BHmean-95  4   2072.98 (   0.00%)   2062.01 (  -0.53%)
BHmean-95  7   2853.96 (   0.00%)   2851.21 (  -0.10%)
BHmean-95 12   3258.65 (   0.00%)   3203.53 (  -1.69%)
BHmean-95 18   2565.99 (   0.00%)   2143.90 ( -16.45%)
BHmean-95 20   1894.47 (   0.00%)   1258.34 ( -33.58%)
BHmean-99  1    958.94 (   0.00%)    951.84 (  -0.74%)
BHmean-99  4   2072.98 (   0.00%)   2062.01 (  -0.53%)
BHmean-99  7   2853.96 (   0.00%)   2851.21 (  -0.10%)
BHmean-99 12   3258.65 (   0.00%)   3203.53 (  -1.69%)
BHmean-99 18   2565.99 (   0.00%)   2143.90 ( -16.45%)
BHmean-99 20   1894.47 (   0.00%)   1258.34 ( -33.58%)
sysbench Time
                          5.9-rc2          5.9-rc2-lru
Min        1      8.96 (   0.00%)      9.04 (  -0.89%)
Min        4      4.63 (   0.00%)      4.74 (  -2.38%)
Min        7      3.34 (   0.00%)      3.38 (  -1.20%)
Min       12      2.65 (   0.00%)      2.95 ( -11.32%)
Min       18      3.54 (   0.00%)      3.80 (  -7.34%)
Min       20      3.74 (   0.00%)      4.02 (  -7.49%)
Amean      1     11.00 (   0.00%)     11.11 (  -0.98%)
Amean      4      4.92 (   0.00%)      4.95 (  -0.59%)
Amean      7      3.65 (   0.00%)      3.65 (  -0.16%)
Amean     12      3.29 (   0.00%)      3.32 (  -0.89%)
Amean     18      4.20 (   0.00%)      5.22 * -24.39%*
Amean     20      6.02 (   0.00%)      9.14 * -51.98%*
Stddev     1      3.33 (   0.00%)      3.45 (  -3.40%)
Stddev     4      0.23 (   0.00%)      0.21 (   7.89%)
Stddev     7      0.25 (   0.00%)      0.22 (   9.87%)
Stddev    12      0.35 (   0.00%)      0.19 (  45.09%)
Stddev    18      0.38 (   0.00%)      1.75 (-354.74%)
Stddev    20      2.93 (   0.00%)      4.73 ( -61.72%)
CoeffVar   1     30.30 (   0.00%)     31.02 (  -2.40%)
CoeffVar   4      4.63 (   0.00%)      4.24 (   8.43%)
CoeffVar   7      6.77 (   0.00%)      6.10 (  10.02%)
CoeffVar  12     10.74 (   0.00%)      5.85 (  45.57%)
CoeffVar  18      9.15 (   0.00%)     33.45 (-265.58%)
CoeffVar  20     48.64 (   0.00%)     51.75 (  -6.41%)
Max        1     17.01 (   0.00%)     17.36 (  -2.06%)
Max        4      5.33 (   0.00%)      5.40 (  -1.31%)
Max        7      4.14 (   0.00%)      4.18 (  -0.97%)
Max       12      3.89 (   0.00%)      3.67 (   5.66%)
Max       18      4.82 (   0.00%)      8.64 ( -79.25%)
Max       20     11.09 (   0.00%)     19.26 ( -73.67%)
BAmean-50  1      9.12 (   0.00%)      9.16 (  -0.49%)
BAmean-50  4      4.73 (   0.00%)      4.80 (  -1.55%)
BAmean-50  7      3.46 (   0.00%)      3.48 (  -0.58%)
BAmean-50 12      3.02 (   0.00%)      3.18 (  -5.24%)
BAmean-50 18      3.90 (   0.00%)      4.08 (  -4.52%)
BAmean-50 20      4.02 (   0.00%)      5.90 ( -46.56%)
BAmean-95  1     10.45 (   0.00%)     10.54 (  -0.82%)
BAmean-95  4      4.88 (   0.00%)      4.91 (  -0.52%)
BAmean-95  7      3.60 (   0.00%)      3.60 (  -0.08%)
BAmean-95 12      3.23 (   0.00%)      3.28 (  -1.60%)
BAmean-95 18      4.14 (   0.00%)      4.91 ( -18.58%)
BAmean-95 20      5.56 (   0.00%)      8.22 ( -48.04%)
BAmean-99  1     10.45 (   0.00%)     10.54 (  -0.82%)
BAmean-99  4      4.88 (   0.00%)      4.91 (  -0.52%)
BAmean-99  7      3.60 (   0.00%)      3.60 (  -0.08%)
BAmean-99 12      3.23 (   0.00%)      3.28 (  -1.60%)
BAmean-99 18      4.14 (   0.00%)      4.91 ( -18.58%)
BAmean-99 20      5.56 (   0.00%)      8.22 ( -48.04%)
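The percentages in parentheses in both tables appear to be relative to the 5.9-rc2 column, with the sign chosen so that a negative number means the patched kernel did worse: lower throughput in the first table, longer time in the second. The Python sketch below assumes that convention; `pct_vs_baseline` is a hypothetical helper, not part of mmtests. It reproduces the Min rows exactly, while the mean rows in the tables are presumably computed from unrounded samples and can differ in the last digits.

```python
def pct_vs_baseline(baseline, patched, higher_is_better=True):
    """Percent difference vs. baseline, signed so that negative means worse."""
    delta = patched - baseline if higher_is_better else baseline - patched
    return 100.0 * delta / baseline

# Throughput (transactions per second): higher is better.
print(f"TPS  Min  1: {pct_vs_baseline(593.23, 583.37):.2f}%")
print(f"TPS  Min 12: {pct_vs_baseline(2680.00, 2853.25):.2f}%")

# Elapsed time: lower is better, so the sign is flipped.
print(f"Time Min 12: {pct_vs_baseline(2.65, 2.95, higher_is_better=False):.2f}%")
```

Reading the tables this way, the 12-thread Min row is a small throughput gain for the series even though the same thread count shows a slower best-case time.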
docker-ized readtwice microbenchmark
------------------------------------
This is Alex's modified readtwice case. It needed a few fixes, and I made it
into a script. The updated version is attached.
Same machine, three runs per kernel, 40 containers per test. The numbers below
are the average MB/s over all containers.
    5.9-rc2         5.9-rc2-lru
    -----------     -----------
    220.5 (3.3)     356.9 (0.5)
That's a 62% improvement.
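As a quick check of that figure, the two averages from the table give the following. This Python sketch uses only the MB/s values reported in this message.

```python
base_mbps = 220.5  # 5.9-rc2, average MB/s over all containers
lru_mbps = 356.9   # 5.9-rc2-lru, average MB/s over all containers

speedup = lru_mbps / base_mbps
improvement_pct = (speedup - 1.0) * 100.0

print(f"speedup: {speedup:.2f}x, improvement: {improvement_pct:.0f}%")
# prints "speedup: 1.62x, improvement: 62%"
```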
View attachment "Dockerfile" of type "text/plain" (509 bytes)
View attachment "run.sh" of type "text/plain" (1858 bytes)
View attachment "readtwice.patch" of type "text/plain" (1877 bytes)