Message-ID: <ef4903e2-c2ce-9a64-68b0-c7ee483eb582@suse.cz>
Date: Sun, 18 Jul 2021 09:41:49 +0200
From: Vlastimil Babka <vbabka@...e.cz>
To: Mike Galbraith <efault@....de>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Christoph Lameter <cl@...ux.com>,
David Rientjes <rientjes@...gle.com>,
Pekka Enberg <penberg@...nel.org>,
Joonsoo Kim <iamjoonsoo.kim@....com>,
Thomas Gleixner <tglx@...utronix.de>,
Mel Gorman <mgorman@...hsingularity.net>,
Jesper Dangaard Brouer <brouer@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Jann Horn <jannh@...gle.com>
Subject: Re: [RFC v2 00/34] SLUB: reduce irq disabled scope and make it RT compatible
On 7/3/21 9:24 AM, Mike Galbraith wrote:
> On Fri, 2021-07-02 at 20:29 +0200, Sebastian Andrzej Siewior wrote:
>> I replaced my slub changes with slub-local-lock-v2r3.
>> I haven't seen any complaints from lockdep or so, which is good. Then I
>> did this with RT enabled (and no debug):
>
> Below is some raw hackbench data from my little i4790 desktop box. It
> says we'll definitely still want list_lock to be raw.

Hi Mike, thanks a lot for the testing, and sorry for the late reply.

Did you try, instead of making list_lock raw, not applying the last patch
(the local lock one), as I suggested in my reply to bigeasy? I think the
impact on reducing the RT-specific overhead would be larger (than a raw
list_lock), the result should still be RT compatible, and it would also
deal with the bugs you found there... (which I'll look into).
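
For context, the difference the list_lock=raw_spinlock_t runs below measure
can be sketched as follows (a hedged illustration, not the actual patch:
LIST_LOCK_RAW is a hypothetical switch, and the struct is abbreviated):

```c
/* Sketch only: on PREEMPT_RT, spinlock_t becomes a sleeping rtmutex-based
 * lock, while raw_spinlock_t still spins with preemption disabled. */
struct kmem_cache_node {
#ifdef LIST_LOCK_RAW            /* hypothetical switch for the benchmark */
	raw_spinlock_t list_lock;   /* RT: stays a real spinning lock */
#else
	spinlock_t list_lock;       /* RT: can sleep, adds context-switch cost */
#endif
	unsigned long nr_partial;
	struct list_head partial;
};
```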
Thanks,
Vlastimil
> It also appears to be saying that there's something RT-specific to
> stare at in addition to the list_lock business, but add a pinch of salt
> to that due to the config of the virgin(ish) tip tree being much
> lighter than the enterprise(ish) config of the tip-rt tree.
>
> perf stat -r10 hackbench -s4096 -l500
> full warmup, record, repeat twice for elapsed
>
> 5.13.0.g60ab3ed-tip-rt
> 8,898.51 msec task-clock # 7.525 CPUs utilized ( +- 0.33% )
> 368,922 context-switches # 0.041 M/sec ( +- 5.20% )
> 42,281 cpu-migrations # 0.005 M/sec ( +- 5.28% )
> 13,180 page-faults # 0.001 M/sec ( +- 0.70% )
> 33,343,378,867 cycles # 3.747 GHz ( +- 0.30% )
> 21,656,783,887 instructions # 0.65 insn per cycle ( +- 0.67% )
> 4,408,569,663 branches # 495.428 M/sec ( +- 0.73% )
> 12,040,125 branch-misses # 0.27% of all branches ( +- 2.93% )
>
> 1.18260 +- 0.00473 seconds time elapsed ( +- 0.40% )
> 1.19018 +- 0.00441 seconds time elapsed ( +- 0.37% ) (repeat)
> 1.18260 +- 0.00473 seconds time elapsed ( +- 0.40% ) (repeat)
>
> 5.13.0.g60ab3ed-tip-rt +slub-local-lock-v2r3 list_lock=raw_spinlock_t
> 9,642.00 msec task-clock # 7.521 CPUs utilized ( +- 0.46% )
> 462,091 context-switches # 0.048 M/sec ( +- 4.79% )
> 44,411 cpu-migrations # 0.005 M/sec ( +- 4.34% )
> 12,980 page-faults # 0.001 M/sec ( +- 0.43% )
> 36,098,859,429 cycles # 3.744 GHz ( +- 0.44% )
> 25,462,853,462 instructions # 0.71 insn per cycle ( +- 0.50% )
> 5,260,898,360 branches # 545.623 M/sec ( +- 0.52% )
> 16,088,686 branch-misses # 0.31% of all branches ( +- 2.02% )
>
> 1.28207 +- 0.00568 seconds time elapsed ( +- 0.44% )
> 1.28744 +- 0.00713 seconds time elapsed ( +- 0.55% ) (repeat)
> 1.28085 +- 0.00850 seconds time elapsed ( +- 0.66% ) (repeat)
>
> 5.13.0.g60ab3ed-tip-rt +slub-local-lock-v2r3 list_lock=spinlock_t
> 10,004.89 msec task-clock # 6.029 CPUs utilized ( +- 1.37% )
> 654,311 context-switches # 0.065 M/sec ( +- 5.16% )
> 211,070 cpu-migrations # 0.021 M/sec ( +- 1.38% )
> 13,262 page-faults # 0.001 M/sec ( +- 0.79% )
> 36,585,914,931 cycles # 3.657 GHz ( +- 1.35% )
> 27,682,240,511 instructions # 0.76 insn per cycle ( +- 1.06% )
> 5,766,064,432 branches # 576.325 M/sec ( +- 1.11% )
> 24,269,069 branch-misses # 0.42% of all branches ( +- 2.03% )
>
> 1.6595 +- 0.0116 seconds time elapsed ( +- 0.70% )
> 1.6270 +- 0.0180 seconds time elapsed ( +- 1.11% ) (repeat)
> 1.6213 +- 0.0150 seconds time elapsed ( +- 0.93% ) (repeat)
>
> virgin(ish) tip
> 5.13.0.g60ab3ed-tip
> 7,320.67 msec task-clock # 7.792 CPUs utilized ( +- 0.31% )
> 221,215 context-switches # 0.030 M/sec ( +- 3.97% )
> 16,234 cpu-migrations # 0.002 M/sec ( +- 4.07% )
> 13,233 page-faults # 0.002 M/sec ( +- 0.91% )
> 27,592,205,252 cycles # 3.769 GHz ( +- 0.32% )
> 8,309,495,040 instructions # 0.30 insn per cycle ( +- 0.37% )
> 1,555,210,607 branches # 212.441 M/sec ( +- 0.42% )
> 5,484,209 branch-misses # 0.35% of all branches ( +- 2.13% )
>
> 0.93949 +- 0.00423 seconds time elapsed ( +- 0.45% )
> 0.94608 +- 0.00384 seconds time elapsed ( +- 0.41% ) (repeat)
> 0.94422 +- 0.00410 seconds time elapsed ( +- 0.43% ) (repeat)
>
> 5.13.0.g60ab3ed-tip +slub-local-lock-v2r3
> 7,343.57 msec task-clock # 7.776 CPUs utilized ( +- 0.44% )
> 223,044 context-switches # 0.030 M/sec ( +- 3.02% )
> 16,057 cpu-migrations # 0.002 M/sec ( +- 4.03% )
> 13,164 page-faults # 0.002 M/sec ( +- 0.97% )
> 27,684,906,017 cycles # 3.770 GHz ( +- 0.45% )
> 8,323,273,871 instructions # 0.30 insn per cycle ( +- 0.28% )
> 1,556,106,680 branches # 211.901 M/sec ( +- 0.31% )
> 5,463,468 branch-misses # 0.35% of all branches ( +- 1.33% )
>
> 0.94440 +- 0.00352 seconds time elapsed ( +- 0.37% )
> 0.94830 +- 0.00228 seconds time elapsed ( +- 0.24% ) (repeat)
> 0.93813 +- 0.00440 seconds time elapsed ( +- 0.47% ) (repeat)
>
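
For reference, the runs above can be reproduced and summarized with a short
shell sketch. The perf/hackbench invocation is the one quoted above; the
elapsed-time values are copied verbatim from the listings, and the
baseline/variant labels are mine:

```shell
# Reproduce one configuration's measurement as described above
# (requires perf and hackbench; commented out so the summary below runs anywhere):
#   perf stat -r10 hackbench -s4096 -l500    # full warmup
#   perf stat -r10 hackbench -s4096 -l500    # record
#   perf stat -r10 hackbench -s4096 -l500    # repeat twice for elapsed

# Average the three elapsed-time samples per configuration (values copied
# from the listings above) and compute the slowdown of the sleeping
# spinlock_t list_lock variant relative to the tip-rt baseline.
avg() { printf '%s\n' "$@" | awk '{s += $1} END {printf "%.4f", s / NR}'; }

base=$(avg 1.18260 1.19018 1.18260)   # 5.13.0.g60ab3ed-tip-rt
raw=$(avg 1.28207 1.28744 1.28085)    # +v2r3, list_lock=raw_spinlock_t
plain=$(avg 1.6595 1.6270 1.6213)     # +v2r3, list_lock=spinlock_t

slow=$(awk -v b="$base" -v p="$plain" 'BEGIN {printf "%.1f", (p / b - 1) * 100}')
echo "tip-rt baseline:     $base s"
echo "raw list_lock:       $raw s"
echo "sleeping list_lock:  $plain s  (+${slow}% vs baseline)"
```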