Date:   Fri, 2 Jul 2021 20:29:44 +0200
From:   Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To:     Vlastimil Babka <vbabka@...e.cz>
Cc:     linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        Christoph Lameter <cl@...ux.com>,
        David Rientjes <rientjes@...gle.com>,
        Pekka Enberg <penberg@...nel.org>,
        Joonsoo Kim <iamjoonsoo.kim@....com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Jesper Dangaard Brouer <brouer@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Jann Horn <jannh@...gle.com>
Subject: Re: [RFC v2 00/34] SLUB: reduce irq disabled scope and make it RT
 compatible

I replaced my slub changes with slub-local-lock-v2r3.
I haven't seen any complaints from lockdep or similar, which is good. Then I
ran the following with RT enabled (and no debug options):

- A "time make -j32" run of allmodconfig on /dev/shm.
  Old:
| real    20m6,217s
| user    568m22,553s
| sys     48m33,126s

  New:
| real    20m9,049s
| user    569m32,096s
| sys     48m47,670s

  The ~3 second difference here is probably within the noise.

- perf_5.10 stat -r 10 hackbench -g200 -s 4096 -l500
Old:
|         464.967,20 msec task-clock                #   27,220 CPUs utilized            ( +-  0,16% )
|          7.683.944      context-switches          #    0,017 M/sec                    ( +-  0,86% )
|            931.380      cpu-migrations            #    0,002 M/sec                    ( +-  4,94% )
|            219.569      page-faults               #    0,472 K/sec                    ( +-  0,39% )
|  1.104.727.599.918      cycles                    #    2,376 GHz                      ( +-  0,18% )
|    941.428.898.087      stalled-cycles-frontend   #   85,22% frontend cycles idle     ( +-  0,24% )
|    729.016.546.572      stalled-cycles-backend    #   65,99% backend cycles idle      ( +-  0,32% )
|    340.133.571.519      instructions              #    0,31  insn per cycle
|                                                   #    2,77  stalled cycles per insn  ( +-  0,12% )
|     73.746.821.314      branches                  #  158,607 M/sec                    ( +-  0,13% )
|        377.838.006      branch-misses             #    0,51% of all branches          ( +-  1,01% )
| 
|            17,0820 +- 0,0202 seconds time elapsed  ( +-  0,12% )

New:
|         422.865,71 msec task-clock                #    4,782 CPUs utilized            ( +-  0,34% )
|         14.594.238      context-switches          #    0,035 M/sec                    ( +-  0,43% )
|          3.737.926      cpu-migrations            #    0,009 M/sec                    ( +-  0,46% )
|            218.474      page-faults               #    0,517 K/sec                    ( +-  0,74% )
|    940.715.812.020      cycles                    #    2,225 GHz                      ( +-  0,34% )
|    716.593.827.820      stalled-cycles-frontend   #   76,18% frontend cycles idle     ( +-  0,39% )
|    550.730.862.839      stalled-cycles-backend    #   58,54% backend cycles idle      ( +-  0,43% )
|    417.274.588.907      instructions              #    0,44  insn per cycle
|                                                   #    1,72  stalled cycles per insn  ( +-  0,17% )
|     92.814.150.290      branches                  #  219,488 M/sec                    ( +-  0,17% )
|        822.102.170      branch-misses             #    0,89% of all branches          ( +-  0,41% )
| 
|             88,427 +- 0,618 seconds time elapsed  ( +-  0,70% )

So this is outside of the noise range.
I'm not sure where this is coming from. My guess would be higher lock
contention within the memory allocator.
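
For context, the "slub-local-lock" naming suggests the per-CPU SLUB state is
protected by a local_lock_t rather than plain local_irq_save()/restore().
Below is a minimal, generic sketch of that pattern (not code from the series;
the struct and function names are made up). On !PREEMPT_RT,
local_lock_irqsave() boils down to disabling interrupts; on PREEMPT_RT it
takes a per-CPU spinlock, so other tasks on the same CPU can contend on it,
which is one place such extra contention could show up.

/* Generic local_lock sketch; names are illustrative, not from mm/slub.c. */
#include <linux/local_lock.h>
#include <linux/percpu.h>

struct pcpu_cache {
	local_lock_t	lock;		/* protects the field below */
	unsigned int	nr_cached;	/* stand-in for per-CPU allocator state */
};

static DEFINE_PER_CPU(struct pcpu_cache, pcpu_cache) = {
	.lock = INIT_LOCAL_LOCK(lock),
};

static void pcpu_cache_account(void)
{
	unsigned long flags;

	/*
	 * !PREEMPT_RT: disables interrupts, much like local_irq_save().
	 * PREEMPT_RT:  acquires a per-CPU spinlock; the section stays
	 *              preemptible and tasks on the same CPU can contend.
	 */
	local_lock_irqsave(&pcpu_cache.lock, flags);
	this_cpu_inc(pcpu_cache.nr_cached);
	local_unlock_irqrestore(&pcpu_cache.lock, flags);
}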
  
> The remaining patches to upstream from the RT tree are small ones related to
> KConfig. The patch that restricts PREEMPT_RT to SLUB (not SLAB or SLOB) makes
> sense. The patch that disables CONFIG_SLUB_CPU_PARTIAL with PREEMPT_RT could
> perhaps be re-evaluated as the series also addresses some latency issues with
> percpu partial slabs.

With that series, CONFIG_SLUB_CPU_PARTIAL can indeed be enabled. I have
(had) a half-done series where I had it enabled and noticed a slight
increase in latency, so I made it "default y on !RT". It wasn't dramatic,
but it appeared to be outside the noise.
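
(For reference, and not from that half-done series: in the kernels of this
era CONFIG_SLUB_CPU_PARTIAL gates the per-CPU list of partially filled
slabs, roughly as below, and "default y on !RT" would amount to a one-line
"default y if !PREEMPT_RT" in the Kconfig entry.)

/* include/linux/slub_def.h (abridged) around the time of this series */
struct kmem_cache_cpu {
	void **freelist;	/* Pointer to next available object */
	unsigned long tid;	/* Globally unique transaction id */
	struct page *page;	/* The slab from which we are allocating */
#ifdef CONFIG_SLUB_CPU_PARTIAL
	struct page *partial;	/* Partially allocated frozen slabs */
#endif
};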

Sebastian
