Message-Id: <20100812172544.655648128@de.ibm.com>
Date:	Thu, 12 Aug 2010 19:25:44 +0200
From:	Heiko Carstens <heiko.carstens@...ibm.com>
To:	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Mike Galbraith <efault@....de>, Ingo Molnar <mingo@...e.hu>,
	Suresh Siddha <suresh.b.siddha@...el.com>,
	Andreas Herrmann <andreas.herrmann3@....com>
Cc:	linux-kernel@...r.kernel.org,
	Martin Schwidefsky <schwidefsky@...ibm.com>
Subject: [PATCH/RFC 0/5] sched: add new 'book' scheduling domain

This patch set adds (yet) another scheduling domain to the scheduler. The
reason is that the recent s390 z196 architecture has four cache levels and
uniform memory access (sort of, see below).
The cpu/cache/memory hierarchy is as follows:

Each cpu has its private L1 (64KB I-cache + 128KB D-cache) and L2 (1.5MB)
cache.
A core consists of four cpus with a 24MB shared L3 cache.
A book consists of six cores with a 192MB shared L4 cache.
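A minimal C sketch of the hierarchy described above. It assumes a hypothetical `struct cpu_pos` that records the book and core each cpu belongs to; nothing like it is part of the patch set. Given two cpus, `shared_cache_level()` (also a hypothetical name) returns the innermost cache level they have in common: L1 and L2 are private, L3 is shared per core and L4 per book. The table mirrors the 20-cpu logical partition in the boot log further down.

```c
#include <assert.h>

/* Hypothetical description of where a cpu sits in the z196 hierarchy. */
struct cpu_pos {
	int book;	/* cpus in the same book share the 192MB L4 */
	int core;	/* cpus in the same core share the 24MB L3 */
};

enum shared_level {
	SHARED_PRIVATE = 2,	/* same cpu: private L1 + L2 */
	SHARED_L3 = 3,		/* same core */
	SHARED_L4 = 4,		/* same book */
	SHARED_MEMORY = 5,	/* different books: memory only, assumed uniform */
};

#define NR_LPAR_CPUS 20

/* Layout of the 20-cpu partition from the boot log: book 0 holds cores
 * {0}, {1-3}, {4-5}; book 1 holds {6-9}, {10-11}, {12-13}, {14-16},
 * {17-19}. Core numbers are illustrative indices, not hardware ids. */
static const struct cpu_pos lpar[NR_LPAR_CPUS] = {
	{0, 0},
	{0, 1}, {0, 1}, {0, 1},
	{0, 2}, {0, 2},
	{1, 3}, {1, 3}, {1, 3}, {1, 3},
	{1, 4}, {1, 4},
	{1, 5}, {1, 5},
	{1, 6}, {1, 6}, {1, 6},
	{1, 7}, {1, 7}, {1, 7},
};

/* Return the innermost cache level that cpus a and b have in common. */
static enum shared_level shared_cache_level(int a, int b)
{
	assert(a >= 0 && a < NR_LPAR_CPUS && b >= 0 && b < NR_LPAR_CPUS);
	if (a == b)
		return SHARED_PRIVATE;
	if (lpar[a].book != lpar[b].book)
		return SHARED_MEMORY;
	if (lpar[a].core != lpar[b].core)
		return SHARED_L4;
	return SHARED_L3;
}
```

For example, cpus 1 and 3 share the L3 of their core, cpus 0 and 1 only the L4 of book 0, and cpus 5 and 6 share no cache at all.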

The z196 architecture has no SMT.
Also, the statement that we have uniform memory access is not entirely
correct. The machine actually uses memory striping, so it only "looks" like
UMA until the next slice of memory gets accessed.
However, there is no interface which tells us which piece of memory is local
or remote. So we (have to) simplify and assume that every memory access that
misses the L4 cache has the same cost.

To let the scheduler use the information about the cache hierarchy and make
decisions that improve cache hit rates, I added the 'BOOK' scheduling domain
between the MC and CPU domains.
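To illustrate where the new level sits, here is a simplified C model (not the kernel's code) that builds the MC, BOOK and CPU spans for one cpu of the 20-cpu partition shown in the boot log below. It uses 32-bit masks in place of `struct cpumask`, and the helper names are hypothetical. A level is dropped when its span adds no cpus to the level beneath it, which is a simplification of the kernel's degenerate-domain handling; it is enough to show why CPU0 ends up without an MC domain.

```c
#include <assert.h>
#include <stdint.h>

#define NR_LPAR_CPUS 20

/* Hypothetical placement of each cpu of the 20-cpu partition in the boot
 * log: index of its core (shared L3) and of its book (shared L4). */
static const int core_of[NR_LPAR_CPUS] = {
	0, 1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 5, 6, 6, 6, 7, 7, 7
};
static const int book_of[NR_LPAR_CPUS] = {
	0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
};

/* Domain levels, innermost first; BOOK sits between MC and CPU. */
enum level { LV_MC, LV_BOOK, LV_CPU, LV_NR };

static int same_group(int a, int b, enum level lv)
{
	switch (lv) {
	case LV_MC:
		return core_of[a] == core_of[b];
	case LV_BOOK:
		return book_of[a] == book_of[b];
	default:
		return 1;	/* LV_CPU spans the whole partition */
	}
}

/* Bitmask of all cpus that share the given level with 'cpu'. */
static uint32_t level_span(int cpu, enum level lv)
{
	uint32_t span = 0;

	for (int i = 0; i < NR_LPAR_CPUS; i++)
		if (same_group(cpu, i, lv))
			span |= UINT32_C(1) << i;
	return span;
}

/* Store the non-degenerate domains of 'cpu' (innermost first) in
 * spans[]/lvs[] and return their number. A level is skipped when its
 * span adds no cpus to the level below (or to the cpu itself). */
static int build_domains(int cpu, uint32_t spans[LV_NR], enum level lvs[LV_NR])
{
	uint32_t prev;
	int n = 0;

	assert(cpu >= 0 && cpu < NR_LPAR_CPUS);
	prev = UINT32_C(1) << cpu;
	for (int lv = LV_MC; lv < LV_NR; lv++) {
		uint32_t span = level_span(cpu, (enum level)lv);

		if (span == prev)
			continue;
		spans[n] = span;
		lvs[n] = (enum level)lv;
		n++;
		prev = span;
	}
	return n;
}
```

For cpu 0 this yields only BOOK (0-5) and CPU (0-19), because its MC span contains nothing but itself; for cpu 1 it yields MC (1-3), BOOK (0-5) and CPU (0-19), matching the shape of the log.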

However, first performance measurements show no effect, neither good nor
bad. It might be that the workloads aren't good enough, or that the
implementation is simply wrong.

Either way, since it's currently very hard to get machine time for additional
measurements, I thought it might be a good idea to post the patches as an RFC
even though we do not have any convincing arguments yet.

Also, please note that the scheduling domain initializers certainly need some
tuning. The line
#define SD_BOOK_INIT SD_CPU_INIT
in the arch support patch is only there so that the code compiles, until we
have something that really works.
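What this placeholder amounts to can be shown with a cut-down C model. The struct `sd_params`, its fields and the numeric values below are hypothetical stand-ins, not the kernel's `struct sched_domain` initializers. The point is only that defining SD_BOOK_INIT as SD_CPU_INIT gives the BOOK level exactly the same balancing parameters as the CPU level, so any BOOK-specific tuning still has to be written.

```c
#include <assert.h>

/* Hypothetical, heavily reduced set of per-domain tuning knobs. */
struct sd_params {
	unsigned int min_interval;	/* balance interval bounds (ms) */
	unsigned int max_interval;
	unsigned int busy_factor;	/* stretch the interval when busy */
	unsigned int imbalance_pct;	/* tolerated imbalance, percent */
};

/* Illustrative values only, not the kernel's. */
#define SD_CPU_INIT (struct sd_params){ \
	.min_interval = 1, .max_interval = 4, \
	.busy_factor = 64, .imbalance_pct = 125 }

/* The placeholder from the arch patch: BOOK simply reuses CPU's values. */
#define SD_BOOK_INIT SD_CPU_INIT

static struct sd_params book_params(void)
{
	return SD_BOOK_INIT;
}

static struct sd_params cpu_params(void)
{
	return SD_CPU_INIT;
}

/* Nonzero if the two levels are tuned identically. */
static int same_tuning(struct sd_params a, struct sd_params b)
{
	return a.min_interval == b.min_interval &&
	       a.max_interval == b.max_interval &&
	       a.busy_factor == b.busy_factor &&
	       a.imbalance_pct == b.imbalance_pct;
}
```

With the alias in place, `same_tuning(book_params(), cpu_params())` is always true; a real BOOK initializer would presumably differ in at least some of these fields.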

As for the patches, I think that the first two could be merged anytime,
since they are only cleanup/preparation patches.
Patch three adds the new scheduling domain and patch four adds the code needed
to represent books via the cpu topology sysfs interface.
Patch five is just the architecture backend.
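Cpu sets in the sysfs topology files, and in the span and groups fields of the boot log below, are often written in the kernel's cpu list notation ("6-19", "0,4-5"). The following is a minimal C parser for that notation, assuming at most 32 cpus so that a plain bitmask suffices. `parse_cpulist()` is a hypothetical helper, and no particular attribute name from patch four is assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a cpu list such as "0-5" or "1,4-5,7" into a bitmask.
 * Returns 0 on success, -1 on malformed input or cpu numbers >= 32. */
static int parse_cpulist(const char *s, uint32_t *mask)
{
	uint32_t m = 0;

	assert(s && mask);
	while (*s && *s != '\n') {
		char *end;
		long first = strtol(s, &end, 10);
		long last = first;

		if (end == s)
			return -1;	/* no number where one was expected */
		if (*end == '-') {
			s = end + 1;
			last = strtol(s, &end, 10);
			if (end == s)
				return -1;
		}
		if (first < 0 || last < first || last > 31)
			return -1;
		for (long cpu = first; cpu <= last; cpu++)
			m |= UINT32_C(1) << cpu;
		s = end;
		if (*s == ',')
			s++;
		else if (*s && *s != '\n')
			return -1;	/* unexpected character */
	}
	*mask = m;
	return 0;
}
```

A trailing newline is accepted because sysfs files usually end with one.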

A boot of a logical partition with 20 cpus, spread across two books, produces
the following initialization output on the console:

Brought up 20 CPUs
CPU0 attaching sched-domain:
 domain 0: span 0-5 level BOOK
  groups: 0 1-3 (cpu_power = 3072) 4-5 (cpu_power = 2048)
  domain 1: span 0-19 level CPU
   groups: 0-5 (cpu_power = 6144) 6-19 (cpu_power = 14336)
CPU1 attaching sched-domain:
 domain 0: span 1-3 level MC
  groups: 1 2 3
  domain 1: span 0-5 level BOOK
   groups: 1-3 (cpu_power = 3072) 4-5 (cpu_power = 2048) 0
   domain 2: span 0-19 level CPU
    groups: 0-5 (cpu_power = 6144) 6-19 (cpu_power = 14336)
CPU2 attaching sched-domain:
 domain 0: span 1-3 level MC
  groups: 2 3 1
  domain 1: span 0-5 level BOOK
   groups: 1-3 (cpu_power = 3072) 4-5 (cpu_power = 2048) 0
   domain 2: span 0-19 level CPU
    groups: 0-5 (cpu_power = 6144) 6-19 (cpu_power = 14336)
CPU3 attaching sched-domain:
 domain 0: span 1-3 level MC
  groups: 3 1 2
  domain 1: span 0-5 level BOOK
   groups: 1-3 (cpu_power = 3072) 4-5 (cpu_power = 2048) 0
   domain 2: span 0-19 level CPU
    groups: 0-5 (cpu_power = 6144) 6-19 (cpu_power = 14336)
CPU4 attaching sched-domain:
 domain 0: span 4-5 level MC
  groups: 4 5
  domain 1: span 0-5 level BOOK
   groups: 4-5 (cpu_power = 2048) 0 1-3 (cpu_power = 3072)
   domain 2: span 0-19 level CPU
    groups: 0-5 (cpu_power = 6144) 6-19 (cpu_power = 14336)
CPU5 attaching sched-domain:
 domain 0: span 4-5 level MC
  groups: 5 4
  domain 1: span 0-5 level BOOK
   groups: 4-5 (cpu_power = 2048) 0 1-3 (cpu_power = 3072)
   domain 2: span 0-19 level CPU
    groups: 0-5 (cpu_power = 6144) 6-19 (cpu_power = 14336)
CPU6 attaching sched-domain:
 domain 0: span 6-9 level MC
  groups: 6 7 8 9
  domain 1: span 6-19 level BOOK
   groups: 6-9 (cpu_power = 4096) 10-11 (cpu_power = 2048) 12-13 (cpu_power = 2048) 14-16 (cpu_power = 3072) 17-19 (cpu_power = 3072)
   domain 2: span 0-19 level CPU
    groups: 6-19 (cpu_power = 14336) 0-5 (cpu_power = 6144)
CPU7 attaching sched-domain:
 domain 0: span 6-9 level MC
  groups: 7 8 9 6
  domain 1: span 6-19 level BOOK
   groups: 6-9 (cpu_power = 4096) 10-11 (cpu_power = 2048) 12-13 (cpu_power = 2048) 14-16 (cpu_power = 3072) 17-19 (cpu_power = 3072)
   domain 2: span 0-19 level CPU
    groups: 6-19 (cpu_power = 14336) 0-5 (cpu_power = 6144)
CPU8 attaching sched-domain:
 domain 0: span 6-9 level MC
  groups: 8 9 6 7
  domain 1: span 6-19 level BOOK
   groups: 6-9 (cpu_power = 4096) 10-11 (cpu_power = 2048) 12-13 (cpu_power = 2048) 14-16 (cpu_power = 3072) 17-19 (cpu_power = 3072)
   domain 2: span 0-19 level CPU
    groups: 6-19 (cpu_power = 14336) 0-5 (cpu_power = 6144)
CPU9 attaching sched-domain:
 domain 0: span 6-9 level MC
  groups: 9 6 7 8
  domain 1: span 6-19 level BOOK
   groups: 6-9 (cpu_power = 4096) 10-11 (cpu_power = 2048) 12-13 (cpu_power = 2048) 14-16 (cpu_power = 3072) 17-19 (cpu_power = 3072)
   domain 2: span 0-19 level CPU
    groups: 6-19 (cpu_power = 14336) 0-5 (cpu_power = 6144)
CPU10 attaching sched-domain:
 domain 0: span 10-11 level MC
  groups: 10 11
  domain 1: span 6-19 level BOOK
   groups: 10-11 (cpu_power = 2048) 12-13 (cpu_power = 2048) 14-16 (cpu_power = 3072) 17-19 (cpu_power = 3072) 6-9 (cpu_power = 4096)
   domain 2: span 0-19 level CPU
    groups: 6-19 (cpu_power = 14336) 0-5 (cpu_power = 6144)
CPU11 attaching sched-domain:
 domain 0: span 10-11 level MC
  groups: 11 10
  domain 1: span 6-19 level BOOK
   groups: 10-11 (cpu_power = 2048) 12-13 (cpu_power = 2048) 14-16 (cpu_power = 3072) 17-19 (cpu_power = 3072) 6-9 (cpu_power = 4096)
   domain 2: span 0-19 level CPU
    groups: 6-19 (cpu_power = 14336) 0-5 (cpu_power = 6144)
CPU12 attaching sched-domain:
 domain 0: span 12-13 level MC
  groups: 12 13
  domain 1: span 6-19 level BOOK
   groups: 12-13 (cpu_power = 2048) 14-16 (cpu_power = 3072) 17-19 (cpu_power = 3072) 6-9 (cpu_power = 4096) 10-11 (cpu_power = 2048)
   domain 2: span 0-19 level CPU
    groups: 6-19 (cpu_power = 14336) 0-5 (cpu_power = 6144)
CPU13 attaching sched-domain:
 domain 0: span 12-13 level MC
  groups: 13 12
  domain 1: span 6-19 level BOOK
   groups: 12-13 (cpu_power = 2048) 14-16 (cpu_power = 3072) 17-19 (cpu_power = 3072) 6-9 (cpu_power = 4096) 10-11 (cpu_power = 2048)
   domain 2: span 0-19 level CPU
    groups: 6-19 (cpu_power = 14336) 0-5 (cpu_power = 6144)
CPU14 attaching sched-domain:
 domain 0: span 14-16 level MC
  groups: 14 15 16
  domain 1: span 6-19 level BOOK
   groups: 14-16 (cpu_power = 3072) 17-19 (cpu_power = 3072) 6-9 (cpu_power = 4096) 10-11 (cpu_power = 2048) 12-13 (cpu_power = 2048)
   domain 2: span 0-19 level CPU
    groups: 6-19 (cpu_power = 14336) 0-5 (cpu_power = 6144)
CPU15 attaching sched-domain:
 domain 0: span 14-16 level MC
  groups: 15 16 14
  domain 1: span 6-19 level BOOK
   groups: 14-16 (cpu_power = 3072) 17-19 (cpu_power = 3072) 6-9 (cpu_power = 4096) 10-11 (cpu_power = 2048) 12-13 (cpu_power = 2048)
   domain 2: span 0-19 level CPU
    groups: 6-19 (cpu_power = 14336) 0-5 (cpu_power = 6144)
CPU16 attaching sched-domain:
 domain 0: span 14-16 level MC
  groups: 16 14 15
  domain 1: span 6-19 level BOOK
   groups: 14-16 (cpu_power = 3072) 17-19 (cpu_power = 3072) 6-9 (cpu_power = 4096) 10-11 (cpu_power = 2048) 12-13 (cpu_power = 2048)
   domain 2: span 0-19 level CPU
    groups: 6-19 (cpu_power = 14336) 0-5 (cpu_power = 6144)
CPU17 attaching sched-domain:
 domain 0: span 17-19 level MC
  groups: 17 18 19
  domain 1: span 6-19 level BOOK
   groups: 17-19 (cpu_power = 3072) 6-9 (cpu_power = 4096) 10-11 (cpu_power = 2048) 12-13 (cpu_power = 2048) 14-16 (cpu_power = 3072)
   domain 2: span 0-19 level CPU
    groups: 6-19 (cpu_power = 14336) 0-5 (cpu_power = 6144)
CPU18 attaching sched-domain:
 domain 0: span 17-19 level MC
  groups: 18 19 17
  domain 1: span 6-19 level BOOK
   groups: 17-19 (cpu_power = 3072) 6-9 (cpu_power = 4096) 10-11 (cpu_power = 2048) 12-13 (cpu_power = 2048) 14-16 (cpu_power = 3072)
   domain 2: span 0-19 level CPU
    groups: 6-19 (cpu_power = 14336) 0-5 (cpu_power = 6144)
CPU19 attaching sched-domain:
 domain 0: span 17-19 level MC
  groups: 19 17 18
  domain 1: span 6-19 level BOOK
   groups: 17-19 (cpu_power = 3072) 6-9 (cpu_power = 4096) 10-11 (cpu_power = 2048) 12-13 (cpu_power = 2048) 14-16 (cpu_power = 3072)
   domain 2: span 0-19 level CPU
    groups: 6-19 (cpu_power = 14336) 0-5 (cpu_power = 6144)
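The cpu_power figures above are consistent with each cpu contributing the default SCHED_LOAD_SCALE of 1024 (z196 has no SMT, so nothing scales it down) and with a group's power being the sum over its cpus. For example, group 6-19 has 14 cpus, so 14 * 1024 = 14336. Single-cpu groups such as "0" show no value, presumably because the debug output only prints cpu_power when it differs from the default. The following small C check of that arithmetic uses a local SCHED_LOAD_SCALE definition and a hypothetical `group_power()` helper:

```c
#include <assert.h>

/* Default per-cpu capacity in kernels of this era (1 << 10). */
#define SCHED_LOAD_SCALE 1024UL

/* Power of a group covering cpus first..last, assuming every cpu
 * contributes exactly SCHED_LOAD_SCALE. */
static unsigned long group_power(int first, int last)
{
	assert(first >= 0 && last >= first);
	return (unsigned long)(last - first + 1) * SCHED_LOAD_SCALE;
}
```

This reproduces every value in the log: 2048 for two-cpu groups, 3072 for three, 4096 for four, 6144 for book 0 and 14336 for book 1.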