lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20061019003316.f6a77b34.pj@sgi.com>
Date:	Thu, 19 Oct 2006 00:33:16 -0700
From:	Paul Jackson <pj@....com>
To:	Nick Piggin <nickpiggin@...oo.com.au>
Cc:	holt@....com, suresh.b.siddha@...el.com, dino@...ibm.com,
	menage@...gle.com, Simon.Derr@...l.net,
	linux-kernel@...r.kernel.org, mbligh@...gle.com,
	rohitseth@...gle.com, dipankar@...ibm.com
Subject: Re: exclusive cpusets broken with cpu hotplug

> So that depends on what cpusets asks for. If, when setting up a and
> b, it asks to partition the domains, then yes that breaks the parent
> cpuset gets broken.

That probably makes good sense from the sched domain side of things.

It is insanely counterintuitive from the cpuset side of things.

Using heirarchical cpuset properties to drive this is the wrong
approach.

In the general case, looking at it (as best I can) from the sched
domain side of things, it seems that the sched domain could be
defined on a system as follows.

    Partition the CPUs on the system - into one or more subsets
    (partitions), non-overlapping, and covering.

    Each of those partitions can either have a sched domain setup on
    it, to support load balancing across the CPUs in that partition,
    or can be isolated, with no load balancing occuring within that
    partition.

    No load balancing occurs across partitions.

Using cpu_exclusive cpusets for this is next to impossible.  It could
be approximated perhaps by having just the immediate children of the
root cpuset, /dev/cpuset/*, define the partition.

But if any lower level cpusets have any affect on the partitioning,
by setting their cpu_exclusive flag in the current implementation,
it is -always- the case, by the basic structure of the cpuset
hierarchy, that the lower level cpuset is a subset of its parents
cpus, and that that parent also has cpu_exclusive set.

The resulting partitioning, even in such simple examples as above, is
not obvious.  If you look back a couple days, when I first presented
essentially this example, I got the resulting sched domain partitioning
entirely wrong.

The essential detail in my big patch of yesterday, to add new specific
sched_domain flags to cpusets, is that it -removed- the requirement to
mark a parent as defining a sched domain anytime a child defined one.

That requirement is one of the defining properties of the cpu_exclusive
flag, and makes that flag -outrageously- unsuited for defining sched
domain partitions.

My new sched_domain flags at least had the right properties, defaults
and rules, that they perhaps could have been used to sanely define sched
domain partitions.  One could mark a few select cpusets, at any depth
in the hierarchy, as defining sched domain partions, without being
forced to mark a whole bunch more ancestor cpusets the same way,
slicing and dicing the sched domain partions into hamburger.

However, fortunately, ... so far as I can tell ... no one needs the
general case described above, of multiple sched domain partitions.

So far as I know, the only essential special case that user land
requires to deal with is to isolate one partition (one subset of CPUs)
from any scheduler load balancing.  Every CPU is either load
balanced however the kernel/sched.c chooses to load balance it,
with potentially every other non-isolated CPU, or is in the isolated
partition (cpu_isolated_map) and not considered for load balancing.

Have I missed any case requiring explicit user intervention?

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@....com> 1.925.600.0401
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ