Message-Id: <20060821140148.435d15f3.pj@sgi.com>
Date:	Mon, 21 Aug 2006 14:01:48 -0700
From:	Paul Jackson <pj@....com>
To:	Anton Blanchard <anton@...ba.org>
Cc:	simon.derr@...l.net, nathanl@...tin.ibm.com,
	linux-kernel@...r.kernel.org
Subject: Re: cpusets not cpu hotplug aware

Anton wrote:

> Maybe the notifier is the right way to go, but it seems strange to
> create two copies of cpu_online_map (with the associated possibility of
> the two getting out of sync).

Every cpuset in the system, of which top_cpuset is just the first,
has a set of cpus and memory nodes on which tasks in that cpuset are
allowed to operate.  So it's not just top_cpuset whose relation to
hotplug and unplug we need to understand.

I'll bet there is more hidden state in mm/mempolicy.c, for mbind()
and set_mempolicy(), and in kernel/sched.c for sched_setaffinity(),
that was derived from which memory nodes or cpus were online.  For
example, I see several fields in 'struct mempolicy' that hold node
numbers in some form, and there is the 'cpus_allowed' field in the
task struct that sched_setaffinity() sets.
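
For concreteness, the 2.6-era struct looks roughly like this
(abridged from include/linux/mempolicy.h as best I recall it, so
treat the exact fields as illustrative):

	struct mempolicy {
		atomic_t refcnt;
		short policy;		/* MPOL_DEFAULT, MPOL_BIND, ... */
		union {
			struct zonelist *zonelist;	/* MPOL_BIND */
			short preferred_node;		/* MPOL_PREFERRED */
			nodemask_t nodes;		/* MPOL_INTERLEAVE */
		} v;
		/* mems_allowed this policy was expressed relative to */
		nodemask_t cpuset_mems_allowed;
	};

Each of those remembers node numbers that were valid when the policy
was installed; nothing in the hotplug path rewrites them.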

How do hotplug and unplug interact with these various saved states?

> It's up to the cpusets code to register a hotplug notifier to update the
> top_cpuset maps.

That, or user level code, when it adds or removes a cpu or a memory
node, needs to be responsible for adding or removing that cpu or node
to or from whichever cpusets are affected.

For example, if you just added cpu 31 to a system that had been
running on cpus 0 to 30, you can add cpu 31 to the top cpuset by
doing:

	mkdir /dev/cpuset		 	# if not already done
	mount -t cpuset cpuset /dev/cpuset	# if not already done
	/bin/echo 0-31 > /dev/cpuset/cpus
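
The in-kernel route Anton suggests would look something like this
against the 2.6 hotcpu_notifier() API (a minimal sketch, not a real
patch: the bare assignment to top_cpuset and the 'manage_mutex' lock
name are simplifications standing in for whatever actually serializes
cpuset updates):

	#include <linux/cpu.h>
	#include <linux/cpumask.h>
	#include <linux/init.h>
	#include <linux/mutex.h>
	#include <linux/notifier.h>

	/* On any cpu hotplug or unplug event, resync top_cpuset's
	 * cpus with whatever is currently online. */
	static int cpuset_handle_cpuhp(struct notifier_block *nb,
				       unsigned long action, void *hcpu)
	{
		mutex_lock(&manage_mutex);
		top_cpuset.cpus_allowed = cpu_online_map;
		mutex_unlock(&manage_mutex);
		return NOTIFY_OK;
	}

	static int __init cpuset_init_hotplug(void)
	{
		hotcpu_notifier(cpuset_handle_cpuhp, 0);
		return 0;
	}
	core_initcall(cpuset_init_hotplug);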

> If cpuset_cpus_allowed doesn't return the current online mask and we want
> to schedule on a cpu that has been added since boot it looks like we
> will fail.

In general, on systems actually using cpusets, that -is- what should
happen.  Just because a cpu was brought online doesn't mean it was
intended to be allowed in any given task's current cpuset.

Granted, I would guess users of systems not using cpusets (but that
still have cpusets configured in their kernel, as is common in some
distro kernels) would expect the behaviour you expected: bringing
a cpu (or memory node) online or taking it offline would make it
available (or not) for something like sched_setaffinity() (or
mbind()/set_mempolicy()) immediately, without having to invoke some
magic cpuset voodoo.

Offhand, this sounds to me like a choice of two modes of operation.

    If you aren't actually using cpusets (so every task is in the
    original top_cpuset) then you'd expect that cpuset to "get out
    of the way", meaning top_cpuset (the only cpuset, in this case)
    tracked whatever cpus and nodes were online at the moment.

    If instead you start messing with cpusets (creating more than one
    of them and moving tasks between them) then you'd expect cpusets
    to be enforced, without newly added cpus or memory nodes being put
    into existing cpusets automatically.  Only the user knows which
    cpusets should get the added cpus or memory nodes in that case
    (see the sketch below).
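
In hotplug notifier terms, that modal split could be as small as
this (hypothetical; 'number_of_cpusets' stands for whatever counter
tells us whether any cpusets beyond top_cpuset exist):

	/* Auto-track the online map only while the user has not
	 * created any cpusets beyond top_cpuset itself. */
	if (number_of_cpusets <= 1)
		top_cpuset.cpus_allowed = cpu_online_map;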

I don't jump for joy over yet another modal state flag.  But I don't see
a better alternative -- do you?

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@....com> 1.925.600.0401
