linux-kernel - Re: [PATCH 1/2] Customize sched domain via cpuset

Message-Id: <1207050491.8514.710.camel@twins>
Date:	Tue, 01 Apr 2008 13:48:11 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Hidetoshi Seto <seto.hidetoshi@...fujitsu.com>
Cc:	linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...e.hu>,
	Paul Jackson <pj@....com>
Subject: Re: [PATCH 1/2] Customize sched domain via cpuset

Adding CCs (highly recommended to CC at least the subsystem maintainers
of the stuff you touch :-)

On Tue, 2008-04-01 at 20:26 +0900, Hidetoshi Seto wrote:
> Hi all,
> 
> Using cpuset, now we can partition the system into multiple sched domains.
> Then, how about providing different characteristics for each domain?
> 
> This patch introduces a new cpuset feature - sched domain customization.
> 
> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@...fujitsu.com>
> 
> ---
>  Documentation/cpusets.txt |   89 ++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 87 insertions(+), 2 deletions(-)
> 
> Index: GIT-torvalds/Documentation/cpusets.txt
> ===================================================================
> --- GIT-torvalds.orig/Documentation/cpusets.txt
> +++ GIT-torvalds/Documentation/cpusets.txt
> @@ -8,6 +8,7 @@ Portions Copyright (c) 2004-2006 Silicon
>  Modified by Paul Jackson <pj@....com>
>  Modified by Christoph Lameter <clameter@....com>
>  Modified by Paul Menage <menage@...gle.com>
> +Modified by Hidetoshi Seto <seto.hidetoshi@...fujitsu.com>
> 
>  CONTENTS:
>  =========
> @@ -20,7 +21,8 @@ CONTENTS:
>    1.5 What is memory_pressure ?
>    1.6 What is memory spread ?
>    1.7 What is sched_load_balance ?
> -  1.8 How do I use cpusets ?
> +  1.8 What are other sched_* files ?
> +  1.9 How do I use cpusets ?
>  2. Usage Examples and Syntax
>    2.1 Basic Usage
>    2.2 Adding/removing cpus
> @@ -497,7 +499,90 @@ the cpuset code to update these sched do
>  partition requested with the current, and updates its sched domains,
>  removing the old and adding the new, for each change.
> 
> -1.8 How do I use cpusets ?
> +1.8 What are other sched_* files ?
> +----------------------------------
> +
> +As described in 1.7, cpuset allows you to partition the system's CPUs
> +into a number of sched domains.  Each sched domain is load balanced
> +independently, in the traditional way that is designed to work well
> +on typical systems.
> +
> +However, you may want to customize load balancing behavior for your
> +special system.  For this purpose, cpuset provides files named
> +sched_* that customize the sched domain of the cpuset for a special
> +situation, i.e. a specific application on a special system.
> +
> +These files are per-cpuset and affect the sched domain to which the
> +cpuset belongs.  If multiple cpusets overlap and hence form a single
> +sched domain, changes in one of them affect the others.
> +If the "sched_load_balance" flag of a cpuset is disabled, the sched_*
> +files have no effect since no sched domain belongs to the cpuset.
> +
> +Note that modifying sched_* files will have both good and bad effects,
> +and whether they are acceptable or not will depend on your situation.
> +Don't modify these files if you are not sure of the effect.
> +
> +1.8.1 What is sched_wake_idle_far ?
> +-----------------------------------
> +
> +When a task is woken up, the scheduler tries to wake it on an idle CPU.
> +
> +For example, if a task A running on CPU X activates another task B
> +on the same CPU X, and if CPU Y is X's sibling and is idle,
> +then the scheduler migrates task B to CPU Y so that task B can start
> +on CPU Y without waiting for task A on CPU X.
> +
> +However, by default the scheduler doesn't search the whole system,
> +only nearby siblings.  Assume CPU Z is relatively far from CPU X.
> +Even if CPU Z is idle while CPU X and its siblings are busy, the
> +scheduler can't migrate the woken task B from X to Z.  As a result,
> +task B on CPU X needs to wait for task A or for load balancing on the
> +next tick.  For some special applications, waiting 1 tick is too long.
> +
> +The main reason the scheduler limits the idle CPU search to a range
> +as small as "siblings in the socket" is that this saves searching
> +cost and migration cost.  Nowadays siblings share resources such as
> +CPU caches, so this limit can save some migration cost, assuming
> +that those resources still hold enough unexpired data for the
> +migrating task.  Usually this assumption holds, but it is not
> +guaranteed.
> +
> +When the flag 'sched_wake_idle_far' is enabled, this searching range
> +is expanded to all CPUs in the sched domain of the cpuset.
> +
> +If this flag is enabled in the CPU Z example given above, the
> +scheduler can find CPU Z at some extra searching cost, and
> +migrate task B to CPU Z at some extra migration cost.
> +In exchange for these costs, task B can start relatively quickly.
> +
> +If your situation is:
> + - The migration costs between CPUs can be assumed to be considerably
> +   small (for you) due to your special application's behavior or
> +   special hardware support for CPU caches etc.
> + - The searching cost has no impact (for you), or you can make the
> +   searching cost small enough by keeping the cpuset compact etc.
> + - Low latency is required even if it sacrifices cache hit rate etc.
> +then turning on 'sched_wake_idle_far' would benefit you.
> +
> +1.8.2 What is sched_balance_newidle_far ?
> +-----------------------------------------
> +
> +If a CPU runs out of tasks in its runqueue, the CPU tries to pull extra
> +tasks from other busy CPUs to help them before it goes idle.
> +
> +Since it takes some searching cost to find movable tasks, the
> +scheduler might not search all CPUs in the system.  For example,
> +the range is limited to the socket or node where the CPU is located.
> +
> +When the flag 'sched_balance_newidle_far' is enabled, this range
> +is expanded to all CPUs in the sched domain of the cpuset.
> +
> +The situation in which this flag is worth considering is almost the
> +same as that of 'sched_wake_idle_far'.  If you would like to gain
> +better latency and a higher operating ratio at the expense of some
> +other benefits, then enable this flag.
> +
> +1.9 How do I use cpusets ?
>  --------------------------
> 
>  In order to minimize the impact of cpusets on critical kernel
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
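
For readers following the CPU X/Y/Z example in section 1.8.1 of the quoted
patch, the difference in search range can be sketched as a toy model. This
is Python rather than kernel code, and it is not how the scheduler is
implemented: the function name pick_wake_cpu, its parameters, and the CPU
numbering are all hypothetical and chosen only for illustration.

```python
def pick_wake_cpu(waker, idle, siblings, domain, wake_idle_far=False):
    """Toy model: choose the CPU on which a newly woken task starts.

    waker         -- CPU where the waking task is running (CPU X)
    idle          -- set of currently idle CPUs
    siblings      -- dict mapping a CPU to its set of nearby siblings
    domain        -- set of all CPUs in the cpuset's sched domain
    wake_idle_far -- hypothetical stand-in for 'sched_wake_idle_far'
    """
    # Default search range: only the waking CPU's nearby siblings.
    for cpu in sorted(siblings.get(waker, ())):
        if cpu in idle:
            return cpu
    # Expanded range: every CPU in the sched domain of the cpuset.
    if wake_idle_far:
        for cpu in sorted(domain):
            if cpu != waker and cpu in idle:
                return cpu
    # No idle CPU within range: the task waits on the waking CPU.
    return waker


# CPU X and CPU Y are siblings; CPU Z is relatively far from both.
X, Y, Z = 0, 1, 2
siblings = {X: {Y}, Y: {X}, Z: set()}
domain = {X, Y, Z}
```

With only CPU Z idle, the default search leaves task B on CPU X, while the
expanded search moves it to CPU Z. An idle sibling is still preferred when
one exists, which mirrors the cache-sharing argument in the patch text.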
