linux-kernel - [RFC tg_shares_up improvements - v1 00/12] [RFC tg_shares_up - v1 00/12] Reducing cost of tg->shares distribution


Message-Id: <20101016044349.830426011@google.com>
Date:	Fri, 15 Oct 2010 21:43:49 -0700
From:	pjt@...gle.com
To:	linux-kernel@...r.kernel.org
Cc:	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Ingo Molnar <mingo@...e.hu>,
	Srivatsa Vaddagiri <vatsa@...ibm.com>,
	Chris Friesen <cfriesen@...tel.com>,
	Vaidyanathan Srinivasan <svaidy@...ux.vnet.ibm.com>,
	Pierre Bourdon <pbourdon@...ellency.fr>,
	Paul Turner <pjt@...gle.com>,
	Bharata B Rao <bharata@...ux.vnet.ibm.com>
Subject: [RFC tg_shares_up improvements - v1 00/12] [RFC tg_shares_up - v1 00/12] Reducing cost of tg->shares distribution

Hi all,

Peter previously posted a patchset that attempted to address the problem of
task_group share distribution.  This has been a long-time pain point for
group scheduling.  The existing algorithm considers distributions on a
per-cpu-per-domain basis and carries a fairly high update overhead,
especially on larger machines.

I was previously looking at improving this using Fenwick trees to allow a
single sum without the exorbitant cost, but Peter's idea above turned out to
be better :).
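
For readers unfamiliar with the structure, here is a minimal sketch of a
Fenwick (binary indexed) tree over per-cpu load values.  It only illustrates
the data structure: point updates and prefix sums in O(log n) rather than
O(n).  It is not the abandoned design, which this posting does not describe.
NCPUS, struct fenwick and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NCPUS 24	/* assumption: matches the 24 thread test box */

/* Slot 0 is unused; the tree is 1-indexed.  Zero-initialise before use. */
struct fenwick {
	int64_t tree[NCPUS + 1];
};

/* Lowest set bit of i, written to stay well defined for unsigned types. */
static size_t lowbit(size_t i)
{
	return i & (~i + 1);
}

/* Add delta to the load recorded for one cpu (0-based index). */
static void fenwick_add(struct fenwick *f, size_t cpu, int64_t delta)
{
	assert(cpu < NCPUS);
	for (size_t i = cpu + 1; i <= NCPUS; i += lowbit(i))
		f->tree[i] += delta;
}

/* Sum of the loads of cpus [0, n). */
static int64_t fenwick_prefix(const struct fenwick *f, size_t n)
{
	int64_t sum = 0;

	assert(n <= NCPUS);
	for (size_t i = n; i > 0; i -= lowbit(i))
		sum += f->tree[i];
	return sum;
}
```

The group-wide total is fenwick_prefix(&f, NCPUS), and each update touches at
most log2(NCPUS) + 1 slots.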

The kernel of the idea is that by monitoring the average contribution to load
on a per-cpu-per-taskgroup basis, we can distribute to each cpu the portion of
the group's weight that it is expected to consume.
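
As a rough illustration of that proportional split (a sketch, not code from
the series), each cpu could be issued the group's shares scaled by its
fraction of the group's averaged load.  The names cpu_share,
distribute_shares and SKETCH_MIN_SHARES, the floor value, and the handling of
an idle group are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MIN_SHARES 2	/* assumption: illustrative lower bound */

/* One cpu's portion: tg_shares * (this cpu's load avg / group total). */
static uint64_t cpu_share(uint64_t tg_shares, uint64_t cpu_load_avg,
			  uint64_t tg_load_total)
{
	uint64_t share;

	assert(cpu_load_avg <= tg_load_total);

	/* Assumption: with no load anywhere, hand out the full weight. */
	if (tg_load_total == 0)
		return tg_shares;

	share = tg_shares * cpu_load_avg / tg_load_total;
	return share < SKETCH_MIN_SHARES ? SKETCH_MIN_SHARES : share;
}

/* Fill out[] with the per-cpu portions for one task group. */
static void distribute_shares(uint64_t tg_shares, const uint64_t *load_avg,
			      uint64_t *out, size_t ncpus)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < ncpus; i++)
		total += load_avg[i];
	for (i = 0; i < ncpus; i++)
		out[i] = cpu_share(tg_shares, load_avg[i], total);
}
```

With tg_shares = 1024 and averaged loads of 300, 100, 0 and 0, this issues
768, 256, 2 and 2.  Because of the floor, the issued shares can sum to
slightly more than the group's shares.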

This set extends the original posting with a focus on increased fairness and
reduced convergence time (to the true average).  In particular, large
over-commit following a distributed wake-up was a concern, and it is now
fairly well addressed.

Obviously everything's experimental but it should be stable/fair.

Some motivation:

24-thread Intel box, 150 active cgroups, multiple threads/group, load at ~90% (10 second sample):
tip:
     2.64%  [k] tg_shares_up <!>
     0.15%  [k] __set_se_shares

patched:
     0.02%  [k] update_cfs_load
     0.01%  [k] update_cpu_load
     0.00%  [k] update_cfs_shares

Some fairness coverage for the above at: http://rs5.risingnet.net/~pjt/patches/shares_data_v1.txt

Note: The last patch is fairly obviously a temporary debug patch.  I only
include it because it interfaces with some analysis scripts that I'm
simultaneously trying to publish for the purposes of validating this series.
Since this approach estimates the share distribution, the spread between
issued shares and target is an important factor until people are happy with
the patchset.

Paul

TODO:
- Validate any RT interaction
- Continue collecting/analyzing performance and fairness data
- Should the shares period just be the sched_latency?
