linux-kernel - [PATCH RFC] sched: boost throttled entities on wakeups

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-ID: <206EF0C3-1F5F-4B58-B7DA-E63298939DFD@parallels.com>
Date:	Thu, 18 Oct 2012 11:32:45 +0400
From:	Vladimir Davydov <VDavydov@...allels.com>
To:	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Ingo Molnar <mingo@...hat.com>, Paul Turner <pjt@...gle.com>
CC:	Kirill Korotaev <dev@...allels.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"devel@...nvz.org" <devel@...nvz.org>
Subject: [PATCH RFC] sched: boost throttled entities on wakeups

If several tasks in different cpu cgroups are contending for the same resource
(e.g. a semaphore) and one of those task groups is cpu limited (using cfs
bandwidth control), the priority inversion problem is likely to arise: if a cpu
limited task goes to sleep holding the resource (e.g. trying to take another
semaphore), it can be throttled (i.e.  removed from the runqueue), which will
result in other, perhaps high-priority, tasks waiting until the low-priority
task continues its execution.

The patch tries to solve this problem by boosting tasks in throttled groups on
wakeups, i.e.  temporarily unthrottling the groups a woken task belongs to in
order to let the task finish its execution in kernel space.  This obviously
should eliminate the priority inversion problem on voluntary preemptable
kernels.  However, it does not solve the problem for fully preemptable kernels,
although I guess the patch can be extended to handle those kernels too (e.g. by
boosting forcibly preempted tasks thus not allowing to throttle).

I wrote a simple test that demonstrates the problem (the test is attached). It
creates two cgroups each of which is bound to exactly one cpu using cpusets,
sets the limit of the first group to 10% and leaves the second group unlimited.
Then in both groups it starts processes reading the same (big enough) file
along with a couple of busyloops in the limited groups, and measures the read
time.

I've run the test 10 times for a 1 Gb file on a server with > 10 Gb of RAM and
4 cores x 2 hyperthreads (the kernel was with CONFIG_PREEMPT_VOLUNTARY=y). Here
are the results:

without the patch	40.03 +- 7.04 s
with the patch		 8.42 +- 0.48 s

(Since the server's RAM can accommodate the whole file, the read time was the
same for both groups)

I would appreciate if you could answer the following questions regarding the
priority inversion problem and the proposed approach:

1) Do you agree that the problem exists and should be sorted out?

2) If so, does the general approach proposed (unthrottling on wakeups) suits
you? Why or why not?

3) If you think that the approach proposed is sane, what you dislike about the
patch?

Thank you!

---
 include/linux/sched.h   |    8 ++
 kernel/sched/core.c     |    8 ++
 kernel/sched/fair.c     |  182 ++++++++++++++++++++++++++++++++++++++++++++++-
 kernel/sched/features.h |    2 +
 kernel/sched/sched.h    |    6 ++
 5 files changed, 204 insertions(+), 2 deletions(-)

Download attachment "sched-boost-throttled-entities-on-wakeups.patch" of type "application/octet-stream" (10765 bytes)

Download attachment "ioprio_inv_test.sh" of type "application/octet-stream" (1795 bytes)