Date:	Tue, 25 Mar 2014 11:59:43 -0700
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	Khalid Aziz <khalid.aziz@...cle.com>
Cc:	tglx@...utronix.de, mingo@...hat.com, hpa@...or.com,
	peterz@...radead.org, akpm@...ux-foundation.org,
	andi.kleen@...el.com, rob@...dley.net, viro@...iv.linux.org.uk,
	oleg@...hat.com, gnomes@...rguk.ukuu.org.uk, riel@...hat.com,
	snorcht@...il.com, dhowells@...hat.com, luto@...capital.net,
	daeseok.youn@...il.com, linux-kernel@...r.kernel.org,
	linux-doc@...r.kernel.org
Subject: Re: [PATCH v2] Pre-emption control for userspace

Khalid Aziz <khalid.aziz@...cle.com> writes:

> This patch adds a way for a thread to request an additional timeslice
> from the scheduler if it is about to be preempted, so that it can
> complete any critical task it is in the middle of. This functionality
> helps with performance on databases and has been used for many years
> on other OSes by the databases. It helps in situations where a thread
> acquires a lock before performing a critical operation on the
> database and happens to get preempted before it completes its task
> and releases the lock.  The held lock causes all other threads that
> need the same lock for their critical operations to queue up, causing
> a large number of context switches. This queueing problem can be
> avoided if the thread that acquires the lock first could ask the
> scheduler for an additional timeslice once it enters its critical
> section, allowing it to complete the critical section without causing
> the queueing problem. If the critical section completes before the
> thread is due for preemption, the thread can simply deassert its
> request. A thread sends the scheduler this request by setting a flag
> in a memory location it has shared with the kernel.  The kernel uses
> bytes in the same memory location to let the thread know when its
> request for amnesty from preemption has been granted. The thread
> should yield the processor at the end of its critical section if it
> was granted amnesty, to play nice with other threads. If the thread
> fails to yield the processor, it is penalized by having its next
> amnesty request turned down by the scheduler.  The documentation file
> included in this patch contains further details on how to use this
> functionality and the conditions associated with its use. This patch
> also adds a new field in the scheduler statistics which keeps track
> of how many times a thread was granted amnesty from preemption. This
> feature and its usage are documented in
> Documentation/scheduler/sched-preempt-delay.txt, and this patch
> includes a test for the feature under
> tools/testing/selftests/preempt-delay.

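For reference, the userspace side of the handshake you describe would
look roughly like the sketch below.  The struct layout, the field
names, and the way the shared area gets registered with the kernel are
hypothetical illustrations of the described request/grant protocol, not
the actual ABI of the patch:

	#include <pthread.h>
	#include <sched.h>

	/* Hypothetical layout of the per-thread memory shared with the
	 * kernel: one byte the thread sets to request extra time, one
	 * byte the kernel sets when the request has been granted.  How
	 * this area gets registered with the kernel is not shown. */
	struct preempt_delay {
		unsigned char request;
		unsigned char granted;
	};

	static __thread struct preempt_delay pd;

	void with_preempt_delay(pthread_mutex_t *lock, void (*critical)(void))
	{
		pd.request = 1;            /* ask the scheduler for amnesty */
		pthread_mutex_lock(lock);
		critical();                /* the short database operation */
		pthread_mutex_unlock(lock);
		pd.request = 0;            /* deassert the request */
		if (pd.granted) {          /* extra time was actually granted ... */
			pd.granted = 0;
			sched_yield();     /* ... so give the cpu back promptly */
		}
	}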

Let me see if I understand the problem.  Your simulated application has
a ridiculous number of threads (1000) all contending for a single lock
with fairly long lock hold times between 600 and 20000 clocks, assuming
no cache line misses.  So 1000 threads contending for about 10usec
(roughly 20000 cycles on a ~2GHz cpu), or 1/100 of a tick when HZ=1000.
That gives you something like 1 chance in 100 of being preempted while
holding the lock.  With 1000 threads those sound like pretty bad odds.

Either your test program is a serious exaggeration of what your
userspace is doing, or this looks like an application design problem.

I am sorry, but no number of kernel patches can fix a stupid userspace
application, and what is worse, it looks like this approach will make
the situation worse for applications that aren't stupid, because they
will now suffer from much less predictability in how long they have to
wait for the cpu.

Maybe if this were limited to a cooperating set of userspace
tasks/threads it might not be too bad.  As it exists, I have users who
would hunt me down with malicious intent if this code ever showed up on
our servers, because it would make life worse for every other
application on the server.

The only two sane versions of this I can see are (a) having the
scheduler write the predicted next preemption time into the vdso page so
your thread can yield preemptively before taking the lock if it doesn't
look like it has enough time, or (b) limiting this to just a small
cooperating set of threads in a single cgroup.
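
To make (a) concrete: something like the sketch below, assuming a
hypothetical field next_preempt_ns in the vdso giving the predicted
time of the next preemption in CLOCK_MONOTONIC nanoseconds (the field
and its name are made up for illustration; nothing like it exists
today):

	#include <stdint.h>
	#include <time.h>
	#include <sched.h>
	#include <pthread.h>

	/* Hypothetical: exported by the scheduler through the vdso page. */
	extern volatile uint64_t next_preempt_ns;

	static uint64_t now_ns(void)
	{
		struct timespec ts;
		clock_gettime(CLOCK_MONOTONIC, &ts);
		return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
	}

	/* Yield up front if the remaining slice looks too short for the
	 * critical section, instead of asking for more time afterwards. */
	void lock_with_headroom(pthread_mutex_t *lock, uint64_t worst_case_ns)
	{
		if (now_ns() + worst_case_ns > next_preempt_ns)
			sched_yield();     /* start the section on a fresh slice */
		pthread_mutex_lock(lock);
	}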

As you have not limited the effects of this patch, and as it will make
latencies worse for every other program on the system, I think this is
a horrible approach.  This really is not something you can do unless
all of the threads that could be affected are in the same code base,
which is definitely not the case here.

So, for the general horrible idea:
Nacked-With-Extreme-Prejudice-by: "Eric W. Biederman" <ebiederm@...ssion.com>

Cooperative multitasking sucked in Windows 3.1 and it would be much
worse now.  Please stop the crazy.  Linux is challenging enough to
comprehend as it is, and I can't possibly see how this patch makes
anything more predictable.

Eric
