linux-kernel - Re: [RFC] [PATCH] Pre-emption control for userspace

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.02.1403051555360.6682@nftneq.ynat.uz>
Date:	Wed, 5 Mar 2014 15:59:42 -0800 (PST)
From:	David Lang <david@...g.hm>
To:	Khalid Aziz <khalid.aziz@...cle.com>
cc:	Oleg Nesterov <oleg@...hat.com>, Andi Kleen <andi@...stfloor.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	One Thousand Gnomes <gnomes@...rguk.ukuu.org.uk>,
	"H. Peter Anvin" <hpa@...or.com>, Ingo Molnar <mingo@...nel.org>,
	peterz@...radead.org, akpm@...ux-foundation.org,
	viro@...iv.linux.org.uk, linux-kernel@...r.kernel.org
Subject: Re: [RFC] [PATCH] Pre-emption control for userspace

On Wed, 5 Mar 2014, Khalid Aziz wrote:

> On 03/05/2014 04:13 PM, David Lang wrote:
>> Yes, you have paid the cost of the context switch, but your original
>> problem description talked about having multiple other threads trying to
>> get the lock, then spinning trying to get the lock (wasting time if the
>> process holding it is asleep, but not if it's running on another core)
>> and causing a long delay before the process holding the lock gets a
>> chance to run again.
>> 
>> Having the threads immediately yield to the process that has the lock
>> reduces this down to two context switches, which isn't perfect, but it's
>> a LOT better than what you started from.
>
> OK, let us consider the multiple core scenario:
>
> - Thread A gets scheduled on core 1, runs for 95% of its timeslice before it 
> gets to its critical section.
> - Thread A grabs the lock and quickly reaches the end of its timeslice before 
> finishing its critical section.
> - Thread A is preempted on core 1 by a completely unrelated thread.
> - Thread B in the mean time is scheduled on core 2 and happens to get to its 
> critical section right away where it tries to grab the lock held by thread A. 
> It spins for a bit waiting to see if lock becomes available, gives up and 
> yields to next process in queue.
> - Since thread A ran recently, it is now stuck towards the end of run queue, 
> so thread C gets to run on core 2 which goes through same fate as thread A.
>
> Now scale this scenario across more cores and more threads that all want the 
> same lock to execute their small critical section. This cost of spinning and 
> context switch could have been avoided if thread A could get additional 
> timeslice to complete its critical section. Yielding to the process holding 
> the lock happens only after contention has happened and we have paid the 
> price of two context switches for the lock owner. When yield_to() happens, 
> the lock owner may not get to run on the core it was on because an unrelated 
> thread is running on that core and it needs to wait for that thread's 
> timeslice to run out. If the lock owner gets scheduled on another core, we 
> pay the price of repopulating the cache for a new thread on that core. 
> yield_to() is better than having a convoy building up of processes waiting 
> for the same lock but worse than avoiding the contention altogether.

with yield_to(), when thread B hit the contention, thread A would get more time 
an be able to complete immediatly, so by the time thread C gets around to 
running, there is no longer any contention.

>> 
>> well, writing to something in /proc isn't free either. And how is the
>> thread supposed to know if it needs to do so or if it's going to have
>> enough time to finish it's work before it's out of time (how can it know
>> how much time it would have left anyway?)
>
> Cost is writing to a memory location since thread is using mmap, not 
> insignificant but hardly expensive. Thread does not need to know how much 
> time it has left in current timeslice. It always sets the flag to request 
> pre-emption immunity before entering the critical section and clears the flag 
> when it exits its critical section. If the thread comes up for pre-emption 
> while the flag is set, it gets immunity. If it does not, flag will be cleared 
> at the end of critical section any way.

what's the cost to setup mmap of this file in /proc. this is sounding like a lot 
of work.

>> is this gain from not giving up the CPU at all? or is it from avoiding
>> all the delays due to the contending thread trying in turn? the
>> yield_to() approach avoids all those other threads trying in turn so it
>> should get fairly close to the same benefits.
>> 
>
> The gain is from avoiding contention by giving locking thread a chance to 
> complete its critical section which is expected to be very short (certainly 
> shorter than timeslice). Pre-emption immunity gives it one and only one 
> additional timeslice.

but the yield_to() does almost the same thing, there is a small bump, but you 
don't have to wait for thread B to spin, thread C..ZZZ etc to spin before thread 
A can finish it's work. As soon as the second thread hits the critical section, 
thread A is going to be able to do more work (and hopefully finish)

> Hope this helps clear things up.

It doesn't sound like you and I are understanding how the yield_to() approach 
would work. I hope my comments have helped get us on the same page.

David Lang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/