Message-ID: <5317B7D5.2030403@oracle.com>
Date: Wed, 05 Mar 2014 16:48:37 -0700
From: Khalid Aziz <khalid.aziz@...cle.com>
To: David Lang <david@...g.hm>
CC: Oleg Nesterov <oleg@...hat.com>, Andi Kleen <andi@...stfloor.org>,
Thomas Gleixner <tglx@...utronix.de>,
One Thousand Gnomes <gnomes@...rguk.ukuu.org.uk>,
"H. Peter Anvin" <hpa@...or.com>, Ingo Molnar <mingo@...nel.org>,
peterz@...radead.org, akpm@...ux-foundation.org,
viro@...iv.linux.org.uk, linux-kernel@...r.kernel.org
Subject: Re: [RFC] [PATCH] Pre-emption control for userspace
On 03/05/2014 04:13 PM, David Lang wrote:
> Yes, you have paid the cost of the context switch, but your original
> problem description talked about having multiple other threads trying to
> get the lock, then spinning trying to get the lock (wasting time if the
> process holding it is asleep, but not if it's running on another core)
> and causing a long delay before the process holding the lock gets a
> chance to run again.
>
> Having the threads immediately yield to the process that has the lock
> reduces this down to two context switches, which isn't perfect, but it's
> a LOT better than what you started from.
OK, let us consider the multiple core scenario:
- Thread A gets scheduled on core 1, runs for 95% of its timeslice
before it gets to its critical section.
- Thread A grabs the lock and quickly reaches the end of its timeslice
before finishing its critical section.
- Thread A is preempted on core 1 by a completely unrelated thread.
- Thread B in the meantime is scheduled on core 2 and happens to reach
its critical section right away, where it tries to grab the lock held by
thread A. It spins for a bit waiting to see if the lock becomes
available, gives up, and yields to the next process in the queue.
- Since thread A ran recently, it is now stuck towards the end of the
run queue, so thread C gets to run on core 2 and meets the same fate as
thread A.
Now scale this scenario across more cores and more threads that all want
the same lock to execute their small critical sections. The cost of
spinning and context switching could have been avoided if thread A could
get an additional timeslice to complete its critical section. Yielding
to the process holding the lock happens only after contention has
occurred and we have already paid the price of two context switches for
the lock owner. When yield_to() happens, the lock owner may not get to
run on the core it was on, because an unrelated thread is running on
that core and the owner must wait for that thread's timeslice to run
out. If the lock owner is scheduled on another core instead, we pay the
price of repopulating the cache for the new thread on that core.
yield_to() is better than letting a convoy of processes build up waiting
for the same lock, but worse than avoiding the contention altogether.
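To make the contention path concrete, here is a minimal sketch of the
spin-then-yield acquisition that threads B and C go through above: spin
briefly in case the owner is running on another core, then give up the
CPU rather than burn the rest of the timeslice. The spin limit and
function names are illustrative, not from the patch:

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define SPIN_LIMIT 1000   /* illustrative bound on busy-waiting */

static atomic_flag lock = ATOMIC_FLAG_INIT;

static void lock_acquire(void)
{
	for (;;) {
		for (int i = 0; i < SPIN_LIMIT; i++)
			if (!atomic_flag_test_and_set_explicit(
					&lock, memory_order_acquire))
				return;	/* got the lock while spinning */
		/* Owner is likely preempted and asleep; yield instead of
		 * spinning out the rest of our timeslice. This is the
		 * context switch whose cost the thread pays. */
		sched_yield();
	}
}

static void lock_release(void)
{
	atomic_flag_clear_explicit(&lock, memory_order_release);
}
```

If the owner was preempted mid-critical-section, every waiter still
burns SPIN_LIMIT iterations plus a context switch before the owner can
run again, which is exactly the cost the scenario above describes.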
>
> well, writing to something in /proc isn't free either. And how is the
> thread supposed to know if it needs to do so or if it's going to have
> enough time to finish it's work before it's out of time (how can it know
> how much time it would have left anyway?)
The cost is a write to a memory location, since the thread uses mmap --
not insignificant, but hardly expensive. The thread does not need to
know how much time it has left in the current timeslice. It always sets
the flag to request pre-emption immunity before entering the critical
section and clears the flag when it exits the critical section. If the
thread comes up for pre-emption while the flag is set, it gets immunity.
If it does not, the flag is cleared at the end of the critical section
anyway.
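The set/clear discipline above can be sketched as follows. In the RFC
the flag lives in a page mmap'ed from a /proc file so the kernel can
read it at pre-emption time; here it is simulated with an ordinary
atomic variable to show just the protocol, and all names are
illustrative rather than the patch's actual API:

```c
#include <stdatomic.h>

static atomic_int preempt_immune;  /* stand-in for the shared mmap'ed flag */
static atomic_int shared_counter;  /* data protected by the critical section */

static void critical_section(void)
{
	/* One cheap store before entering: if pre-emption comes due
	 * while this is set, the kernel would grant one extra
	 * timeslice instead of switching us out mid-section. */
	atomic_store(&preempt_immune, 1);

	atomic_fetch_add(&shared_counter, 1);	/* short critical section */

	/* Clear unconditionally on exit; if no pre-emption was pending,
	 * setting the flag simply had no effect. */
	atomic_store(&preempt_immune, 0);
}
```

Note the thread never queries how much of its timeslice remains; it
only brackets the critical section with the two stores.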
> is this gain from not giving up the CPU at all? or is it from avoiding
> all the delays due to the contending thread trying in turn? the
> yield_to() approach avoids all those other threads trying in turn so it
> should get fairly close to the same benefits.
>
The gain comes from avoiding contention by giving the lock-holding
thread a chance to complete its critical section, which is expected to
be very short (certainly shorter than a timeslice). Pre-emption immunity
gives it one and only one additional timeslice.
Hope this helps clear things up.
Thanks,
Khalid