[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4A5E5430.8000102@nortel.com>
Date: Wed, 15 Jul 2009 16:12:00 -0600
From: "Chris Friesen" <cfriesen@...tel.com>
To: Ted Baker <baker@...fsu.edu>
CC: Raistlin <raistlin@...ux.it>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Douglas Niehaus <niehaus@...c.ku.edu>,
Henrik Austad <henrik@...tad.us>,
LKML <linux-kernel@...r.kernel.org>, Ingo Molnar <mingo@...e.hu>,
Bill Huey <billh@...ppy.monkey.org>,
Linux RT <linux-rt-users@...r.kernel.org>,
Fabio Checconi <fabio@...dalf.sssup.it>,
"James H. Anderson" <anderson@...unc.edu>,
Thomas Gleixner <tglx@...utronix.de>,
Dhaval Giani <dhaval.giani@...il.com>,
Noah Watkins <jayhawk@....ucsc.edu>,
KUSP Google Group <kusp@...glegroups.com>,
Tommaso Cucinotta <cucinotta@...up.it>,
Giuseppe Lipari <lipari@...is.sssup.it>
Subject: Re: RFC for a new Scheduling policy/class in the Linux-kernel
Ted Baker wrote:
> I have two questions:
>
> (1) As Jim Anderson points out in a later comment,
> is priority inheritance (of any form) what you really want
> on an SMP system? It does not give a good worst-case blocking
> time bound.
As Peter mentioned, it's not so much a matter of whether it's desired or
not. It's required, at least in terms of supporting priority
inheritance for pthread mutexes. I haven't been involved with the
linux-rt side of things, but I believe Peter implied that they used PI
fairly heavily there to try to minimize priority inversion issues
because it was infeasible to analyze all the various locks. The kernel
is over 10 million lines of code, so any locking changes pretty much
have to fit into the existing framework with minimal changes.
> (2) If you do want priority inheritance, how do I implement the
> mechanics of the mechanics of the unlock operation of B on one
> processor causing C to be preempted on the other processor, simply
> and promptly?
<snip>
> I and a student of mine implemented something like this on a
> VME-bus based SMP system around 1990. We decided to only wake up
> the highest (global) priority task. (In the case above, either A2
> or A3, depending on priority.) We did this using compare-and-swap
> operatoins, in a way that I recall ended up using (for each lock)
> something like one global spin-lock variable, one "contending"
> variable per CPU, and one additional local spinlock variable per
> CPU to avoid bus contention on the global spin-lock variable. We
> used a VME-bus interrupt for the lock-releasing CPU to invoke the
> scheduler of the CPU of the task selected to next receive the
> lock. The interrupt handler could then do the job of waking up
> the corresponding contending task on that CPU.
>
> It worked, but such global locks had a lot more overhead than
> other locks, mostly due to the inter-processor interrupt.
> So, we ended up distinguishing global locks from per-CPU
> locks to lower the overhead when we did not need it.
>
> We were using a partitioned scheduling model, or else this
> would be a bit more complicated, and I would be talking about the
> CPU selected to run the task selected to next receive the lock.
>
> Is there a more efficient way to do this? or are you all
> ready to pay this kind of overhead on every lock in your SMP
> system?
Peter would have a better idea of the low-level implementation than I,
but I suspect that the general idea would be similar, unless we were to
consider migrating the task chosen to run to the current cpu. That
would save the IPI, at a cost of task migration. (Which of course
wouldn't be possible in the case of cpu affinity.)
As to the additional cost, POSIX distinguishes between PI mutexes and
regular mutexes. Any additional penalty due to PI should be limited to
the mutexes which actually have PI enabled. If an application can limit
itself to "safe" syscalls and has enough knowledge of its own locking,
then it could presumably use regular mutexes and avoid the overhead of PI.
I'm not sure the same could apply to the kernel in general...Peter would
have to address that as I'm not familiar with the linux-rt changes.
Chris
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists