Message-ID: <20110728075435.GA13833@opentech.at>
Date: Thu, 28 Jul 2011 09:54:37 +0200
From: Nicholas Mc Guire <der.herr@...r.at>
To: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
LKML <linux-kernel@...r.kernel.org>,
linux-rt-users <linux-rt-users@...r.kernel.org>,
Ingo Molnar <mingo@...e.hu>, Carsten Emde <ce@...g.ch>,
Clark Williams <williams@...hat.com>,
Kumar Gala <galak@...e.crashing.org>,
Ralf Baechle <ralf@...ux-mips.org>,
rostedt <rostedt@...dmis.org>
Subject: Re: On migrate_disable() and latencies
On Wed, 27 Jul 2011, Paul E. McKenney wrote:
> On Wed, Jul 27, 2011 at 01:13:18PM +0200, Peter Zijlstra wrote:
> > On Mon, 2011-07-25 at 14:17 -0700, Paul E. McKenney wrote:
> >
> > > > I suppose it is indeed. Even for the SoftRT case we need to make sure
> > > > the total utilization loss is indeed acceptable.
> > >
> > > OK. If you are doing strict priority, then everything below the highest
> > > priority is workload dependent.
> >
> > <snip throttling, that's a whole different thing>
> >
> > > The higher-priority
> > > tasks can absolutely starve the lower-priority ones, with or without
> > > the migrate-disable capability.
> >
> > Sure, that's how FIFO works, but it also relies on the fact that once
> > your high priority task completes the lower priority task resumes.
> >
> > The extension to SMP is that we run the m highest priority tasks on n
> > cpus; where m <= n. Any loss in utilization matters (idle time in this
> > particular case, but irq/preemption/migration and cache overhead are
> > also time not spent on the actual workload).
> >
> > Now the WCET folks are all about quantifying the needs of applications
> > and the utilization limits of the OS etc. And while for SoftRT you can
> > relax quite a few of the various bounds you still need to know them in
> > order to relax them (der Hofrat likes to move from worst case to avg case
> > IIRC).
>
It's not about worst case vs. average case; it's about using the distribution
rather than boundary values - boundary values are hard to correlate with
specific events.
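To illustrate the distribution-based view (a minimal sketch with hypothetical
numbers, not measured data): summarize latency samples by percentiles rather
than by the single worst observed value.

```python
# Sketch: characterizing latency by its distribution rather than a
# single worst-case boundary value.  Sample values are hypothetical.
import math

def percentile(samples, p):
    """Return the p-th percentile (0-100, nearest-rank method)."""
    s = sorted(samples)
    k = max(0, math.ceil(p / 100.0 * len(s)) - 1)
    return s[k]

latencies_us = [12, 14, 13, 15, 11, 14, 13, 250, 12, 13]  # one outlier

worst_case = max(latencies_us)          # boundary value: 250
typical = percentile(latencies_us, 50)  # median: 13
p99 = percentile(latencies_us, 99)      # tail: 250
```

A single boundary value (250) hides that the bulk of the distribution sits
near 13; the distribution makes it possible to correlate the tail with
specific events.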
> ;-)
>
> > > Another way of looking at it is from the viewpoint of the additional
> > > priority-boost events. If preemption is disabled, the low-priority task
> > > will execute through the preempt-disable region without context switching.
> > > In contrast, given a migration-disable region, the low-priority task
> > > might be preempted and then boosted. (If I understand correctly, if some
> > > higher-priority task tries to enter the same type of migration-disable
> > > region, it will acquire the associated lock, thus priority-boosting the
> > > task that is already in that region.)
> >
> > No, there is no boosting involved, migrate_disable() isn't intrinsically
> > tied to a lock or other PI construct. We might need locks to keep some
> > of the per-cpu crap correct, but that again, is a whole different ball
> > game.
> >
> > But even if it was, I don't think PI will help any for this, we still
> > need to complete the various migrate_disable() sections, see below.
>
> OK, got it. I think, anyway. I was incorrectly (or at least unhelpfully)
> pulling in locks that might be needed to handle per-CPU variables.
>
> > > One stupid-but-tractable way to model this is to have an interarrival
> > > rate for the various process priorities, and then calculate the odds of
> > > (1) a higher priority process arriving while the low-priority one is
> > > in a *-disable region and (2) that higher priority process needing to
> > > enter a conflicting *-disable region. This would give you some measure
> > > of the added boosting load due to migration-disable as compared to
> > > preemption-disable.
> > >
> > > Would this sort of result be useful?
> >
> > Yes, such type of analysis can be used, and I guess we can measure
> > various variables related to that.
>
> OK, good.
>
> > > > My main worry with all this is that we have these insane long !preempt
> > > > regions in mainline that are now !migrate regions, and thus per all the
> > > > above we could be looking at a substantial utilization loss.
> > > >
> > > > Alternatively we could all be missing something far more horrid, but
> > > > that might just be my paranoia talking.
> > >
> > > Ah, good point -- if each migration-disable region is associated with
> > > a lock, then you -could- allow migration and gain better utilization
> > > at the expense of worse caching behavior. Is that the concern?
> >
> > I'm not seeing how that would be true, suppose you have this stack of 4
> > migrate_disable() sections and 3 idle cpus, no amount of boosting will
> > make the already running task at the top of the stack go any faster, and
> > it needs to complete the migrate_disable section before it can be
> > migrated, equally so for the rest, so you still need
> > 3*migrate-disable-period of time before all your cpus are busy again.
> >
> > You can move another task to the top of the stack by boosting, but
> > you'll need 3 tasks to complete their resp migrate-disable section, it
> > doesn't matter which task, so boosting doesn't change anything.
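The arithmetic of that stacking example can be sketched as follows (a
simplified model, assuming each migrate_disable() section takes a fixed
time D and that a task can migrate away as soon as its section completes;
the function name is hypothetical):

```python
# Sketch of the stacking example: several tasks stacked in
# migrate_disable() sections on one CPU while other CPUs sit idle.
# Only the top task runs; once it completes its section it can be
# migrated to an idle CPU and the next one runs.

def time_until_all_cpus_busy(stacked_tasks, idle_cpus, section_len):
    # Each completed section frees one task to migrate to an idle CPU;
    # the last task stays where it is.
    migrations_needed = min(idle_cpus, stacked_tasks - 1)
    return migrations_needed * section_len

# 4 stacked sections, 3 idle CPUs -> 3 * D before all CPUs are busy,
# and boosting only reorders the stack, it cannot shorten this.
```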
>
> OK, so let me see if I understand what you are looking to model.
>
> o There are no locks.
>
> o There are a finite number of tasks with varying priorities.
> (I would initially work with a single task per priority
> level, but IIRC it is not hard to make multiple tasks per
> priority work. Not a fan of infinite numbers of priorities,
> though!)
>
> o There are multiple CPUs.
>
> o Once a task enters a migrate-disable region, it must remain
> on that CPU. (I will initially model the migrate-disable region
> as consuming a fixed amount of CPU. If I wanted to really wuss
> out, I would model it as consuming an exponentially distributed
> amount of CPU.)
>
> o Tasks awakening outside of migrate-disable regions will pick
> the CPU running the lowest-priority task, whether or not this
> task is in migrate-disable state. (At least I don't see
> anything in 3.0-rt3 that looks like a scheduling decision
> based on ->migrate_disable, perhaps due to blindness.)
>
This might be a simple heuristic to minimize the probability of stacking
in the first place.
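The wakeup placement rule described above can be sketched like this (a
hypothetical helper, not the actual 3.0-rt3 code; it assumes idle CPUs are
represented as priority -1 so they always win the comparison):

```python
# Sketch of the wakeup placement rule: a waking task picks the CPU
# running the lowest-priority task, ignoring whether that task is in
# a migrate-disable region.  Representation is hypothetical.

def pick_target_cpu(cpu_prios):
    """cpu_prios: current running priority per CPU, -1 for idle.
    Returns the index of the CPU the waking task should go to."""
    return min(range(len(cpu_prios)), key=lambda c: cpu_prios[c])

# prio-99 tasks on CPUs 0-2, prio-1 on CPU 3: a waking prio-2 task
# lands on CPU 3 and preempts the prio-1 task.  If CPU 1 were idle
# (-1), it would be chosen instead.
```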
> o For an example, if all CPUs except for one are running prio-99
> tasks, and the remaining CPU is running a prio-1 task in
> a migrate-disable region, if a prio-2 tasks awakens, it
> will preempt the prio-1 task.
All CPUs are utilized, so there is no utilization loss at all in that scenario.
>
> On the other hand, if at least one of the CPUs was idle,
> the prio-2 task would have instead run on that idle CPU.
so what you need to add to the model is the probability of the transitional
event:
* prio-2 task preempts prio-1 task because all CPUs are busy
* at least one CPU becomes idle while the prio-1 task is blocked from migrating
due to migrate-disable + preempted by the prio-2 task
only in this combination does the system suffer a utilization penalty.
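The combined condition can be sketched numerically (under the strong and
possibly unrealistic assumption that the two events are independent; the
function and example probabilities are hypothetical):

```python
# Sketch of the combined condition for a utilization penalty:
#   1. a higher-priority task preempts the migrate-disabled task, and
#   2. some CPU goes idle while that task is still pinned and preempted.
# Independence of the two events is assumed for simplicity.

def utilization_penalty_probability(p_preempt, p_idle_during_pin):
    return p_preempt * p_idle_during_pin

# e.g. a 10% preemption chance and a 5% idle-transition chance give
# a 0.5% chance of actually losing utilization.
```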
>
> o The transition probabilities depend on the priority
> of the currently running migrate-disable CPU -- the higher
> that priority, the greater the chance that any preempting
> task will find some other CPU instead.
>
> The recurrence times depend on the number of tasks stacked
> up in migrate-disable regions on that CPU.
>
> If this all holds, it would be possible to compute the probability
> of a given migrate-disable region being preempted and if preempted,
> the expected duration of that preemption, given the following
> quantities as input:
>
> o The probability that a given CPU is running a task
> of priority P for each priority. The usual way to
> estimate this is based on per-thread CPU utilizations.
>
> o The expected duration of migrate-disable regions.
>
> o The expected wakeups per second for tasks of each priority.
>
> With the usual disclaimers about cheezy mathematical approximations
> of reality and all that.
>
> Would this be useful, or am I still missing the point?
>
to get an estimate of the latency impact - but to get an estimate of the
impact on system utilization you would need to include the probability that a
different CPU is idle in the system and would in principle allow running
one of the tasks that can't be migrated. As I understood it, the initial
question was whether migrate_disable has a relevant impact on system
utilization in multicore systems. For this question I guess two of the key
parameters are
* probability that migrate-disable stacking occurs
* probability of an idle CPU transition while stacking persists
I guess the probability of an idle transition of a CPU is hard to model as it
is very profile specific.
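One piece of the estimate from the inputs listed above (expected region
duration and expected wakeups per second) can be sketched directly, assuming
higher-priority wakeups arrive as a Poisson process - an assumption, not
something established in this thread:

```python
# Sketch: if higher-priority wakeups arrive as a Poisson process with
# rate lam (wakeups/second), the probability that at least one arrives
# during a migrate-disable region of duration d seconds is
#   P = 1 - exp(-lam * d)
import math

def p_preempted(lam, d):
    return 1.0 - math.exp(-lam * d)

# e.g. 100 wakeups/s and a 100us region give roughly a 1% chance of
# being preempted inside the region.
```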
hofrat
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/