Message-ID: <20110727183008.GA2407@linux.vnet.ibm.com>
Date:	Wed, 27 Jul 2011 11:30:08 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Thomas Gleixner <tglx@...utronix.de>,
	LKML <linux-kernel@...r.kernel.org>,
	linux-rt-users <linux-rt-users@...r.kernel.org>,
	Ingo Molnar <mingo@...e.hu>, Carsten Emde <ce@...g.ch>,
	Clark Williams <williams@...hat.com>,
	Kumar Gala <galak@...e.crashing.org>,
	Ralf Baechle <ralf@...ux-mips.org>,
	rostedt <rostedt@...dmis.org>,
	Nicholas Mc Guire <der.herr@...r.at>
Subject: Re: On migrate_disable() and latencies

On Wed, Jul 27, 2011 at 01:13:18PM +0200, Peter Zijlstra wrote:
> On Mon, 2011-07-25 at 14:17 -0700, Paul E. McKenney wrote:
> 
> > > I suppose it is indeed. Even for the SoftRT case we need to make sure
> > > the total utilization loss is indeed acceptable.
> > 
> > OK.  If you are doing strict priority, then everything below the highest
> > priority is workload dependent. 
> 
> <snip throttling, that's a whole different thing>
> 
> >  The higher-priority
> > tasks can absolutely starve the lower-priority ones, with or without
> > the migrate-disable capability.
> 
> Sure, that's how FIFO works, but it also relies on the fact that once
> your high priority task completes, the lower priority task resumes.
> 
> The extension to SMP is that we run the m highest priority tasks on n
> cpus, where m <= n. Any loss in utilization matters (idle time in this
> particular case, but irq/preemption/migration and cache overhead are
> also time not spent on the actual workload).
> 
> Now the WCET folks are all about quantifying the needs of applications
> and the utilization limits of the OS etc. And while for SoftRT you can
> relax quite a few of the various bounds, you still need to know them in
> order to relax them (der Hofrat likes to move from worst case to avg case
> IIRC).

;-)

> > Another way of looking at it is from the viewpoint of the additional
> > priority-boost events.  If preemption is disabled, the low-priority task
> > will execute through the preempt-disable region without context switching.
> > In contrast, given a migration-disable region, the low-priority task
> > might be preempted and then boosted.  (If I understand correctly, if some
> > higher-priority task tries to enter the same type of migration-disable
> > region, it will acquire the associated lock, thus priority-boosting the
> > task that is already in that region.)
> 
> No, there is no boosting involved, migrate_disable() isn't intrinsically
> tied to a lock or other PI construct. We might need locks to keep some
> of the per-cpu crap correct, but that, again, is a whole different ball
> game.
> 
> But even if it was, I don't think PI will help any for this; we still
> need to complete the various migrate_disable() sections, see below.

OK, got it.  I think, anyway.  I was incorrectly (or at least unhelpfully)
pulling in locks that might be needed to handle per-CPU variables.
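
So just to check that I have the usage pattern right, it is roughly the
following -- where bump_counter() and some_counter are made up purely for
illustration, not anything actually in -rt:

	#include <linux/percpu.h>

	/* made-up per-CPU variable, just to have something to poke at */
	static DEFINE_PER_CPU(unsigned long, some_counter);

	static void bump_counter(void)
	{
		unsigned long v;

		migrate_disable();	/* pinned to this CPU, still preemptible */
		/*
		 * Per-CPU accesses here are CPU-stable, but they still need
		 * their own protection against preempting tasks on this CPU;
		 * that is the separate "per-cpu crap" issue, not
		 * migrate_disable() itself.
		 */
		v = this_cpu_read(some_counter);
		this_cpu_write(some_counter, v + 1);
		migrate_enable();	/* may be migrated again */
	}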

> > One stupid-but-tractable way to model this is to have an interarrival
> > rate for the various process priorities, and then calculate the odds of
> > (1) a higher priority process arriving while the low-priority one is
> > in a *-disable region and (2) that higher priority process needing to
> > enter a conflicting *-disable region.  This would give you some measure
> > of the added boosting load due to migration-disable as compared to
> > preemption-disable.
> > 
> > Would this sort of result be useful?
> 
> Yes, that type of analysis can be used, and I guess we can measure
> various variables related to that.

OK, good.
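
To make that concrete, the back-of-the-envelope computation I have in mind
would look something like the following, where all of the rates are
placeholders that would have to come from measurement:

	#include <math.h>

	/*
	 * Probability that a *-disable region of expected duration d
	 * (seconds) sees at least one higher-priority wakeup that needs a
	 * conflicting region, assuming Poisson arrivals of such wakeups at
	 * rate lambda (per second) and a conflict fraction p_conflict.
	 */
	static double region_conflict_probability(double lambda, double d,
						  double p_conflict)
	{
		/* P(>= 1 arrival during the region) * P(it conflicts) */
		return (1.0 - exp(-lambda * d)) * p_conflict;
	}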

> > > My main worry with all this is that we have these insanely long !preempt
> > > regions in mainline that are now !migrate regions, and thus per all the
> > > above we could be looking at a substantial utilization loss.
> > > 
> > > Alternatively we could all be missing something far more horrid, but
> > > that might just be my paranoia talking.
> > 
> > Ah, good point -- if each migration-disable region is associated with
> > a lock, then you -could- allow migration and gain better utilization
> > at the expense of worse caching behavior.  Is that the concern?
> 
> I'm not seeing how that would be true. Suppose you have this stack of 4
> migrate_disable() sections and 3 idle cpus: no amount of boosting will
> make the already running task at the top of the stack go any faster, and
> it needs to complete its migrate_disable() section before it can be
> migrated; equally so for the rest, so you still need
> 3*migrate-disable-period of time before all your cpus are busy again.
> 
> You can move another task to the top of the stack by boosting, but
> you'll need 3 tasks to complete their respective migrate-disable
> sections; it doesn't matter which tasks, so boosting doesn't change
> anything.
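
(Spelling that out for my own benefit, with each migrate-disable section
taking time T: at t = 0, CPU0 runs task A inside its section, with B, C,
and D stacked behind it, each also mid-section, and CPUs 1-3 idle; at
t = T, A's section ends and B runs; at t = 2T, B's section ends and C
runs; at t = 3T, C's section ends and D runs, and only then have enough
sections completed for the stacked tasks to spread out and keep all four
CPUs busy.)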

OK, so let me see if I understand what you are looking to model.

o	There are no locks.

o	There are a finite number of tasks with varying priorities.
	(I would initially work with a single task per priority
	level, but IIRC it is not hard to make multiple tasks per
	priority work.  Not a fan of infinite numbers of priorities,
	though!)

o	There are multiple CPUs.

o	Once a task enters a migrate-disable region, it must remain
	on that CPU.  (I will initially model the migrate-disable region
	as consuming a fixed amount of CPU.  If I wanted to really wuss
	out, I would model it as consuming an exponentially distributed
	amount of CPU.)

o	Tasks awakening outside of migrate-disable regions will pick
	the CPU running the lowest-priority task, whether or not this
	task is in migrate-disable state.  (At least I don't see
	anything in 3.0-rt3 that looks like a scheduling decision
	based on ->migrate_disable, perhaps due to blindness.)

o	For example, if all CPUs except one are running prio-99
	tasks, and the remaining CPU is running a prio-1 task in
	a migrate-disable region, then a prio-2 task that awakens
	will preempt the prio-1 task.

	On the other hand, if at least one of the CPUs was idle,
	the prio-2 task would have instead run on that idle CPU.

o	The transition probabilities depend on the priority of the
	task currently running on the migrate-disabled CPU -- the
	higher that priority, the greater the chance that any
	preempting task will find some other CPU instead.

	The recurrence times depend on the number of tasks stacked
	up in migrate-disable regions on that CPU.

If this all holds, it would be possible to compute the probability
of a given migrate-disable region being preempted and if preempted,
the expected duration of that preemption, given the following
quantities as input:

o	The probability that a given CPU is running a task
	of priority P for each priority.  The usual way to
	estimate this is based on per-thread CPU utilizations.

o	The expected duration of migrate-disable regions.

o	The expected wakeups per second for tasks of each priority.

With the usual disclaimers about cheesy mathematical approximations
of reality and all that.
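
As a first cut, ignoring the pick-the-lowest-priority-CPU placement rule
(which makes this only an upper bound), those inputs would combine along
the following lines, with NR_PRIO and all of the numbers being
placeholders to be filled in from measurement:

	#include <math.h>

	#define NR_PRIO	100	/* made-up number of priority levels */

	/*
	 * Upper bound on the probability that a priority-p task's
	 * migrate-disable region of expected duration d (seconds) is
	 * preempted.  wakeups[q] is the expected wakeups per second of
	 * priority-q tasks, with larger q meaning higher priority, and
	 * those wakeups are treated as Poisson arrivals.
	 */
	static double preempt_probability_bound(const double wakeups[NR_PRIO],
						int p, double d)
	{
		double rate = 0.0;
		int q;

		for (q = p + 1; q < NR_PRIO; q++)
			rate += wakeups[q];

		return 1.0 - exp(-rate * d);
	}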

Would this be useful, or am I still missing the point?

							Thanx, Paul
