Message-ID: <20091116202214.GD360@elte.hu>
Date: Mon, 16 Nov 2009 21:22:14 +0100
From: Ingo Molnar <mingo@...e.hu>
To: Stijn Devriendt <highguy@...il.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Mike Galbraith <efault@....de>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Andrea Arcangeli <andrea@...e.de>,
Thomas Gleixner <tglx@...utronix.de>,
Andrew Morton <akpm@...ux-foundation.org>,
peterz@...radead.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC] observe and act upon workload parallelism:
PERF_TYPE_PARALLELISM (Was: [RFC][PATCH] sched_wait_block: wait for blocked
threads)
* Stijn Devriendt <highguy@...il.com> wrote:
> > And then we can use poll() in the thread manager task to observe
> > PIDs, workloads or full CPUs. The poll() implementation of perf
> > events is fast and scalable.
>
> I've had a quick peek at the perf code and how it currently hooks into
> the scheduler and at first glance it looks like 2 additional context
> switches are required when using perf. The scheduler will first
> schedule the idle thread to later find out that the schedule tail woke
> up another process to run. My initial solution woke up the process
> before making a scheduling decision. Depending on context switch times
> the original blocking operation may have been unblocked (especially on
> SMP); e.g. a blocked user-space mutex which was held shortly. Feel
> free to correct me here as it was merely a quick peek.
( Btw., the PERF_TYPE_PARALLELISM name sucks. A better name would be
PERF_COUNT_SW_TASKS or PERF_COUNT_SW_THREAD_POOL or so. )
I'd definitely not advocate a 'controller thread' approach: it's an
unnecessary extra intermediary, it doubles the context switch cost
and it tears the cache footprint apart.
We want any such scheme to schedule 'naturally' and optimally: i.e. a
blocking thread will schedule an available thread - no ifs and whens.
The only limit we want is on concurrency - and we can do that by waking
tasks from the poll() waitqueue if a task blocks - and by requeueing
woken tasks to the poll() waitqueue if a task wakes (and if the
concurrency threshold does not allow it to run).
In a sense the poll() waitqueue becomes a mini-runqueue for 'ready'
tasks - and the 'number of tasks running' value of the sw event object
becomes an rq->nr_running value. It does not make the tasks available
to the real scheduler - but it's a list of tasks that are willing to run.
This would be a perfect and suitable use of poll() concepts I think -
and a well-optimized one as well. It could even be plugged into epoll().
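
The wake/requeue rule described above can be illustrated with a small
user-space toy model. This is only a sketch under stated assumptions: it
is not kernel code and not an existing perf interface. The class name
ThreadPoolEvent, its methods and the max_running threshold are all
hypothetical; the deque merely stands in for the poll() waitqueue and
nr_running for the counter of the sw event object.

```python
from collections import deque


class ThreadPoolEvent:
    """Toy model (hypothetical) of the proposed sw event object.

    'ready' stands in for the poll() waitqueue acting as a mini-runqueue;
    'nr_running' plays the role of an rq->nr_running-like counter.
    """

    def __init__(self, max_running):
        self.max_running = max_running  # assumed concurrency threshold
        self.nr_running = 0
        self.running = set()
        self.ready = deque()            # tasks willing to run, but parked

    def task_wakes(self, task):
        """A task becomes runnable: run it, or requeue it if over the limit."""
        if self.nr_running < self.max_running:
            self.running.add(task)
            self.nr_running += 1
            return True                 # allowed to run
        self.ready.append(task)         # parked on the waitqueue
        return False

    def task_blocks(self, task):
        """A running task blocks: directly wake one parked task, if any."""
        self.running.remove(task)
        self.nr_running -= 1
        if self.ready:
            woken = self.ready.popleft()
            self.running.add(woken)
            self.nr_running += 1
            return woken                # the task that takes over
        return None


# Usage: with a limit of 2, a third runnable task is parked until
# one of the running tasks blocks.
pool = ThreadPoolEvent(max_running=2)
pool.task_wakes("A")
pool.task_wakes("B")
pool.task_wakes("C")                    # over the limit, parked
replacement = pool.task_blocks("A")     # "C" takes over directly
```

Note that in this model the blocking task hands over to a parked task
itself, with no controller in between, which is the property the scheme
is after.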
Ingo