Message-Id: <1238672893.8530.5909.camel@twins>
Date: Thu, 02 Apr 2009 13:48:13 +0200
From: Peter Zijlstra <a.p.zijlstra@...llo.nl>
To: Ingo Molnar <mingo@...e.hu>
Cc: Paul Mackerras <paulus@...ba.org>,
Corey Ashford <cjashfor@...ux.vnet.ibm.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/6] RFC perf_counter: singleshot support
On Thu, 2009-04-02 at 12:51 +0200, Ingo Molnar wrote:
> * Peter Zijlstra <a.p.zijlstra@...llo.nl> wrote:
>
> > By request, provide a way for counters to disable themselves and
> > signal at the first counter overflow.
> >
> > This isn't complete, we really want pending work to be done ASAP
> > after queueing it. My preferred method would be a self-IPI, that
> > would ensure we run the code in a usable context right after the
> > current (IRQ-off, NMI) context is done.
>
> Hm. I do think self-IPIs can be fragile but the more work we do in
> NMI context the more compelling of a case can be made for a
> self-IPI. So no big arguments against that.
It's not only NMI, but also things like software events in the scheduler
under rq->lock, or hrtimers in irq context. You cannot do a wakeup from
under rq->lock, nor hrtimer_cancel() from within the timer handler.
All these nasty little issues stack up and could be solved with a
self-IPI.
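To make the queue-now, run-later idea concrete, here is a minimal user-space sketch in C11. It is only a model and not the patch itself: pending_queue(), pending_run() and count_wakeup() are hypothetical names, and pending_run() stands in for whatever the self-IPI handler would call once we are back in a usable context. The queueing side takes no locks and does no wakeups, which is the property the NMI / rq->lock / hrtimer callers need.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel API. */
struct pending_entry {
	struct pending_entry *next;
	void (*func)(struct pending_entry *entry);
};

static _Atomic(struct pending_entry *) pending_head;
static int wakeups_done;

/* Example deferred action: stands in for the wakeup we could not do earlier. */
static void count_wakeup(struct pending_entry *entry)
{
	(void)entry;
	wakeups_done++;
}

/*
 * Queue work from a restricted context: a single compare-and-swap loop,
 * no locks taken and no wakeups issued. An entry must not be queued again
 * before it has run.
 */
static void pending_queue(struct pending_entry *entry,
			  void (*func)(struct pending_entry *))
{
	struct pending_entry *old = atomic_load(&pending_head);

	assert(func != NULL);
	entry->func = func;
	do {
		entry->next = old;
	} while (!atomic_compare_exchange_weak(&pending_head, &old, entry));
}

/*
 * Run everything queued so far. In the model this is what the self-IPI
 * handler would do; it returns the number of entries it ran.
 */
static int pending_run(void)
{
	struct pending_entry *list = atomic_exchange(&pending_head, NULL);
	int nr = 0;

	while (list) {
		struct pending_entry *next = list->next;

		list->func(list);
		list = next;
		nr++;
	}
	return nr;
}
```

Nothing runs at queue time; the callbacks only fire when pending_run() is called, so the restricted context never touches anything it is not allowed to.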
Then there is the software task-time clock which uses
p->se.sum_exec_runtime which requires the rq->lock to be read. Coupling
this with for example an NMI overflow handler gives an instant deadlock.
Would you terribly mind if I remove all that sum_exec_runtime and
rq->lock stuff and simply use cpu_clock() to keep count? These things
get context switched along with tasks anyway.
> So i think we need 3 separate things:
>
> - the ability to set a signal attribute of the counter (during
> creation) via a (signo,tid) pair.
>
> Semantics:
>
> - it can be a regular signal (signo < 32),
> or an RT/queued signal (signo >= 32).
>
> - It may be sent to the task that generated the event (tid == 0),
> or it may be sent to a specific task (tid > 0),
> or it may be sent to a task group (tid < 0).
kill_pid() seems to be able to do all of that:
	struct pid *pid;
	int tid, priv;

	perf_counter_disable(counter);

	rcu_read_lock();
	tid = counter->hw_event.signal_tid;
	if (!tid)
		tid = current->pid;

	priv = 1;
	if (tid < 0) {
		priv = 0;
		tid = -tid;
	}

	pid = find_vpid(tid);
	if (pid)
		kill_pid(pid, counter->hw_event.signal_nr, priv);
	rcu_read_unlock();
Should do it, afaict.
Except I probably should look into this pid-namespace mess and clean all
that up.
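For reference, the tid decoding in the snippet above can be pulled out into a pure function and checked in user space. This is only a sketch: struct signal_target and resolve_signal_target() are hypothetical names, and the priv field simply carries whatever the snippet passes as the third argument of kill_pid().

```c
#include <assert.h>

/* Hypothetical helper types for illustration; not part of the patch. */
struct signal_target {
	int pid;	/* positive id that would be handed to find_vpid() */
	int priv;	/* value that would be passed as kill_pid()'s third argument */
};

/* Mirrors the decoding above: 0 means current, a negative tid is stored negated. */
static struct signal_target resolve_signal_target(int signal_tid, int current_pid)
{
	struct signal_target t = { .pid = signal_tid, .priv = 1 };

	assert(current_pid > 0);

	if (!signal_tid)
		t.pid = current_pid;	/* tid == 0: the task that generated the event */

	if (t.pid < 0) {
		t.priv = 0;
		t.pid = -t.pid;		/* tid < 0: strip the sign before the lookup */
	}
	return t;
}
```

So resolve_signal_target(0, 42) resolves to pid 42 with priv 1, and resolve_signal_target(-7, 42) resolves to pid 7 with priv 0.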
> - 'event limit' attribute: the ability to pause new events after N
> events. This limit auto-decrements on each event.
> limit==1 is the special case for single-shot.
That should go along with a toggle on what an event is I suppose, either
an 'output' event or a filled page?
Or do we want to limit that to counter overflow?
> - new ioctl method to refill the limit, when user-space is ready to
> receive new events. A special-case of this is when a signal
> handler calls ioctl(refill_limit, 1) in the single-shot case -
> this re-enables events after the signal has been handled.
Right, with the method implemented above, it's simply a matter of the
enable ioctl.
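A small user-space model of the limit/refill semantics being discussed, to pin down what "limit==1 is single-shot" would mean. Everything here is an assumption made for illustration: struct counter_model, counter_event() and counter_refill() are hypothetical names, a limit of 0 is taken to mean "no limit", and signals_sent just counts where a signal would be raised.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a counter with an auto-decrementing event limit. */
struct counter_model {
	int event_limit;	/* remaining events; 0 is assumed to mean "no limit" */
	bool enabled;
	int signals_sent;	/* counts the points where a signal would be raised */
};

/* Deliver one event; returns false if the counter is paused. */
static bool counter_event(struct counter_model *c)
{
	if (!c->enabled)
		return false;

	/* The limit auto-decrements; reaching zero pauses the counter. */
	if (c->event_limit > 0 && --c->event_limit == 0) {
		c->enabled = false;
		c->signals_sent++;
	}
	return true;
}

/* Refill the limit and re-enable; with nr == 1 this is just "enable" again. */
static void counter_refill(struct counter_model *c, int nr)
{
	assert(nr > 0);
	c->event_limit += nr;
	c->enabled = true;
}
```

With event_limit = 1 the first event pauses the counter, and a signal handler calling counter_refill(c, 1) re-arms it for exactly one more event, which is the single-shot case.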
> Another observation: i think perf_counter_output() needs to depend
> on whether the counter is signalling, not on the single-shot-ness of
> the counter.
>
> A completely valid use of this would be for user-space to create an
> mmap() buffer of 1024 events, then set the limit to 1024, and wait
> for the 1024 events to happen - process them and close the counter.
> Without any signalling.
Say we have a limit > 1, and a signal, that would mean we do not
generate event output?