Message-ID: <AANLkTi==RfJijfLcHn08KBbh8V-3_uPWK9QQoU9sKyz1@mail.gmail.com>
Date: Thu, 25 Nov 2010 22:32:18 +0100
From: Stephane Eranian <eranian@...gle.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: linux-kernel@...r.kernel.org, mingo@...e.hu, paulus@...ba.org,
davem@...emloft.net, fweisbec@...il.com,
perfmon2-devel@...ts.sf.net, eranian@...il.com,
robert.richter@....com, acme@...hat.com, lizf@...fujitsu.com
Subject: Re: [PATCH 1/2] perf_events: add support for per-cpu per-cgroup
monitoring (v5)
On Thu, Nov 25, 2010 at 4:02 PM, Peter Zijlstra <peterz@...radead.org> wrote:
> On Thu, 2010-11-25 at 15:51 +0100, Stephane Eranian wrote:
>>
>>
>> On Thu, Nov 25, 2010 at 12:20 PM, Peter Zijlstra <peterz@...radead.org> wrote:
>> On Thu, 2010-11-18 at 12:40 +0200, Stephane Eranian wrote:
>> > @@ -919,6 +945,10 @@ static inline void perf_event_task_sched_in(struct task_struct *task)
>> > static inline
>> > void perf_event_task_sched_out(struct task_struct *task, struct task_struct *next)
>> > {
>> > +#ifdef CONFIG_CGROUPS
>> > + atomic_t *cgroup_events = &__get_cpu_var(perf_cgroup_events);
>> > + COND_STMT(cgroup_events, perf_cgroup_switch(task, next));
>> > +#endif
>> > COND_STMT(&perf_task_events, __perf_event_task_sched_out(task, next));
>> > }
>>
>>
>> I don't think that'll actually work, the jump label stuff needs a static
>> address.
>>
>> I did not know that.
>
> Yeah, its unfortunate the fallback code doesn't mandate this :/
>
>
>> Why not simply: s/perf_task_events/perf_sched_events/ and
>> increment it
>> for cgroup events as well?
>>
>> But you would need to demultiplex: perf_sched_events being set doesn't
>> mean you want BOTH perf_cgroup_switch() AND perf_event_task_sched_out().
>
> The main purpose of the jump-label stuff is to optimize the function
> call and conditional in the perf code away; the moment we take a function
> call we might as well do everything, since at that point it's only a single
> conditional.
>
> Jump labels are supposed to work like (they don't actually work like
> this yet):
>
> my_func:
> asm-foo
> addr_of_nop:
> nop5
> after_nop:
> more-asm-foo
> iret
>
> out_of_line:
> do-special-foo
> jmp after_nop
>
>
> We then keep a section of tuples:
>
> __jump_labels:
>
> &perf_task_events,addr_of_nop
>
> Then when we flip perf_task_events from 0 -> !0 we rewrite the nop5 at
> addr_of_nop to "jmp out_of_line" (5 bytes on x86, hence nop5), or the
> reverse on !0 -> 0.
>
>
> So 1) we need the 'key' (&perf_task_events) to be a static address
> because the compiler needs to place the address in the special section
> -- otherwise we can never find the nop location again, this also means
> per-cpu variables don't make sense, there's only 1 copy of the text.
>
> and 2) the moment we take the out-of-line branch we incur the icache hit
> and already set up a call, so optimizing away another conditional at the
> cost of an extra function call doesn't really make sense.
>
>
Ok, I understand. Thanks for the explanation.
So perf_sched_events would indicate that there may be some perf
ctxsw work to do. It would be set as soon as there is at least one
event defined, not just a per-thread event.
BTW, isn't there a bug where PERF_COUNT_SW_CONTEXT_SWITCHES
is not updated if you don't have at least one per-thread event? You
need to hoist that perf_sw_event() into the header file, before the COND().
Then, __perf_event_task_sched_out() would be modified
to do some more checks:
__perf_event_task_sched_out(task, next)
{
        int ctxn;

        perf_sw_event(PERF_COUNT_SW_CONTEXT_SWITCHES, 1, 1, NULL, 0);

        for_each_task_context_nr(ctxn)
                perf_event_context_sched_out(task, ctxn, next);
}
void __perf_event_task_sched_out(struct task_struct *task,
                                 struct task_struct *next)
{
        int ctxn;
        int cgroup_events;

        perf_sw_event(PERF_COUNT_SW_CONTEXT_SWITCHES, 1, 1, NULL, 0);

        cgroup_events = atomic_read(&__get_cpu_var(perf_cgroup_events));
        if (cgroup_events)
                perf_cgroup_switch(task, next);

        if (perf_task_events)
                for_each_task_context_nr(ctxn)
                        perf_event_context_sched_out(task, ctxn, next);
}
That implies you need to maintain:
- perf_sched_events: number of active events system-wide
- perf_task_events: number of per-thread events system-wide
- cgroup_events: per-cpu variable representing the number of cgroup
events on a CPU
Is that what you are thinking?