Message-ID: <AANLkTi==RfJijfLcHn08KBbh8V-3_uPWK9QQoU9sKyz1@mail.gmail.com>
Date: Thu, 25 Nov 2010 22:32:18 +0100
From: Stephane Eranian <eranian@...gle.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: linux-kernel@...r.kernel.org, mingo@...e.hu, paulus@...ba.org,
davem@...emloft.net, fweisbec@...il.com,
perfmon2-devel@...ts.sf.net, eranian@...il.com,
robert.richter@....com, acme@...hat.com, lizf@...fujitsu.com
Subject: Re: [PATCH 1/2] perf_events: add support for per-cpu per-cgroup
monitoring (v5)
On Thu, Nov 25, 2010 at 4:02 PM, Peter Zijlstra <peterz@...radead.org> wrote:
> On Thu, 2010-11-25 at 15:51 +0100, Stephane Eranian wrote:
>>
>>
>> On Thu, Nov 25, 2010 at 12:20 PM, Peter Zijlstra <peterz@...radead.org> wrote:
>> On Thu, 2010-11-18 at 12:40 +0200, Stephane Eranian wrote:
>> > @@ -919,6 +945,10 @@ static inline void perf_event_task_sched_in(struct task_struct *task)
>> > static inline
>> > void perf_event_task_sched_out(struct task_struct *task, struct task_struct *next)
>> > {
>> > +#ifdef CONFIG_CGROUPS
>> > + atomic_t *cgroup_events = &__get_cpu_var(perf_cgroup_events);
>> > + COND_STMT(cgroup_events, perf_cgroup_switch(task, next));
>> > +#endif
>> > COND_STMT(&perf_task_events, __perf_event_task_sched_out(task, next));
>> > }
>>
>>
>> I don't think that'll actually work, the jump label stuff needs a static
>> address.
>>
>> I did not know that.
>
> Yeah, its unfortunate the fallback code doesn't mandate this :/
>
>
>> Why not simply: s/perf_task_events/perf_sched_events/ and
>> increment it
>> for cgroup events as well?
>>
>> But you would need to demultiplex: perf_sched_events being set doesn't
>> mean you want BOTH perf_cgroup_switch() AND perf_event_task_sched_out().
>
> The main purpose of the jump-label stuff is to optimize the function
> call and conditional in the perf code away; the moment we take a function
> call we might as well do everything, since at that point it's only a single
> conditional.
>
> Jump labels are supposed to work like (they don't actually work like
> this yet):
>
> my_func:
> asm-foo
> addr_of_nop:
> nop5
> after_nop:
> more-asm-foo
> iret
>
> out_of_line:
> do-special-foo
> jmp after_nop
>
>
> We then keep a section of tuples:
>
> __jump_labels:
>
> &perf_task_events,addr_of_nop
>
> Then when we flip perf_task_events from 0 -> !0 we rewrite the nop5 at
> addr_of_nop to "jmp out_of_line" (5 bytes on x86, hence nop5), or the
> reverse on !0 -> 0.
>
>
> So 1) we need the 'key' (&perf_task_events) to be a static address
> because the compiler needs to place the address in the special section
> -- otherwise we can never find the nop location again, this also means
> per-cpu variables don't make sense, there's only 1 copy of the text.
>
> and 2) the moment we take the out-of-line branch we incur the icache hit
> and already set up a call, so optimizing away another conditional at the
> cost of an extra function call doesn't really make sense.
>
>
Ok, I understand. Thanks for the explanation.
So perf_sched_events would indicate that there may be some perf
ctxsw work to do. It would be set as soon as there is at least one
event defined, not just a per-thread event.
BTW, isn't there a bug where PERF_COUNT_SW_CONTEXT_SWITCHES
is not updated if you don't have at least one per-thread event? You
need to hoist that perf_sw_event() into the header file, before the COND().
Then, __perf_event_task_sched_out() would be modified
to do some more checks:
__perf_event_task_sched_out(task, next)
{
        int ctxn;

        perf_sw_event(PERF_COUNT_SW_CONTEXT_SWITCHES, 1, 1, NULL, 0);

        for_each_task_context_nr(ctxn)
                perf_event_context_sched_out(task, ctxn, next);
}
void __perf_event_task_sched_out(struct task_struct *task,
                                 struct task_struct *next)
{
        int ctxn;
        int cgroup_events;

        perf_sw_event(PERF_COUNT_SW_CONTEXT_SWITCHES, 1, 1, NULL, 0);

        cgroup_events = atomic_read(&__get_cpu_var(perf_cgroup_events));
        if (cgroup_events)
                perf_cgroup_switch(task, next);

        if (perf_task_events)
                for_each_task_context_nr(ctxn)
                        perf_event_context_sched_out(task, ctxn, next);
}
That implies you need to maintain:
- perf_sched_events: number of active events system-wide
- perf_task_events: number of per-thread events system-wide
- cgroup_events: per-cpu variable representing the number of cgroup
events on a CPU
Is that what you are thinking?