Message-ID: <70079805-1CAE-4CAA-813A-F8DDB929F22B@fb.com>
Date:   Thu, 11 Oct 2018 22:37:14 +0000
From:   Song Liu <songliubraving@...com>
To:     Peter Zijlstra <peterz@...radead.org>
CC:     Ingo Molnar <mingo@...nel.org>,
        lkml <linux-kernel@...r.kernel.org>,
        "acme@...nel.org" <acme@...nel.org>,
        "alexander.shishkin@...ux.intel.com" 
        <alexander.shishkin@...ux.intel.com>,
        "jolsa@...hat.com" <jolsa@...hat.com>,
        "eranian@...gle.com" <eranian@...gle.com>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "alexey.budankov@...ux.intel.com" <alexey.budankov@...ux.intel.com>,
        "mark.rutland@....com" <mark.rutland@....com>,
        "megha.dey@...el.com" <megha.dey@...el.com>,
        "frederic@...nel.org" <frederic@...nel.org>
Subject: Re: [RFC][PATCH] perf: Rewrite core context handling

Thanks Peter! These are really really helpful. 

I am trying to think through the case of a group of two events on two 
separate hardware PMUs. In the current implementation, this will not 
trigger move_group, so they will not be able to rotate together? And 
actually, they may not be able to run at all? Maybe this case was never 
supported? 
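
To make the case concrete, here is roughly what I mean from user space: a
group whose leader sits on the core PMU and whose sibling targets a second
hardware PMU. This is only an illustration -- the second PMU's type id
would normally be read from /sys/bus/event_source/devices/<pmu>/type, and
the value below is just a placeholder. If I read the current code right,
the second perf_event_open() gets rejected, which is what prompted the
question above:

#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

static int open_event(struct perf_event_attr *attr, int group_fd)
{
        /* this task, any cpu, no flags */
        return syscall(SYS_perf_event_open, attr, 0, -1, group_fd, 0);
}

int main(void)
{
        struct perf_event_attr leader, sibling;
        int lfd, sfd;

        memset(&leader, 0, sizeof(leader));
        leader.size = sizeof(leader);
        leader.type = PERF_TYPE_HARDWARE;         /* PMU #1: core PMU */
        leader.config = PERF_COUNT_HW_CPU_CYCLES;
        leader.disabled = 1;

        memset(&sibling, 0, sizeof(sibling));
        sibling.size = sizeof(sibling);
        sibling.type = 42;   /* placeholder: type id of a second hw PMU */
        sibling.config = 0;  /* placeholder event on that PMU */

        lfd = open_event(&leader, -1);   /* starts a new group */
        sfd = open_event(&sibling, lfd); /* same group, different hw PMU */

        return (lfd < 0 || sfd < 0);
}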

On the other hand, would something like this work:

    perf_cpu_context <-[1:2]-> perf_event_context <-[1:n]-> perf_event 
              |                                                |
              `----[1:n]---->     pmu    <----- [1:n]----------' 

1. Every cpu has only one perf_cpu_context. No perf_cpu_pmu_context. 
2. perf_cpu_context has two perf_event_context, one for the cpu, the 
   other for the task. 
3. Each perf_event_context has 3 perf_event_groups, pinned_groups, 
   flexible_groups, and software_groups (for sw event only groups). 
4. All flexible_groups of the same cpu rotate at the same time. If 
   there are two hardware PMUs on the cpu, the rotation will look 
   like: 1) stop both PMUs; 2) rotate events; 3) start both PMUs 
   (see the sketch after this list). 
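
To make points 1-4 concrete, here is a rough C sketch of the layout I have
in mind. None of these are the real kernel structures -- every type and
field name below is hypothetical and only mirrors the diagram and the list
above (list_head is stubbed so the sketch stands alone, and locking, error
handling, etc. are all omitted):

#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

struct perf_event_groups {
        struct list_head list;                     /* events of this class */
};

/* One per scope (cpu or task); point 3 above. */
struct perf_event_context {
        struct perf_event_groups pinned_groups;
        struct perf_event_groups flexible_groups;
        struct perf_event_groups software_groups;  /* sw-only groups */
};

/* Exactly one per cpu, no perf_cpu_pmu_context; points 1 and 2. */
struct perf_cpu_context {
        struct perf_event_context cpu_ctx;         /* cpu scope events */
        struct perf_event_context *task_ctx;       /* current task, or NULL */
        struct list_head pmus;                     /* all hw PMUs on this cpu */
};

struct pmu {
        struct list_head entry;                    /* link in cpuctx->pmus */
        void (*pmu_disable)(struct pmu *pmu);
        void (*pmu_enable)(struct pmu *pmu);
};

/* Placeholder for the actual flexible-list rotation. */
static void rotate_flexible(struct perf_event_context *ctx) { (void)ctx; }

/* Point 4: one rotation for the whole cpu, all hw PMUs stopped at once. */
static void perf_rotate_cpu(struct perf_cpu_context *cpuctx)
{
        struct list_head *pos;
        struct pmu *pmu;

        for (pos = cpuctx->pmus.next; pos != &cpuctx->pmus; pos = pos->next) {
                pmu = (struct pmu *)((char *)pos - offsetof(struct pmu, entry));
                pmu->pmu_disable(pmu);             /* 1) stop both PMUs */
        }

        rotate_flexible(&cpuctx->cpu_ctx);         /* 2) rotate cpu events... */
        if (cpuctx->task_ctx)
                rotate_flexible(cpuctx->task_ctx); /*    ...and task events */

        for (pos = cpuctx->pmus.next; pos != &cpuctx->pmus; pos = pos->next) {
                pmu = (struct pmu *)((char *)pos - offsetof(struct pmu, entry));
                pmu->pmu_enable(pmu);              /* 3) start both PMUs */
        }
}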

I feel this will make the implementation simpler. Is it too broken in 
some cases? Or did I miss anything obvious? One thing I noticed is 
that we would need to drop the per-PMU perf_event_mux_interval_ms setting. 

Please let me know whether this makes sense at all. I will read 
more of the current version in the meantime. 

Thanks again,
Song


> On Oct 11, 2018, at 2:29 AM, Peter Zijlstra <peterz@...radead.org> wrote:
> 
> On Thu, Oct 11, 2018 at 07:50:23AM +0000, Song Liu wrote:
>> Hi Peter, 
>> 
>> I am trying to understand this. Pardon me if any question is silly. 
>> 
>> I am not sure I fully understand the motivation here. I guess we
>> see a problem when there are two (or more) independent hardware PMUs 
>> per cpu? Then on a given cpu, there are two (or more) 
>> perf_cpu_context, but only one task context? 
> 
> Right.
> 
>> If this is correct (I really doubt...), I guess perf_rotate_context()
>> is the problem? 
> 
> No, everything comes apart. Where would you put the events of the second
> PMU?
> 
> The thing most often proposed is pretending the second PMU is a
> 'software' PMU and sticking the events on the software PMU context.
> 
> But because software PMUs must never fail to schedule an event, that
> results in some quite horrible things -- including that we cannot RR the
> events.
> 
> Similarly the big.little guys have the problem that the PMUs are not the
> same between big and little cores, and they fudge something horrible. By
> having clear ordering on PMU, that can be cleaned up too.
> 
>> And if this is still correct, this patch may not help,
>> as we are doing rotation for each perf_cpu_pmu_context? (or is rotation 
>> per perf_event_context the next step?). 
> 
> We do indeed do rotation per perf_cpu_pmu_context, however:
> 
> - perf_cpu_pmu_context embeds a cpu scope perf_event_pmu_context,
> - perf_cpu_pmu_context tracks the currently associated task scope
>   perf_event_pmu_context.
> 
> So it can rotate all current events for a particular PMU.
> 
>> Or step back a little... I see two big changes:
>> 
>> 1. struct perf_cpu_context is now per cpu (instead of per pmu per cpu);
>> 2. one perf_event_ctxp per task_struct (instead of 2).  
> 
> Correct, we reduce to 1 cpu context and 1 task context at all times.
> This in fact simplifies quite a bit of things.
> 
>> I think #1 is a bigger change than #2. Is this correct? 
> 
> They're the 'same' change. But yes, the primary purpose was #2, and having
> only a single cpu context is a direct consequence.
> 
>> Could you please help me understand it better? 
> 
> I hope this helps to understand, please feel free to ask more.
