linux-kernel - Re: [PATCH v6] perf: Sharing PMU counters across compatible events


Message-ID: <168F3761-98CF-4E91-B911-ECB9FCD68F0C@fb.com>
Date:   Wed, 6 Nov 2019 17:40:29 +0000
From:   Song Liu <songliubraving@...com>
To:     Peter Zijlstra <peterz@...radead.org>
CC:     open list <linux-kernel@...r.kernel.org>,
        Kernel Team <Kernel-team@...com>,
        "acme@...nel.org" <acme@...nel.org>,
        "Arnaldo Carvalho de Melo" <acme@...hat.com>,
        Jiri Olsa <jolsa@...nel.org>,
        Alexey Budankov <alexey.budankov@...ux.intel.com>,
        Namhyung Kim <namhyung@...nel.org>, "Tejun Heo" <tj@...nel.org>
Subject: Re: [PATCH v6] perf: Sharing PMU counters across compatible events



> On Nov 6, 2019, at 1:14 AM, Peter Zijlstra <peterz@...radead.org> wrote:
> 
> On Tue, Nov 05, 2019 at 11:06:06PM +0000, Song Liu wrote:
>> 
>> 
>>> On Nov 5, 2019, at 12:16 PM, Peter Zijlstra <peterz@...radead.org> wrote:
>>> 
>>> On Tue, Nov 05, 2019 at 05:11:08PM +0000, Song Liu wrote:
>>> 
>>>>> I think we can use one of the events as the master. We need to be careful
>>>>> when the master event is removed, but it should be doable. Let me try.
>>>> 
>>>> Actually, there is a bigger issue when we use one event as the master: what
>>>> shall we do if the master event is not running? Say it is a cgroup event,
>>>> and the cgroup is not running on this cpu. An extra master (and all these
>>>> array hacks) helps us get O(1) complexity in such a scenario.
>>>> 
>>>> Do you have suggestions on how to solve this problem? Maybe we can keep the
>>>> extra master, and try to get rid of the double alloc?
>>> 
>>> Right, you have to consider scope when sharing. The master should be the
>>> largest scope event and any slaves should be complete subsets.
>>> 
>>> Without much thought this seems a fairly straightforward constraint;
>>> that is, given cgroups I'm not immediately seeing how we can violate
>>> that.
>>> 
>>> Basically, pick the cgroup event nearest to the root as the master.
>>> We have to have logic to re-elect the master anyway for deletion, so
>>> changing it on add shouldn't be different.
>>> 
>>> (obviously the root-cgroup is cpu-wide and always on, and if you have
>>> two events from disjoint subtrees they have no overlap, so it doesn't
>>> make sense to share anyway)
>> 
>> Hmm... I didn't think about the cgroup structure in this much detail. And
>> this is a very interesting idea.
>> 
>> OTOH, a non-cgroup event could also be inactive. For example, when we have
>> to rotate events, we may schedule a slave before the master.
> 
> Right, although I suppose in that case you can do what you did in your
> patch here. If someone did IOC_DISABLE on the master, we'd have to
> re-elect a master -- obviously (and IOC_ENABLE).

Re-electing the master on IOC_DISABLE is good. But we still need to handle
ctx rotation. Otherwise, we need to keep the master on at all times.
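
To make the re-election idea concrete, here is a minimal user-space sketch.
It is not kernel code: struct toy_event, its fields, and toy_elect_master()
are made up for illustration. It assumes every candidate sits on a single
root-to-leaf path of the cgroup tree, so the smallest depth is the largest
scope, and it treats a disabled event as ineligible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model; these fields are hypothetical, not those of struct perf_event. */
struct toy_event {
	int  cgroup_depth; /* 0 = root cgroup (cpu-wide), larger = deeper */
	bool enabled;      /* false after something like IOC_DISABLE */
};

/*
 * Pick the enabled event nearest to the root as the master.
 * Returns NULL when no event is eligible.
 */
static struct toy_event *toy_elect_master(struct toy_event *events, size_t n)
{
	struct toy_event *best = NULL;
	size_t i;

	assert(events != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (!events[i].enabled)
			continue;
		if (!best || events[i].cgroup_depth < best->cgroup_depth)
			best = &events[i];
	}
	return best;
}
```

The same helper could be called on add, on delete, and on IOC_DISABLE or
IOC_ENABLE. It does nothing for ctx rotation, which is the open problem
above.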

> 
>> And if the master is in an event group, it will be more complicated...
> 
> Hurmph, do you actually have that use-case? And yes, this one is tricky.
> 
> Would it be sufficient if we disallow group events to be master (but
> allow them to be slaves) ?

Maybe we can solve this with an extra "first_active" pointer in perf_event.
first_active points to the first event that is added by event_pmu_add().
Then we need something like:

event_pmu_add(event)
{
	if (event->dup_master->first_active) {
		/* sync with first_active */
	} else {
		/* this event will be the first_active */
		event->dup_master->first_active = event;
		pmu->add(event);
	}
}

However, I just realized the event_pmu_del() path needs some more thought,
because first_active is likely the first one to get sched_out().
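
One possible shape for the del path, as a user-space toy model rather than
a proposal for the actual code: when first_active goes away, hand the
counter over to another sharer that is still active, and only stop the
hardware when none is left. All toy_* names, the sibling list, and the
hw_programmed flag (standing in for pmu->add()/pmu->del()) are assumptions
made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

struct toy_master;

/* Toy stand-in for an event that shares a counter through dup_master. */
struct toy_event {
	struct toy_master *dup_master;
	int active;               /* 1 while scheduled in */
	struct toy_event *next;   /* hypothetical list of sharers */
};

struct toy_master {
	struct toy_event *first_active; /* sharer that did the real pmu->add() */
	struct toy_event *events;       /* all sharers of this master */
	int hw_programmed;              /* models pmu->add() / pmu->del() */
};

static void toy_event_pmu_add(struct toy_event *event)
{
	struct toy_master *m = event->dup_master;

	assert(m != NULL);
	event->active = 1;
	if (m->first_active)
		return;               /* counter already running: sync only */
	m->first_active = event;      /* this event becomes first_active */
	m->hw_programmed = 1;         /* models pmu->add(event) */
}

static void toy_event_pmu_del(struct toy_event *event)
{
	struct toy_master *m = event->dup_master;
	struct toy_event *e;

	assert(m != NULL);
	event->active = 0;
	if (m->first_active != event)
		return;               /* someone else owns the counter */

	/* Hand over to any sharer that is still active. */
	for (e = m->events; e; e = e->next) {
		if (e->active) {
			m->first_active = e;
			return;
		}
	}
	m->first_active = NULL;       /* last one out stops the counter */
	m->hw_programmed = 0;         /* models pmu->del(event) */
}
```

Note that the handover scan here is O(n) in the number of sharers, so it
gives up the O(1) property discussed earlier; an active count or an active
list would be needed to keep that.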

Merging another email here:

>> If we do GFP_ATOMIC in perf_event_alloc(), maybe with an extra option, we
>> don't need the tmp_master hack. So we only allocate master when we will 
>> use it. 
> 
> You can't, that's broken on -RT. ctx->lock is a raw_spinlock_t and
> allocator locks are spinlock_t.

How about we add another step in __perf_install_in_context(), like

__perf_install_in_context()
{
	bool alloc_master;

	/* decide under the ctx locks whether a new master is needed */
	perf_ctx_lock();
	alloc_master = find_new_sharing(event, ctx);
	perf_ctx_unlock();

	/* allocate the master outside of the locks */
	if (alloc_master)
		event->dup_master = perf_event_alloc();

	/* existing logic of __perf_install_in_context() */
}

In this way, we only allocate the master event when necessary, and the
allocation happens outside of the locks.
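
The check, unlock, allocate ordering can be modeled in user space as below.
This is only a sketch: the toy lock is a plain flag that lets the allocator
assert it is never called with the lock held, and toy_find_new_sharing()
and the "compatible" counter are hypothetical stand-ins for
find_new_sharing() and the real compatibility check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct toy_ctx {
	bool locked;     /* stands in for ctx->lock */
	int  compatible; /* installed events compatible with the new one */
};

struct toy_event {
	struct toy_event *dup_master;
};

static void toy_ctx_lock(struct toy_ctx *ctx)
{
	assert(!ctx->locked);
	ctx->locked = true;
}

static void toy_ctx_unlock(struct toy_ctx *ctx)
{
	assert(ctx->locked);
	ctx->locked = false;
}

/* Hypothetical check: is a new master needed for this event? */
static bool toy_find_new_sharing(const struct toy_ctx *ctx)
{
	assert(ctx->locked);  /* the lookup needs the lock */
	return ctx->compatible > 0;
}

static struct toy_event *toy_event_alloc(const struct toy_ctx *ctx)
{
	assert(!ctx->locked); /* the allocator must not run under the lock */
	return calloc(1, sizeof(struct toy_event));
}

static int toy_install_in_context(struct toy_ctx *ctx, struct toy_event *event)
{
	bool alloc_master;

	toy_ctx_lock(ctx);
	alloc_master = toy_find_new_sharing(ctx);
	toy_ctx_unlock(ctx);

	if (alloc_master) {
		event->dup_master = toy_event_alloc(ctx);
		if (!event->dup_master)
			return -1;
	}

	toy_ctx_lock(ctx);
	ctx->compatible++;    /* models the existing install logic */
	toy_ctx_unlock(ctx);
	return 0;
}
```

One thing the model leaves out: the context can change between the unlock
and the second lock, so the real code would probably have to re-check
whether the master is still needed and free it if not.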

Thanks,
Song