Message-ID: <168F3761-98CF-4E91-B911-ECB9FCD68F0C@fb.com>
Date: Wed, 6 Nov 2019 17:40:29 +0000
From: Song Liu <songliubraving@...com>
To: Peter Zijlstra <peterz@...radead.org>
CC: open list <linux-kernel@...r.kernel.org>,
Kernel Team <Kernel-team@...com>,
"acme@...nel.org" <acme@...nel.org>,
"Arnaldo Carvalho de Melo" <acme@...hat.com>,
Jiri Olsa <jolsa@...nel.org>,
Alexey Budankov <alexey.budankov@...ux.intel.com>,
Namhyung Kim <namhyung@...nel.org>, "Tejun Heo" <tj@...nel.org>
Subject: Re: [PATCH v6] perf: Sharing PMU counters across compatible events
> On Nov 6, 2019, at 1:14 AM, Peter Zijlstra <peterz@...radead.org> wrote:
>
> On Tue, Nov 05, 2019 at 11:06:06PM +0000, Song Liu wrote:
>>
>>
>>> On Nov 5, 2019, at 12:16 PM, Peter Zijlstra <peterz@...radead.org> wrote:
>>>
>>> On Tue, Nov 05, 2019 at 05:11:08PM +0000, Song Liu wrote:
>>>
>>>>> I think we can use one of the event as master. We need to be careful when
>>>>> the master event is removed, but it should be doable. Let me try.
>>>>
>>>> Actually, there is a bigger issue when we use one event as the master: what
>>>> shall we do if the master event is not running? Say it is an cgroup event,
>>>> and the cgroup is not running on this cpu. An extra master (and all these
>>>> array hacks) help us get O(1) complexity in such scenario.
>>>>
>>>> Do you have suggestions on how to solve this problem? Maybe we can keep the
>>>> extra master, and try get rid of the double alloc?
>>>
>>> Right, you have to consider scope when sharing. The master should be the
>>> largest scope event and any slaves should be complete subsets.
>>>
>>> Without much thought this seems a fairly straight forward constraint;
>>> that is, given cgroups I'm not immediately seeing how we can violate
>>> that.
>>>
>>> Basically, pick the cgroup event nearest to the root as the master.
>>> We have to have logic to re-elect the master anyway for deletion, so
>>> changing it on add shouldn't be different.
>>>
>>> (obviously the root-cgroup is cpu-wide and always on, and if you have
>>> two events from disjoint subtrees they have no overlap, so it doesn't
>>> make sense to share anyway)
>>
>> Hmm... I didn't think about cgroup structure with this much detail. And
>> this is very interesting idea.
>>
>> OTOH, non-cgroup event could also be inactive. For example, when we have
>> to rotate events, we may schedule slave before master.
>
> Right, although I suppose in that case you can do what you did in your
> patch here. If someone did IOC_DISABLE on the master, we'd have to
> re-elect a master -- obviously (and IOC_ENABLE).

Re-electing the master on IOC_DISABLE is good. But we still need to handle ctx
rotation. Otherwise, we need to keep the master on at all times.

>
>> And if the master is in an event group, it will be more complicated...
>
> Hurmph, do you actually have that use-case? And yes, this one is tricky.
>
> Would it be sufficient if we disallow group events to be master (but
> allow them to be slaves) ?

Maybe we can solve this with an extra "first_active" pointer in perf_event.
first_active points to the first event that is added by event_pmu_add().
Then we need something like:

event_pmu_add(event)
{
	if (event->dup_master->first_active) {
		/* sync with first_active */
	} else {
		/* this event will be the first_active */
		event->dup_master->first_active = event;
		pmu->add(event);
	}
}

However, I just realized the event_pmu_del() path needs some more thought,
because first_active is likely to be the first one to get sched_out().
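
To make the del-side concern concrete, below is a rough userspace model, not
kernel code. Only first_active, dup_master, event_pmu_add() and
event_pmu_del() come from the discussion above; struct master, dups[],
hw_active and the hand-over loop are assumptions made up for illustration.
It shows one possible way to cope when first_active is scheduled out before
the other sharing events: hand the role to any event that is still active.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DUP 4

struct event;

/* Hypothetical stand-in for the shared master; not the kernel struct. */
struct master {
	struct event *first_active;	/* event currently owning the counter */
	struct event *dups[MAX_DUP];	/* events sharing this counter */
	int nr_dups;
	int hw_active;			/* models pmu->add() / pmu->del() */
};

struct event {
	struct master *dup_master;
	int active;
};

static void event_pmu_add(struct event *event)
{
	struct master *m = event->dup_master;

	assert(m != NULL);
	event->active = 1;
	if (m->first_active)
		return;			/* counter already runs; would sync counts here */
	m->first_active = event;
	m->hw_active = 1;		/* stands in for pmu->add(event) */
}

static void event_pmu_del(struct event *event)
{
	struct master *m = event->dup_master;
	int i;

	assert(m != NULL);
	event->active = 0;
	if (m->first_active != event)
		return;
	/* first_active is leaving: hand over to any dup that is still active */
	m->first_active = NULL;
	for (i = 0; i < m->nr_dups; i++) {
		if (m->dups[i]->active) {
			m->first_active = m->dups[i];
			return;
		}
	}
	m->hw_active = 0;		/* stands in for pmu->del(event) */
}
```

Note that the hand-over scan in this model is linear in the number of sharing
events, so it would give up the O(1) property discussed earlier in the thread
unless the active events are tracked some other way.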
Merging another email here:
>> If we do GFP_ATOMIC in perf_event_alloc(), maybe with an extra option, we
>> don't need the tmp_master hack. So we only allocate master when we will
>> use it.
>
> You can't, that's broken on -RT. ctx->lock is a raw_spinlock_t and
> allocator locks are spinlock_t.
How about we add another step in __perf_install_in_context(), like

__perf_install_in_context()
{
	bool alloc_master;

	perf_ctx_lock();
	alloc_master = find_new_sharing(event, ctx);
	perf_ctx_unlock();

	if (alloc_master)
		event->dup_master = perf_event_alloc();

	/* existing logic of __perf_install_in_context() */
}

In this way, we only allocate the master event when necessary, and it
is outside of the locks.
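
The "decide under the lock, allocate outside it" pattern can be sketched in
plain userspace C as below. This is only a model under stated assumptions:
the mock lock, the config field used as the compatibility test, MAX_EVENTS,
and install_in_context() are all hypothetical, and only find_new_sharing()
and dup_master are names from this thread. One thing the model makes visible
is that the context can change while the lock is dropped, so the sharing
decision has to be repeated after re-taking the lock, and an unused
allocation has to be freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_EVENTS 8

struct event {
	int config;			/* equal config means "compatible" here */
	struct event *dup_master;
};

struct ctx {
	bool locked;			/* mock for ctx->lock, single-threaded model */
	struct event *events[MAX_EVENTS];
	int nr_events;
};

static void ctx_lock(struct ctx *ctx)   { assert(!ctx->locked); ctx->locked = true; }
static void ctx_unlock(struct ctx *ctx) { assert(ctx->locked); ctx->locked = false; }

/* Caller holds the lock. True if a compatible event without a master exists. */
static bool find_new_sharing(const struct event *event, const struct ctx *ctx)
{
	for (int i = 0; i < ctx->nr_events; i++) {
		const struct event *e = ctx->events[i];

		if (e->config == event->config && !e->dup_master)
			return true;
	}
	return false;
}

static int install_in_context(struct event *event, struct ctx *ctx)
{
	struct event *master = NULL;
	bool alloc_master, used = false;

	event->dup_master = NULL;

	ctx_lock(ctx);
	alloc_master = find_new_sharing(event, ctx);
	ctx_unlock(ctx);

	if (alloc_master) {
		assert(!ctx->locked);	/* allocation happens with no lock held */
		master = calloc(1, sizeof(*master));
		if (!master)
			return -1;
		master->config = event->config;
	}

	ctx_lock(ctx);
	if (ctx->nr_events == MAX_EVENTS) {
		ctx_unlock(ctx);
		free(master);
		return -1;
	}
	/* The context may have changed while unlocked, so decide again here. */
	for (int i = 0; i < ctx->nr_events; i++) {
		struct event *e = ctx->events[i];

		if (e->config != event->config)
			continue;
		if (!e->dup_master && master) {
			e->dup_master = master;
			used = true;
		}
		event->dup_master = e->dup_master;
	}
	ctx->events[ctx->nr_events++] = event;
	ctx_unlock(ctx);

	if (!used)
		free(master);		/* allocated but not needed after all */
	return 0;
}
```

Installing two events with the same config leaves both pointing at one newly
allocated master, while an event with a different config gets no master.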
Thanks,
Song