Message-ID: <6a271c19-9575-24ca-8ebc-9ff5a65bbe3d@linux.intel.com>
Date: Wed, 4 Mar 2020 09:20:42 -0500
From: "Liang, Kan" <kan.liang@...ux.intel.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: mingo@...hat.com, linux-kernel@...r.kernel.org, irogers@...gle.com,
eranian@...gle.com, ak@...ux.intel.com
Subject: Re: [PATCH] perf/core: Fix endless multiplex timer
On 3/4/2020 4:33 AM, Peter Zijlstra wrote:
> On Tue, Mar 03, 2020 at 08:40:10PM -0500, Liang, Kan wrote:
>>> I'm thinking this is wrong.
>>>
>>> That is, yes, this fixes the observed problem, but it also misses at
>>> least one other site. Which seems to suggest we ought to take a
>>> different approach.
>>>
>>> But even with that, I wonder if the actual condition isn't wrong.
>>> Suppose the event was exclusive, and other events weren't scheduled
>>> because of that. Then you disable the one exclusive event _and_ kill
>>> rotation, so then nothing else will ever get on.
>>>
>>> So what I think was supposed to happen is rotation killing itself;
>>> rotation will schedule out the context -- which will clear the flag, and
>>> then schedule the thing back in -- which will set the flag again when
>>> needed.
>>>
>>> Now, that isn't happening, and I think I see why, because when we drop
>>> to !nr_active, we terminate ctx_sched_out() before we get to clearing
>>> the flag, oops!
>>>
>>> So how about something like this?
>>>
>>> ---
>>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>>> index e453589da97c..7947bd3271a9 100644
>>> --- a/kernel/events/core.c
>>> +++ b/kernel/events/core.c
>>> @@ -2182,6 +2182,7 @@ __perf_remove_from_context(struct perf_event *event,
>>>  
>>>  	if (!ctx->nr_events && ctx->is_active) {
>>>  		ctx->is_active = 0;
>>> +		ctx->rotate_necessary = 0;
>>>  		if (ctx->task) {
>>>  			WARN_ON_ONCE(cpuctx->task_ctx != ctx);
>>>  			cpuctx->task_ctx = NULL;
>>
>>
>> The patch fixes the observed problem with the uncore PMU, but it
>> cannot fix all the cases with the core PMU, especially when the NMI
>> watchdog is enabled, because ctx->nr_events never drops to 0 while
>> the NMI watchdog is enabled.
>
> But I'm confused... why do we care about nr_events == 0? The below: vvvv
>
>>> @@ -3074,15 +3075,15 @@ static void ctx_sched_out(struct perf_event_context *ctx,
>>>  
>>>  	is_active ^= ctx->is_active; /* changed bits */
>>>  
>>> -	if (!ctx->nr_active || !(is_active & EVENT_ALL))
>>> -		return;
>>> -
>>>  	/*
>>>  	 * If we had been multiplexing, no rotations are necessary, now no events
>>>  	 * are active.
>>>  	 */
>>>  	ctx->rotate_necessary = 0;
>>>  
>>> +	if (!ctx->nr_active || !(is_active & EVENT_ALL))
>>> +		return;
>>> +
>>>  	perf_pmu_disable(ctx->pmu);
>>>  	if (is_active & EVENT_PINNED) {
>>>  		list_for_each_entry_safe(event, tmp, &ctx->pinned_active, active_list)
>
> Makes sure we clear the flag when we ctx_sched_out(), and as long as
> ctx->rotate_necessary is set, perf_rotate_context() will do exactly
> that.
>
The NMI watchdog is a pinned event, and ctx_event_to_rotate() only
picks an event from the flexible groups. So the cpu_ctx_sched_out()
call in perf_rotate_context() will never be reached (see the abridged
sketch below).
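
For reference, here is an abridged sketch of perf_rotate_context() as
it looked around this time (simplified, not the verbatim kernel
source): both sched-out calls are gated on an event picked by
ctx_event_to_rotate(), which scans only the flexible groups, so a
context holding nothing but pinned events never reaches them, and
rotate_necessary is never cleared.

static bool perf_rotate_context(struct perf_cpu_context *cpuctx)
{
	struct perf_event *cpu_event = NULL, *task_event = NULL;
	struct perf_event_context *task_ctx = cpuctx->task_ctx;
	int cpu_rotate, task_rotate;

	cpu_rotate  = cpuctx->ctx.rotate_necessary;
	task_rotate = task_ctx ? task_ctx->rotate_necessary : 0;

	if (!(cpu_rotate || task_rotate))
		return false;	/* nothing to do, timer may stop */

	perf_ctx_lock(cpuctx, cpuctx->task_ctx);
	perf_pmu_disable(cpuctx->ctx.pmu);

	/* ctx_event_to_rotate() scans only the flexible groups */
	if (task_rotate)
		task_event = ctx_event_to_rotate(task_ctx);
	if (cpu_rotate)
		cpu_event = ctx_event_to_rotate(&cpuctx->ctx);

	/*
	 * With only pinned events (e.g. the NMI watchdog) on the
	 * context, cpu_event stays NULL, so the sched-out below --
	 * the only place rotate_necessary would get cleared -- is
	 * skipped.
	 */
	if (task_event || (task_ctx && cpu_event))
		ctx_sched_out(task_ctx, cpuctx, EVENT_FLEXIBLE);
	if (cpu_event)
		cpu_ctx_sched_out(cpuctx, EVENT_FLEXIBLE);

	if (task_event)
		rotate_ctx(task_ctx, task_event);
	if (cpu_event)
		rotate_ctx(&cpuctx->ctx, cpu_event);

	perf_pmu_enable(cpuctx->ctx.pmu);
	perf_ctx_unlock(cpuctx, cpuctx->task_ctx);

	return true;
}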
Thanks,
Kan
> Then ctx_sched_in() will re-set the flag if it failed to schedule a
> counter.
>
> So where is that going wrong?
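
For context, the flag is re-set in the flexible_sched_in() visitor
that ctx_sched_in() applies to each flexible event; a minimal sketch
(assuming the ~v5.5 code base, abridged):

static int flexible_sched_in(struct perf_event *event, void *data)
{
	struct sched_in_data *sid = data;

	if (event->state <= PERF_EVENT_STATE_OFF)
		return 0;

	if (!event_filter_match(event))
		return 0;

	if (group_can_go_on(event, sid->cpuctx, sid->can_add_hw)) {
		int ret = group_sched_in(event, sid->cpuctx, sid->ctx);
		if (ret) {
			/* a counter failed to fit: keep rotating */
			sid->can_add_hw = 0;
			sid->ctx->rotate_necessary = 1;
			return 0;
		}
		list_add_tail(&event->active_list, &sid->ctx->flexible_active);
	}

	return 0;
}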