[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <28569544-204a-45c3-a907-a0b3ae76812c@linux.intel.com>
Date: Tue, 13 May 2025 10:47:48 -0400
From: "Liang, Kan" <kan.liang@...ux.intel.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: mingo@...hat.com, namhyung@...nel.org, irogers@...gle.com,
mark.rutland@....com, linux-kernel@...r.kernel.org,
linux-perf-users@...r.kernel.org, eranian@...gle.com, ctshao@...gle.com,
tmricht@...ux.ibm.com
Subject: Re: [RFC PATCH 01/15] perf: Fix the throttle logic for a group
On 2025-05-13 5:41 a.m., Peter Zijlstra wrote:
> On Tue, May 06, 2025 at 09:47:26AM -0700, kan.liang@...ux.intel.com wrote:
>
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index a84abc2b7f20..eb0dc871f4f1 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -2734,6 +2734,38 @@ void perf_event_disable_inatomic(struct perf_event *event)
>> static void perf_log_throttle(struct perf_event *event, int enable);
>> static void perf_log_itrace_start(struct perf_event *event);
>>
>> +static void perf_event_group_unthrottle(struct perf_event *event, bool start_event)
>> +{
>> + struct perf_event *leader = event->group_leader;
>> + struct perf_event *sibling;
>> +
>> + if (leader != event || start_event)
>> + leader->pmu->start(leader, 0);
>> + leader->hw.interrupts = 0;
>> +
>> + for_each_sibling_event(sibling, leader) {
>> + if (sibling != event || start_event)
>> + sibling->pmu->start(sibling, 0);
>> + sibling->hw.interrupts = 0;
>> + }
>> +
>> + perf_log_throttle(leader, 1);
>> +}
>> +
>> +static void perf_event_group_throttle(struct perf_event *event)
>> +{
>> + struct perf_event *leader = event->group_leader;
>> + struct perf_event *sibling;
>> +
>> + leader->hw.interrupts = MAX_INTERRUPTS;
>> + leader->pmu->stop(leader, 0);
>> +
>> + for_each_sibling_event(sibling, leader)
>> + sibling->pmu->stop(sibling, 0);
>> +
>> + perf_log_throttle(leader, 0);
>> +}
>
> Urgh, this seems inconsistent at best.
>
> - unthrottle will set interrupts to 0 for the whole group
> - throttle will set interrupts for leader
> while the old code would set interrupts for event.
> - interrupts was written with event stopped, while now you consistently
> write when started
> - both now issue perf_log_throttle() on leader, whereas previously it
> was issued on event.
>
> IOW
>
> before: after:
> event stops leader MAX_INTERRUPTS
> event MAX_INTERRUPTS event group stops
> event logs leader logs
>
> event 0 event group 0
> event starts event group starts
> event logs leader logs
>
> Like said, a rather inconsistent and random collection of things.
>
>
>
> What's wrong with something simple like:
>
> static void perf_event_throttle(struct perf_event *event, bool start)
> {
> event->hw.interrupts = 0;
> if (start)
> event->pmu->start(event, 0);
> perf_log_throttle(event, 1);
> }
>
> static void perf_event_unthrottle(struct perf_event *event)
> {
> event->pmu->stop(event, 0);
> event->hw.interrupts = MAX_INTERRUPTS;
> perf_log_throttle(event, 0);
> }
I think the name is reversed. An event/group should be stopped when
throttle.
>
> static void perf_event_throttle_group(struct perf_event *event, bool start)
> {
> struct perf_event *sibling, *leader = event->group_leader;
>
> perf_event_throttle(leader, start);
> for_each_sibling_event(sibling, leader)
> perf_event_throttle(sibling, start);
> }
>
> static void perf_event_unthrottle_group(struct perf_event *event)
> {
> struct perf_event *sibling, *leader = event->group_leader;
>
> perf_event_unthrottle(leader);
> for_each_sibling_event(sibling, leader)
> perf_event_unthrottle(sibling);
> }
>
> That way:
>
> before: after:
> event stops event group stops
> event MAX_INTERRUPTS event group MAX_INTERRUPTS
> event logs event group logs
>
> event 0 event group 0
> event starts event group starts
> event logs event group logs
>
> All that was done before is still done - no change in behaviour for
> event. Its just that the rest of the group is now taken along for the
> ride.
Except the naming, the change looks good to me.
I will update it in V2. Thanks.
>
>> @@ -6421,14 +6451,6 @@ static void __perf_event_period(struct perf_event *event,
>> active = (event->state == PERF_EVENT_STATE_ACTIVE);
>> if (active) {
>> perf_pmu_disable(event->pmu);
>> - /*
>> - * We could be throttled; unthrottle now to avoid the tick
>> - * trying to unthrottle while we already re-started the event.
>> - */
>> - if (event->hw.interrupts == MAX_INTERRUPTS) {
>> - event->hw.interrupts = 0;
>> - perf_log_throttle(event, 1);
>> - }
>> event->pmu->stop(event, PERF_EF_UPDATE);
>> }
>>
>> @@ -6436,6 +6458,12 @@ static void __perf_event_period(struct perf_event *event,
>>
>> if (active) {
>> event->pmu->start(event, PERF_EF_RELOAD);
>> + /*
>> + * We could be throttled; unthrottle now to avoid the tick
>> + * trying to unthrottle while we already re-started the event.
>> + */
>> + if (event->group_leader->hw.interrupts == MAX_INTERRUPTS)
>> + perf_event_group_unthrottle(event, false);
>> perf_pmu_enable(event->pmu);
>> }
>> }
>
> This change seems random. Also, note that I'm kicking myself for the
> total lack of useful information in commit 1e02cd40f151.
>
> Notably, we're calling this from event_function_call(), this means we're
> having IRQs disabled and are running on the events CPU. How can we
> interact with the tick?
The commit bad7192b842c indicates that an event should be start
immediately once the period is force-reset. If perf does so for a
throttled event, the MAX_INTERRUPTS must be cleared. Otherwise, in the
next tick, the start will be invoked. At least, for x86, there will be a
WARN when perf starts an non-stop event.
I moved the code down a little bit so the events in a group could start
all together. But I think it's possible that the member event is
force-reset. So the start of the leader may not be invoked first. But it
should be OK, since the PMU is disabled.
Thanks,
Kan
Powered by blists - more mailing lists