[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Zwmk3iqTdTkStfqx@orsosgc001>
Date: Fri, 11 Oct 2024 15:21:18 -0700
From: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@...el.com>
To: Lucas De Marchi <lucas.demarchi@...el.com>
CC: <linux-kernel@...r.kernel.org>, <dri-devel@...ts.freedesktop.org>, "Peter
Zijlstra" <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>, "Arnaldo
Carvalho de Melo" <acme@...nel.org>, Ian Rogers <irogers@...gle.com>, "Tvrtko
Ursulin" <tvrtko.ursulin@...lia.com>
Subject: Re: [PATCH 0/5] perf: Fix pmu for drivers with bind/unbind
On Tue, Oct 08, 2024 at 01:34:56PM -0500, Lucas De Marchi wrote:
>v2 of my attempt at fixing how i915 interacts with perf events.
>
>v1 - https://lore.kernel.org/all/20240722210648.80892-1-lucas.demarchi@intel.com/
>
>>From other people:
>1) https://lore.kernel.org/lkml/20240115170120.662220-1-tvrtko.ursulin@linux.intel.com/
>2) https://lore.kernel.org/all/20240213180302.47266-1-umesh.nerlige.ramappa@intel.com/
>
>WARNING: patches 1, 4 and 5 are NOT intended to be applied as is. More
>on this below.
>
>This series basically builds on the idea of the first patch of my
>previous series, but extends it in a way that
>
> a) the other patches are not needed (at least, not as is) and
> b) driver can rebind just fine - no issues with the new call to
> perf_pmu_register()
I have 2 broad questions:
(1) I am curious how (b) works. You seem to have a notion of instances of
devices and then are you using the instance number to create the name
used for the sysfs entry? Did I get that right?
If so, should the application discover what the name is each time it is
run? In the failure case that I am seeing, I am running an application
that does not work when I rename the sysfs entry to something else.
(2) Similar to Patch 1 of your v1 series where you modified _free_event:
static void _free_event(struct perf_event *event)
{
struct module *module;
...
module = event->pmu->module;
...
if (event->destroy)
event->destroy(event);
...
module_put(module);
...
}
With the above code, the kref to i915->pmu can be taken from the below
points in i915 code (just to point out the sequence):
i915_pmu_register()
{
perf_pmu_register()
drm_dev_get()
kref_init()
}
i915_pmu_unregister()
{
kref_put(&ref, pmu_cleanup)
}
i915_pmu_event_init()
{
kref_get()
}
i915_pmu_event_destroy()
{
kref_put(&ref, pmu_cleanup)
}
void pmu_cleanup(struct kref *ref)
{
i915_pmu_unregister_cpuhp_state(pmu);
perf_pmu_unregister(&pmu->base);
pmu->base.event_init = NULL;
kfree(pmu->base.attr_groups);
if (!is_igp(i915))
kfree(pmu->name);
free_event_attributes(pmu);
drm_dev_put(&i915->drm);
}
Would this work? Do you see any gaps that may need the ref counting code
you added in perf?
Thanks,
Umesh
>
>Another difference is that rather than mixing i915 cleanups this just
>adds a dummy pmu with no backing HW. Intention for dummy_pmu is for
>experimental purpose and to demonstrate changes tha need to be applied
>to i915 (and probably amdgpu, and also in the pending xe patch).
>If desired to have an example like that in tree, then we should hide it
>behind a config option and I'd need to re-check the error handling.
>
>With this set I could run the following test script multiple times with
>no issues observed:
>
> #!/bin/bash
>
> set -e
>
> rand_sleep() {
> sleep $(bc -l <<< "$(shuf -i 0-3000 -n 1) / 1000")
> }
>
> test_noperf() {
> echo 0 > /sys/kernel/debug/dummy_pmu/bind
>
> echo 0 > /sys/kernel/debug/dummy_pmu/unbind
> }
>
> test_perf_before() {
> echo 0 > /sys/kernel/debug/dummy_pmu/bind
>
> perf stat --interval-count 2 -e dummy_pmu_0/test-event-1/ -I500
> echo 0 > /sys/kernel/debug/dummy_pmu/unbind
> }
>
> test_kill_perf_later() {
> echo 0 > /sys/kernel/debug/dummy_pmu/bind
>
> perf stat -e dummy_pmu_0/test-event-1/ -I500 &
> pid=$!
> rand_sleep
> echo 0 > /sys/kernel/debug/dummy_pmu/unbind
> rand_sleep
> kill $pid
> }
>
> test_kill_perf_laaaaaaater() {
> echo 0 > /sys/kernel/debug/dummy_pmu/bind
>
> perf stat -e dummy_pmu_0/test-event-1/ -I500 &
> pid=$!
> rand_sleep
> echo 0 > /sys/kernel/debug/dummy_pmu/unbind
> rand_sleep
> rand_sleep
> rand_sleep
> kill $pid
> }
>
> test_kill_perf_with_leader() {
> echo 0 > /sys/kernel/debug/dummy_pmu/bind
>
> perf stat -e '{dummy_pmu_0/test-event-1/,dummy_pmu_0/test-event-2/}:S' -I500 &
> pid=$!
> rand_sleep
> echo 0 > /sys/kernel/debug/dummy_pmu/unbind
> rand_sleep
> rand_sleep
> kill $pid
> }
>
> N=${1:-1}
>
> for ((i=0; i<N; i++)); do
> printf "%04u/%04u\n" "$((i+1))" "$N" >&2
> test_noperf
> test_perf_before
> test_kill_perf_later
> test_kill_perf_laaaaaaater
> test_kill_perf_with_leader
> echo >&2
> done
>
>Last patch is optional for a driver and not needed for the fix.
>
>Open topics:
>
> - If something like the last patch is desirable, should it be
> done from inside perf_pmu_unregister()?
>
> - Should we have a dummy_pmu (or whatever name) behind a config,
> or maybe in Documentation/ ?
>
>thanks,
>Lucas De Marchi
>
>Lucas De Marchi (5):
> perf: Add dummy pmu module
> perf: Move free outside of the mutex
> perf: Add pmu get/put
> perf/dummy_pmu: Tie pmu to device lifecycle
> perf/dummy_pmu: Track and disable active events
>
> include/linux/perf_event.h | 12 +
> kernel/events/Makefile | 1 +
> kernel/events/core.c | 39 ++-
> kernel/events/dummy_pmu.c | 492 +++++++++++++++++++++++++++++++++++++
> 4 files changed, 539 insertions(+), 5 deletions(-)
> create mode 100644 kernel/events/dummy_pmu.c
>
>--
>2.46.2
>
Powered by blists - more mailing lists