[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAP-5=fWE7VhNWm4Vo2EngQ3is7arfMq-m4MJEJZBGjd1p3m5Qg@mail.gmail.com>
Date: Thu, 21 Nov 2024 22:59:45 -0800
From: Ian Rogers <irogers@...gle.com>
To: Arnaldo Carvalho de Melo <acme@...nel.org>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
Namhyung Kim <namhyung@...nel.org>, Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>,
Adrian Hunter <adrian.hunter@...el.com>, Kan Liang <kan.liang@...ux.intel.com>,
linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1] perf tests: Fix hwmon parsing with PMU name test
On Thu, Nov 21, 2024 at 12:58 PM Arnaldo Carvalho de Melo
<acme@...nel.org> wrote:
>
> On Thu, Nov 21, 2024 at 10:55:00AM -0800, Ian Rogers wrote:
> > On Thu, Nov 21, 2024 at 9:22 AM Arnaldo Carvalho de Melo
> > <acme@...nel.org> wrote:
> > >
> > > On Thu, Nov 21, 2024 at 02:04:45PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > On Wed, Nov 20, 2024 at 04:09:55PM -0800, Ian Rogers wrote:
> > > > > Incorrectly the hwmon with PMU name test didn't pass "true". Fix and
> > > > > address issue with hwmon_pmu__config_terms needing to load events - a
> > > > > load bearing assert fired. Also fix missing list deletion when putting
> > > > > the hwmon test PMU and lower some debug warnings to make the hwmon PMU
> > > > > less spammy in verbose mode.
> > >
> > > > After applying this, with this series of patches on a Fedora 40 system,
> > > > I get output from -v where before I needed, IIRC, to use -vv:
> > > >
> > > > f8244bb9bfa66f79 (HEAD -> perf-tools-next, x1/perf-tools-next) perf tests: Fix hwmon parsing with PMU name test
> > > > 9ae6c7a4bd02acbd perf hwmon_pmu: Ensure hwmon key union is zeroed before use
> > > > 3e37de098af38179 perf tests hwmon_pmu: Remove double evlist__delete()
> > > > 0565017a0ac824c2 perf test: Correct hwmon test PMU detection
> > > > 85c60a01b85ee956 (perf-tools-next/tmp.perf-tools-next, perf-tools-next/perf-tools-next) perf: Remove unused del_perf_probe_events()
> > > > ⬢ [acme@...lbox perf-tools-next]$
> > >
> > > <SNIP>
> > >
> > > > root@...ber:~# perf test -v 11
> > > > 11: Hwmon PMU :
> > > > 11.1: Basic parsing test : Ok
> > > > --- start ---
> > > > test child forked, pid 1823259
> > > > Testing 'temp_test_hwmon_event1'
> > > > Using CPUID GenuineIntel-6-B7-1
> > > > FAILED tests/hwmon_pmu.c:159 failed to parse event 'temp_test_hwmon_event1', err 1
> > > > event syntax error: 'temp_test_hwmon_event1'
> > > > \___ Bad event name
> > > >
> > > > Unable to find event on a PMU of 'temp_test_hwmon_event1'
> > >
> > > In gdb it fails on the first call to do_test() from the first test__hwmon_pmu()
> > >
> > > Starting program: /root/bin/perf test -F -vv 11
> > > [Thread debugging using libthread_db enabled]
> > > Using host libthread_db library "/lib64/libthread_db.so.1".
> > > --- start ---
> > > ---- end ----
> > > 11.1: Basic parsing test : Ok
> > > --- start ---
> > >
> > > Breakpoint 1, test__hwmon_pmu (with_pmu=false) at tests/hwmon_pmu.c:203
> > > 203 {
> > > (gdb) n
> > > 205 struct perf_pmu *pmu = test_pmu_get(dir, sizeof(dir));
> > > (gdb) n
> > > 206 int ret = TEST_OK;
> > > (gdb) p pmu
> > > $2 = (struct perf_pmu *) 0xf50450
> > > (gdb) p *pmu
> > > $3 = {name = 0xf50ac0 "hwmon_a_test_hwmon_pmu", alias_name = 0xf50aa0 "hwmon1234", id = 0x0, type = 4294902994, selectable = false, is_core = false, is_uncore = false, auxtrace = false,
> > > formats_checked = false, config_masks_present = false, config_masks_computed = false, max_precise = 0, perf_event_attr_init_default = 0x0, cpus = 0xf4fbf0, format = {next = 0xf50488,
> > > prev = 0xf50488}, aliases = {next = 0xf50498, prev = 0xf50498}, events_table = 0x0, sysfs_aliases = 0, cpu_json_aliases = 0, sys_json_aliases = 0, sysfs_aliases_loaded = false,
> > > cpu_aliases_added = false, caps_initialized = false, nr_caps = 0, caps = {next = 0xf504c8, prev = 0xf504c8}, list = {next = 0xedc090 <other_pmus>, prev = 0xedc090 <other_pmus>},
> > > config_masks = {0, 0, 0, 0}, missing_features = {exclude_guest = false, checked = false}, mem_events = 0x0}
> >
> > Thanks for helping, I'm not able to repro this, so extra debugging
> > would be useful for me. Here sysfs_aliases_loaded is false as we'll
> > load the PMU aliases when there is a request to pmu__have_event. This
> > looks pretty ordinary.
> >
> > > (gdb) s
> > > 208 if (!pmu)
> > > (gdb) s
> > > 211 for (size_t i = 0; i < ARRAY_SIZE(test_events); i++) {
> > > (gdb) s
> > > 212 ret = do_test(i, with_pmu, /*with_alias=*/false);
> > > (gdb) s
> > > do_test (i=0, with_pmu=false, with_alias=false) at tests/hwmon_pmu.c:136
> > > 136 {
> > > (gdb) n
> > > 137 const char *test_event = with_alias ? test_events[i].alias : test_events[i].name;
> > > (gdb) n
> > > 138 struct evlist *evlist = evlist__new();
> > > (gdb) n
> > > 143 bool found = false;
> > > (gdb) n
> > > 145 if (!evlist) {
> > > (gdb) n
> > > 150 if (with_pmu)
> > > (gdb) n
> > > 153 strlcpy(str, test_event, sizeof(str));
> > > (gdb) n
> > > 155 pr_debug("Testing '%s'\n", str);
> > > (gdb) p str
> > > $4 = "temp_test_hwmon_event1\000\000\004\000\000\000\000\000\000\000\274\204z\000\000\000\000\000΄z\000\000\000\000\000\021\000\000\000\000\000\000\000 \305\377\377\377\177\000\000߄z\000\000\000\000\000\353\204z\000\000\000\000\000\376\204z\000\000\000\000\000\n\205z\000\000\000\000\000\021\205z\000\000\000\000\000\035\205z\000\000\000\000\0000\205z\000\000\000\000\000<\205z\000\000\000\000"
> > > (gdb) n
> > > Testing 'temp_test_hwmon_event1'
> > > 156 parse_events_error__init(&err);
> >
> > So there was no parse event output like the expected:
> > ```
> > Attempt to add: hwmon_a_test_hwmon_pmu/temp_test_hwmon_event1/
> > ..after resolving event: hwmon_a_test_hwmon_pmu/temp_test_hwmon_event1/
> > ```
> > The wildcard PMU lookup will call perf_pmu__have_event trying to find
> > a PMU with the event:
> > ```
> > Breakpoint 1, perf_pmu__have_event (pmu=0x555556157f90,
> > name=0x5555560ce470 "temp_test_hwmon_event1") at util/pmu.c:1816
> > 1816 {
> > (gdb) bt
> > #0 perf_pmu__have_event (pmu=0x555556157f90, name=0x5555560ce470
> > "temp_test_hwmon_event1")
> > at util/pmu.c:1816
> > #1 0x00005555557ab143 in parse_events_multi_pmu_add
> > (parse_state=0x7fffffffbf00,
> > event_name=0x5555560ce470 "temp_test_hwmon_event1", hw_config=10,
> > const_parsed_terms=0x0,
> > listp=0x7fffffffa270, loc_=0x7fffffffb120) at util/parse-events.c:1592
> > #2 0x00005555558108cc in parse_events_parse
> > (_parse_state=0x7fffffffbf00, scanner=0x555556138aa0)
> > at util/parse-events.y:293
> > #3 0x00005555557abdc2 in parse_events__scanner (str=0x7fffffffc000
> > "temp_test_hwmon_event1",
> > input=0x0, parse_state=0x7fffffffbf00) at util/parse-events.c:1870
> > #4 0x00005555557ac735 in __parse_events (evlist=0x55555613ca40,
> > str=0x7fffffffc000 "temp_test_hwmon_event1", pmu_filter=0x0,
> > err=0x7fffffffbff0,
> > fake_pmu=false, warn_if_reordered=true, fake_tp=false) at
> > util/parse-events.c:2139
> > #5 0x00005555557448ca in parse_events (evlist=0x55555613ca40,
> > str=0x7fffffffc000 "temp_test_hwmon_event1", err=0x7fffffffbff0)
> > at /home/irogers/kernel.org2/tools/perf/util/parse-events.h:41
> > #6 0x0000555555744f6e in do_test (i=0, with_pmu=false,
> > with_alias=false) at tests/hwmon_pmu.c:156
> > #7 0x00005555557452dd in test__hwmon_pmu (with_pmu=false) at
> > tests/hwmon_pmu.c:212
> > #8 0x000055555574538d in test__hwmon_pmu_without_pmu
> > (test=0x5555560a3740 <suite.hwmon_pmu>,
> > subtest=1) at tests/hwmon_pmu.c:229
> > #9 0x00005555556fc935 in start_test (test=0x5555560a3740
> > <suite.hwmon_pmu>, i=10, subi=1,
> > child=0x55555613c528, width=64, pass=1) at tests/builtin-test.c:424
> > #10 0x00005555556fd014 in __cmd_test (suites=0x55555613c0f0, argc=1,
> > argv=0x7fffffffd9c0,
> > skiplist=0x0) at tests/builtin-test.c:571
> > #11 0x00005555556fdb29 in cmd_test (argc=1, argv=0x7fffffffd9c0) at
> > tests/builtin-test.c:773
> > #12 0x000055555568043a in run_builtin (p=0x55555608f950 <commands+624>, argc=4,
> > argv=0x7fffffffd9c0) at perf.c:351
> > #13 0x00005555556806e1 in handle_internal_command (argc=4,
> > argv=0x7fffffffd9c0) at perf.c:404
> > #14 0x000055555568083a in run_argv (argcp=0x7fffffffd7bc,
> > argv=0x7fffffffd7b0) at perf.c:448
> > #15 0x0000555555680b83 in main (argc=4, argv=0x7fffffffd9c0) at perf.c:560
> > (gdb) p pmu->name
> > $1 = 0x5555560ce940 "cpu"
> > ```
> > Repeating this the test hwmon_a_test_hwmon_pmu test PMU should be tested:
> > ```
> > (gdb) c
> > Continuing.
> >
> > (gdb) p pmu->name
> > $2 = 0x5555560d6b20 "breakpoint"
> > (gdb) c
> > Continuing.
> >
> > Breakpoint 1, perf_pmu__have_event (pmu=0x555556158060,
> > name=0x5555560ce470 "temp_test_hwmon_event1") at util/pmu.c:1816
> > 1816 {
> > (gdb) p pmu->name
> > $3 = 0x5555560d7ec0 "cstate_core"
> > (gdb) c
> > Continuing.
> >
> > Breakpoint 1, perf_pmu__have_event (pmu=0x555556158610,
> > name=0x5555560ce470 "temp_test_hwmon_event1") at util/pmu.c:1816
> > 1816 {
> > (gdb) p pmu->name
> > $4 = 0x5555560c84b0 "cstate_pkg"
> > (gdb) c
> > Continuing.
> >
> > Breakpoint 1, perf_pmu__have_event (pmu=0x5555561360b0,
> > name=0x5555560ce470 "temp_test_hwmon_event1") at util/pmu.c:1816
> > 1816 {
> > (gdb) p pmu->name
> > $5 = 0x5555560d46a0 "hwmon_a_test_hwmon_pmu"
> > ```
> > which should then go into the hwmon_pmu__have_event:
> > ```
> > (gdb) n
> > 1817 if (!name)
> > (gdb)
> > 1819 if (perf_pmu__is_tool(pmu) && tool_pmu__skip_event(name))
> > (gdb)
> > 1821 if (perf_pmu__is_hwmon(pmu))
> > (gdb) n
> > 1822 return hwmon_pmu__have_event(pmu, name);
> > (gdb) s
> > hwmon_pmu__have_event (pmu=0x5555561360b0, name=0x5555560ce470
> > "temp_test_hwmon_event1")
> > at util/hwmon_pmu.c:559
> > ```
> > hwmon_pmu__have_event should return true but it is either not getting
> > called in your case or it is returning false. Not getting called I
> > find hard to understand as your output shows the test PMU was created.
> > It seems more likely reading the "events" and then doing the hashmap
> > lookup in hwmon_pmu__have_event fails. This was failing for me with
> > undefined behavior sanitizer because of the under initialized unions.
> > But that should be fixed by: "perf hwmon_pmu: Ensure hwmon key union
> > is zeroed before use". The particular event "temp_test_hwmon_event1"
> > is going to search all events as it uses the "label" name, so that
> > makes me think the bug is here:
> > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/hwmon_pmu.c?h=perf-tools-next#n584
> > but I can't eye ball an issue and the test works for me even when
> > trying to be aggressive with sanitizers. If you could help me look I'd
> > appreciate it.
>
> Where is that "temp_test_hwmon_event1" added?
test_pmu_get creates the directory for the test PMU. The file
temp1_label will be created with the contents "test hwmon event1",
there is also a fake "input" file:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/tests/hwmon_pmu.c?h=perf-tools-next#n62
```
{ "temp1_label", "test hwmon event1\n", },
{ "temp1_input", "40000\n", },
```
So we will have an event called temp1 based on the type and number.
Because we have a label there will be an alias of
"temp_test_hwmon_event1" where temp is the type from the filename,
"test_hwmon_event1" is the contents of the file "fixed" by removing
escapes, spaces, etc. and making them underscores:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/hwmon_pmu.c?h=perf-tools-next#n234
These files are read and stored in a hashmap where the key is the
type, so temp, and the number, so 1:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/hwmon_pmu.c?h=perf-tools-next#n287
The value associated with the key is just a bitmap saying what kinds
of hwmon items (files) there are for this type+number as well as a
cache of reading the file event's label file and of its "fixed" name
to aid searching:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/hwmon_pmu.c?h=perf-tools-next#n125
The key is numeric but the failing event is a string and requires the
whole map searching to check for presence by matching what is the name
computed from the label file:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/hwmon_pmu.c?h=perf-tools-next#n579
```
Breakpoint 1, hwmon_pmu__have_event (pmu=0x555556147f40,
name=0x5555560d8470 "temp_test_hwmon_event1") at util/hwmon_pmu.c:559
559 {
(gdb) n
560 struct hwmon_pmu *hwm = container_of(pmu, struct
hwmon_pmu, pmu);
(gdb)
563 union hwmon_pmu_event_key key = {};
(gdb)
567 if (!parse_hwmon_filename(name, &type, &number,
/*item=*/NULL, /*is_alarm=*/NULL))
(gdb)
570 if (hwmon_pmu__read_events(hwm))
(gdb)
573 key.type = type;
(gdb)
574 key.num = number;
(gdb)
575 if (hashmap_find(&hwm->events, key.type_and_num,
/*value=*/NULL))
(gdb)
577 if (key.num != -1)
(gdb)
580 hashmap__for_each_entry((&hwm->events), cur, bkt) {
(gdb) n
581 struct hwmon_pmu_event_value *value = cur->pvalue;
(gdb)
583 key.type_and_num = cur->key;
(gdb)
584 if (key.type == type && value->name &&
!strcasecmp(name, value->name))
(gdb) p value->name
$1 = 0x5555560d0680 "temp_test_hwmon_event1"
(gdb) n
585 return true;
```
The if here must be failing for you, but I fail to see why. I don't
see the strcasecmp in your output and what we care about is why either
the type is wrong or value->name is wrong, if you could help to log
these it would be great.
Thanks,
Ian
Powered by blists - more mailing lists