linux-kernel - Re: [PATCH v1 2/2] perf stat: Refactor retry/skip/fatal error handling

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <CAP-5=fWvj-cAzh9R=8EjjUaaLGe9+XW5x_f4Pka6EdAQMU1X4A@mail.gmail.com>
Date: Thu, 2 Oct 2025 13:46:54 -0700
From: Ian Rogers <irogers@...gle.com>
To: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>, 
	Arnaldo Carvalho de Melo <acme@...nel.org>, Namhyung Kim <namhyung@...nel.org>, 
	Alexander Shishkin <alexander.shishkin@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>, 
	Ian Rogers <irogers@...gle.com>, Adrian Hunter <adrian.hunter@...el.com>, 
	Kan Liang <kan.liang@...ux.intel.com>, James Clark <james.clark@...aro.org>, 
	Howard Chu <howardchu95@...il.com>, Thomas Falcon <thomas.falcon@...el.com>, 
	Chun-Tse Shao <ctshao@...gle.com>, Dapeng Mi <dapeng1.mi@...ux.intel.com>, 
	linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1 2/2] perf stat: Refactor retry/skip/fatal error handling

On Thu, Oct 2, 2025 at 1:06 PM Ian Rogers <irogers@...gle.com> wrote:
>
> For the sake of Intel topdown events commit 9eac5612da1c ("perf stat:
> Don't skip failing group events") changed perf stat error handling
> making it so that more errors were fatal and didn't report "<not
> supported>" events. The change outside of topdown events was
> unintentional.
>
> The notion of "fatal" error handling was introduced in commit
> e0e6a6ca3ac2 ("perf stat: Factor out open error handling") and refined
> in commits like commit cb5ef60067c1 ("perf stat: Error out unsupported
> group leader immediately) to be an approach for avoiding later
> assertion failures in the code base. This change fixes those issues
> and removes the notion of a fatal error on an event. If all events
> fail to open then a fatal error occurs with the previous fatal error
> message. This seems to best match the notion of supported events and
> allowing some errors not to stop perf stat, while allowing the truly
> fatal no event case to terminate the tool early.
>
> The evsel->errored flag is only used in the stat code but always just
> meaning !evsel->supported although there is a comment about it being
> sticky. Force all evsels to be supported in evsel__init and then clear
> this when evsel__open fails. When an event is tried the supported is
> set to true again. This simplifies the notion of whether an evsel is
> broken.
>
> In the get_group_fd code, fail to get a group fd when the evsel isn't
> supported. If the leader isn't supported then it is also expected that
> there is no group_fd as the leader will have been skipped. Therefore
> change the BUG_ON test to be on supported rather than skippable. This
> corrects the assertion errors that were the reason for the previous
> fatal error handling.
>
> Fixes: 9eac5612da1c ("perf stat: Don't skip failing group events")
> Signed-off-by: Ian Rogers <irogers@...gle.com>
> ---
> An earlier version of this fix exists in:
> https://lore.kernel.org/lkml/20250923223312.238185-2-irogers@google.com/
> This version is more thorough and tries to make the overall code base
> more consistent.

Looks like this broke some test expectations, will fix up in v2.

Thanks,
Ian