[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAP-5=fXDPor-WsO-TGB3z6QpSWZOFJC31GJpuwZUA3qz_3WsYQ@mail.gmail.com>
Date: Mon, 3 Jun 2024 09:20:40 -0700
From: Ian Rogers <irogers@...gle.com>
To: Yicong Yang <yangyicong@...wei.com>
Cc: will@...nel.org, mark.rutland@....com, acme@...nel.org, 
	namhyung@...nel.org, linux-arm-kernel@...ts.infradead.org, 
	linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org, 
	peterz@...radead.org, mingo@...hat.com, alexander.shishkin@...ux.intel.com, 
	jolsa@...nel.org, james.clark@....com, dongli.zhang@...cle.com, 
	jonathan.cameron@...wei.com, prime.zeng@...ilicon.com, linuxarm@...wei.com, 
	yangyicong@...ilicon.com
Subject: Re: [PATCH 2/3] perf: arm_pmu: Only show online CPUs in device's
 "cpus" attribute
On Mon, Jun 3, 2024 at 2:33 AM Yicong Yang <yangyicong@...wei.com> wrote:
>
> From: Yicong Yang <yangyicong@...ilicon.com>
>
> When there're CPUs offline after system booting, perf will failed:
> [root@...alhost ~]# /home/yang/perf stat -a -e armv8_pmuv3_0/cycles/
> Error:
> The sys_perf_event_open() syscall returned with 19 (No such device) for event (cpu-clock).
> /bin/dmesg | grep -i perf may provide additional information.
Thanks for debugging this Yicong! The fact cycles is falling back on
cpu-clock I'm confused by, on ARM the PMU type generally isn't
PERF_TYPE_HARDWARE and so this fallback shouldn't happen:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/evsel.c?h=perf-tools-next#n2900
I note your command line is for system wide event opening rather than
for a process. It is more strange the fallback is giving "No such
device".
> This is due to PMU's "cpus" is not updated and still contains offline
> CPUs and perf will try to open perf event on the offlined CPUs.
The perf tool will try to only open online CPUs. Unfortunately the
naming around this is confusing:
- any - an event may follow a task and schedule on "any" CPU. We
encode this with -1.
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/lib/perf/include/perf/cpumap.h?h=perf-tools-next#n24
- online - the set of online CPU.
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/lib/perf/include/perf/cpumap.h?h=perf-tools-next#n33
- all - I try to avoid this but it may still be in the code, "all"
usually is another name for online. Hopefully when we use this name it
is clearly defined:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/lib/perf/include/internal/evlist.h?h=perf-tools-next#n23
> Make "cpus" attribute only shows online CPUs and introduced a new
> "supported_cpus" where users can get the range of the CPUs this
> PMU supported monitoring.
I don't think this should be necessary as by default the CPUs the perf
tool will use will be the "online" CPUs:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/lib/perf/evlist.c?h=perf-tools-next#n40
Could you create a reproduction of the problem you are encountering?
My expectation is for a core PMU to advertise the online and offline
CPUs it is valid for, and for perf to intersect:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/lib/perf/include/perf/cpumap.h?h=perf-tools-next#n44
those CPUs with the online CPUs so the perf_event_open doesn't fail.
Thanks,
Ian
> Signed-off-by: Yicong Yang <yangyicong@...ilicon.com>
> ---
>  drivers/perf/arm_pmu.c | 24 +++++++++++++++++++++++-
>  1 file changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
> index 8458fe2cebb4..acbb0e1d0414 100644
> --- a/drivers/perf/arm_pmu.c
> +++ b/drivers/perf/arm_pmu.c
> @@ -558,13 +558,35 @@ static ssize_t cpus_show(struct device *dev,
>                          struct device_attribute *attr, char *buf)
>  {
>         struct arm_pmu *armpmu = to_arm_pmu(dev_get_drvdata(dev));
> -       return cpumap_print_to_pagebuf(true, buf, &armpmu->supported_cpus);
> +       cpumask_var_t mask;
> +       ssize_t n;
> +
> +       /* If allocation failed then show the supported_cpus */
> +       if (!alloc_cpumask_var(&mask, GFP_KERNEL))
> +               return cpumap_print_to_pagebuf(true, buf, &armpmu->supported_cpus);
> +
> +       cpumask_and(mask, &armpmu->supported_cpus, cpu_online_mask);
> +       n = cpumap_print_to_pagebuf(true, buf, mask);
> +       free_cpumask_var(mask);
> +
> +       return n;
>  }
>
>  static DEVICE_ATTR_RO(cpus);
>
> +static ssize_t supported_cpus_show(struct device *dev,
> +                                  struct device_attribute *attr, char *buf)
> +{
> +       struct arm_pmu *armpmu = to_arm_pmu(dev_get_drvdata(dev));
> +
> +       return cpumap_print_to_pagebuf(true, buf, &armpmu->supported_cpus);
> +}
> +
> +static DEVICE_ATTR_RO(supported_cpus);
> +
>  static struct attribute *armpmu_common_attrs[] = {
>         &dev_attr_cpus.attr,
> +       &dev_attr_supported_cpus.attr,
>         NULL,
>  };
>
> --
> 2.24.0
>
Powered by blists - more mailing lists
 
