[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <721f055b-d594-4b24-b9e4-6d56475927fd@amd.com>
Date: Tue, 27 Aug 2024 10:05:57 +0530
From: Ravi Bangoria <ravi.bangoria@....com>
To: Namhyung Kim <namhyung@...nel.org>, Peter Zijlstra
<peterz@...radead.org>, Ingo Molnar <mingo@...nel.org>
Cc: Kan Liang <kan.liang@...ux.intel.com>, Mark Rutland
<mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
LKML <linux-kernel@...r.kernel.org>, Stephane Eranian <eranian@...gle.com>,
Sandipan Das <sandipan.das@....com>, Ananth Narayan
<ananth.narayan@....com>, Ravi Bangoria <ravi.bangoria@....com>
Subject: Re: [RFC/PATCH] perf/x86: Relax privilege filter restriction on AMD
IBS
Hi Namhyung,
On 23-Aug-24 4:38 AM, Namhyung Kim wrote:
> While IBS is available for per-thread profiling, still regular users
> cannot open an event due to the default paranoid setting (2) which
> doesn't allow unprivileged users to get kernel samples. That means
> it needs to set exclude_kernel bit in the attribute but IBS driver
> would reject it since it has PERF_PMU_CAP_NO_EXCLUDE. This is not what
> we want and I've been getting requests to fix this issue.
>
> This should be done in the hardware, but until we get the fix we may
> allow exclude_{kernel,user} in the attribute and silently drop the
> samples in the PMU IRQ handler. It won't guarantee the sampling
> frequency or even it'd miss some with fixed period too. Not ideal,
> but that'd still be helpful to regular users.
>
> I also think that it should be able to inform the number of dropped
> samples to userspace so I've increased the lost_samples count. This
> can be read with the PERF_FORMAT_LOST (perf tools set it by default)
> so I didn't emit the PERF_RECORD_LOST_SAMPLES for this.
Samples are discarded not lost. Should we count them as lost?
> I'm not sure if it's acceptable since it might be confusing to figure
> out whether it's because of a lack of the ring buffer or it's dropped
> due to privileges. Or we can add a new record format for this. Let me
> know if you have a better idea.
>
> Cc: Ravi Bangoria <ravi.bangoria@....com>
> Cc: Stephane Eranian <eranian@...gle.com>
> Signed-off-by: Namhyung Kim <namhyung@...nel.org>
> ---
> arch/x86/events/amd/ibs.c | 14 ++++++++++++--
> 1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
> index e91970b01d62..6f514072c440 100644
> --- a/arch/x86/events/amd/ibs.c
> +++ b/arch/x86/events/amd/ibs.c
> @@ -290,6 +290,11 @@ static int perf_ibs_init(struct perf_event *event)
> if (has_branch_stack(event))
> return -EOPNOTSUPP;
>
> + /* handle exclude_{user,kernel} in the IRQ handler */
> + if (event->attr.exclude_hv || event->attr.exclude_idle ||
> + event->attr.exclude_host || event->attr.exclude_guest)
> + return -EINVAL;
> +
> ret = validate_group(event);
> if (ret)
> return ret;
> @@ -667,7 +672,6 @@ static struct perf_ibs perf_ibs_fetch = {
> .start = perf_ibs_start,
> .stop = perf_ibs_stop,
> .read = perf_ibs_read,
> - .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
> },
> .msr = MSR_AMD64_IBSFETCHCTL,
> .config_mask = IBS_FETCH_CONFIG_MASK,
> @@ -691,7 +695,6 @@ static struct perf_ibs perf_ibs_op = {
> .start = perf_ibs_start,
> .stop = perf_ibs_stop,
> .read = perf_ibs_read,
> - .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
> },
> .msr = MSR_AMD64_IBSOPCTL,
> .config_mask = IBS_OP_CONFIG_MASK,
> @@ -1062,6 +1065,13 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
> if (!perf_ibs_set_period(perf_ibs, hwc, &period))
> goto out; /* no sw counter overflow */
>
> + if ((event->attr.exclude_kernel && !user_mode(iregs)) ||
> + (event->attr.exclude_user && user_mode(iregs))) {
Should we use kernel_ip() instead? That would be accurate since RIP is
provided by IBS hw itself.
user_mode() relies on CS which is captured at interrupt time, not at the
sample capture time, and processor might have switched privilege mode by
the time IBS interrupt arrives. We might need to fallback to user_mode()
if ibs_op_data->op_rip_invalid is set.
Wondering, should perf_misc_flags() also switch to kernel_ip() ?
> + throttle = perf_event_account_interrupt(event);
> + atomic64_inc(&event->lost_samples);
> + goto out;
> + }
> +
> ibs_data.caps = ibs_caps;
> size = 1;
> offset = 1;
Thanks,
Ravi
Powered by blists - more mailing lists