linux-kernel - Re: [PATCH v3 00/10] Move uid filtering to BPF filters

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aD-IJRi0n1WGmOFP@google.com>
Date: Tue, 3 Jun 2025 16:41:25 -0700
From: Namhyung Kim <namhyung@...nel.org>
To: Ian Rogers <irogers@...gle.com>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
	Arnaldo Carvalho de Melo <acme@...nel.org>,
	Mark Rutland <mark.rutland@....com>,
	Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
	Jiri Olsa <jolsa@...nel.org>,
	Adrian Hunter <adrian.hunter@...el.com>,
	Kan Liang <kan.liang@...ux.intel.com>,
	James Clark <james.clark@...aro.org>,
	Dapeng Mi <dapeng1.mi@...ux.intel.com>,
	Thomas Richter <tmricht@...ux.ibm.com>,
	Veronika Molnarova <vmolnaro@...hat.com>, Hao Ge <gehao@...inos.cn>,
	Howard Chu <howardchu95@...il.com>,
	Weilin Wang <weilin.wang@...el.com>, Levi Yun <yeoreum.yun@....com>,
	"Dr. David Alan Gilbert" <linux@...blig.org>,
	Dominique Martinet <asmadeus@...ewreck.org>,
	Xu Yang <xu.yang_2@....com>, Tengda Wu <wutengda@...weicloud.com>,
	linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
	bpf@...r.kernel.org
Subject: Re: [PATCH v3 00/10] Move uid filtering to BPF filters

On Tue, Jun 03, 2025 at 04:22:53PM -0700, Ian Rogers wrote:
> On Tue, Jun 3, 2025 at 3:32 PM Namhyung Kim <namhyung@...nel.org> wrote:
> >
> > On Mon, Jun 02, 2025 at 11:26:12PM -0700, Ian Rogers wrote:
> > > On Mon, Jun 2, 2025 at 9:41 PM Namhyung Kim <namhyung@...nel.org> wrote:
> > > >
> > > > Hi Ian,
> > > >
> > > > On Tue, May 27, 2025 at 01:39:21PM -0700, Ian Rogers wrote:
> > > > > On Fri, Apr 25, 2025 at 2:40 PM Ian Rogers <irogers@...gle.com> wrote:
> > > > > >
> > > > > > Rather than scanning /proc and skipping PIDs based on their UIDs, use
> > > > > > BPF filters for uid filtering. The /proc scanning in thread_map is
> > > > > > racy as the PID may exit before the perf_event_open causing perf to
> > > > > > abort. BPF UID filters are more robust as they avoid the race. The
> > > > > > /proc scanning also misses processes starting after the perf
> > > > > > command. Add a helper for commands that support UID filtering and wire
> > > > > > up. Remove the non-BPF UID filtering support given it doesn't work.
> > > > > >
> > > > > > v3: Add lengthier commit messages as requested by Arnaldo. Rebase on
> > > > > >     tmp.perf-tools-next.
> > > > > >
> > > > > > v2: Add a perf record uid test (Namhyung) and force setting
> > > > > >     system-wide for perf trace and perf record (Namhyung). Ensure the
> > > > > >     uid filter isn't set on tracepoint evsels.
> > > > > >
> > > > > > v1: https://lore.kernel.org/lkml/20250111190143.1029906-1-irogers@google.com/
> > > > >
> > > > > Ping. Thanks,
> > > >
> > > > I'm ok with preferring BPF over /proc scanning, but still hesitate to
> > > > remove it since some people don't use BPF.  Can you please drop that
> > > > part and make parse_uid_filter() conditional on BPF?
> > >
> > > Hi Namhyung,
> > >
> > > The approach of scanning /proc fails as:
> > > 1) processes that start after perf starts will be missed,
> > > 2) processes that terminate between being scanned in /proc and
> > > perf_event_open will cause perf to fail (essentially the -u option is
> > > just sugar to scan /proc and then provide the processes as if they
> > > were a -p option - such an approach doesn't need building into the
> > > tool).
> >
> > Yeah, I remember we had this discussion before.  I think (1) is not true
> > as perf events will be inherited to children (but there is a race).
> 
> If you log in from another terminal? Anything that creates a new
> process for that user but isn't inherited will be missed, which isn't
> merely a race.

As long as the another terminal is owned by the same user, any new
process from the terminal will inherit events, no?

> 
> >  And
> > (2) is a real problem but it's also about a race and it can succeed.
> >
> > Maybe we could change it to skip failed events when the target is a
> > user but that's not the direction you want.
> 
> We could have other events and try to discover new processes via them,
> do things like dummy events to cover races. It is just a lot of
> complexity for something that is a trivial amount of BPF. In something
> like 10 years nobody has bothered to fix this up.

I don't want any complex solution for this.  Let's not touch this.

> 
> > >
> > > This patch series adds a test [1] and perf test has lots of processes
> > > starting and exiting, matching condition (2) above*. If this series
> > > were changed to an approach that uses BPF and falls back on /proc
> > > scanning then the -u option would be broken for both reasons above but
> > > also prove a constant source of test flakes.
> > >
> > > Rather than give the users something both frustrating to use (keeps
> > > quitting due to failed opens) and broken (missing processes) I think
> > > it is better to quit perf at that point informing the user they need
> > > more permissions to load the BPF program. This also makes the -u
> > > option testable.
> > >
> > > So the request for a change I don't think is sensible as it provides a
> > > worse user and testing experience. There is also the cognitive load of
> > > having the /proc scanning code in the code base, whereas the BPF
> > > filter is largely isolated.
> >
> > But I think the problem is that it has different requirements - BPF and
> > root privilege.  So it should be used after checking the requirements
> > and fail or fallback.
> >
> > Does it print proper error messages if not?  With that we can deprecate
> > the existing behavior and remove it later.
> 
> For `perf top` with TUI you get an error message in a box of:
> ```
> failed to set filter "BPF" on event cpu_atom/cycles/P with 1
> (Operation not permitted)
> ```
> With --stdio you get:
> ```
> libbpf: Error in bpf_object__probe_loading(): -EPERM. Couldn't load
> trivial BPF program. Make sure your kernel supports BPF
> (CONFIG_BPF_SYSCALL=y) and/or that RLIMIT_MEMLOCK is set to big enough
> value.
> libbpf: failed to load object 'sample_filter_bpf'
> libbpf: failed to load BPF skeleton 'sample_filter_bpf': -EPERM
> Failed to load perf sample-filter BPF skeleton
> failed to set filter "BPF" on event cpu_atom/cycles/P with 1
> (Operation not permitted)
> ```
> This matches the existing behavior if you put a filter on an event.

But that's different as user directly asked the BPF filter.
The following message would be better (unless you fallback to the old
behavior).

"-u/--uid option is using BPF filter but perf is not built with BPF.
Please make sure to build with libbpf and bpf skeletons."

and/or

"-u/--uid option is using BPF filter which requires root privilege."

You may check if the filter program and map is pinned already.

Thanks,
Namhyung