[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAP-5=fXjmJ+Rr7ejL6fCMeu6PZSit7n84hQkjh=0jQhkr1on3Q@mail.gmail.com>
Date: Mon, 10 Feb 2025 20:40:01 -0800
From: Ian Rogers <irogers@...gle.com>
To: Namhyung Kim <namhyung@...nel.org>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>, Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>,
Adrian Hunter <adrian.hunter@...el.com>, Kan Liang <kan.liang@...ux.intel.com>,
Hao Ge <gehao@...inos.cn>, James Clark <james.clark@...aro.org>,
Howard Chu <howardchu95@...il.com>, Dominique Martinet <asmadeus@...ewreck.org>,
Levi Yun <yeoreum.yun@....com>, Xu Yang <xu.yang_2@....com>,
Tengda Wu <wutengda@...weicloud.com>, Yang Jihong <yangjihong1@...wei.com>,
linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1 00/10] Move uid filtering to BPF filters
On Mon, Feb 10, 2025 at 7:12 PM Namhyung Kim <namhyung@...nel.org> wrote:
>
> On Mon, Feb 10, 2025 at 02:06:18PM -0800, Ian Rogers wrote:
> > On Mon, Feb 10, 2025 at 11:59 AM Namhyung Kim <namhyung@...nel.org> wrote:
> > >
> > > On Sat, Jan 11, 2025 at 11:01:33AM -0800, Ian Rogers wrote:
> > > > Rather than scanning /proc and skipping PIDs based on their UIDs, use
> > > > BPF filters for uid filtering. The /proc scanning in thread_map is
> > > > racy as the PID may exit before the perf_event_open causing perf to
> > > > abort. BPF UID filters are more robust as they avoid the race. Add a
> > > > helper for commands that support UID filtering and wire up. Remove the
> > > > non-BPF UID filtering support.
> > >
> > > Hmm.. then non-BPF build cannot use the UID filtering anymore, right?
> > > Also non-root users will be limited unless it pinned the BPF program in
> > > advance.
> > >
> > > I think you can keep the original behavior and convert to BPF only when
> > > it's available.
> >
> > Using BPF when available would be limited progress. The issues I have
> > with not removing the existing approach are:
> >
> > 1) It is broken
> > Scanning /proc for pids and then doing perf_event_open means that any
> > time a process exits the perf_event_open fails.
> > Steps to reproduce:
> > This bug reproduces easily but if your machine is lightly loaded in
> > one terminal run `perf test`, in another terminal run `sudo perf top
> > -u $(id -u)` the perf top command will exit with:
> > ```
> > Error:
> > The sys_perf_event_open() syscall returned with 3 (No such process)
> > for event (cycles:P).
> > /bin/dmesg | grep -i perf may provide additional information.
> > ```
> >
> > 2) It is a work in progress that isn't progressing. Scanning /proc
> > will only tell you about started processes; it won't tell you about
> > processes that start during the profiling run, whether it be perf top
> > or perf record. Extra work would be necessary to make this most basic
> > of use-cases work, perhaps you could use tracepoints to capture
> > starting processes and then enable user profiling on those processes.
> > It would be horribly complicated, suffer from delays between observing
> > things happen and doing the perf_event_open, biases in the samples,
> > etc. I don't expect anyone to do it, especially when BPF filtering
> > already solves the problem better. There have been 13 years that
> > someone could have fixed it.
> >
> > 3) it adds significant useless complexity to the code base. Having
> > 'user' in the target makes it look like perf_event_open can work on a
> > pid, system wide or user basis. The user basis doesn't exist so the
> > majority of the code base is just ignoring it - search for users of
> > uid_str on target. Those that do run into problems (1) and (2).
> >
> > 4) It is redundant and leads to confusion with BPF filtering. Having
> > the BPF filter on evsels is non-zero cost in terms of the code base
> > complexity, but it is something broadly useful. As user filtering has
> > never worked (see above) it isn't broadly used but is adding
> > complexity. If both approaches were wanted then it is unclear what the
> > code should be doing, presumably the mish-mash of BPF filtering and
> > /proc scanning that happens today and will be broken due to (1) and
> > (2). Putting everything into the BPF filter makes sense as you can
> > combine a BPF filter with an additional BPF filter on user.
> >
> > 5) It is untested and adding a test leads to an always broken test due
> > to testing being an example of where things break, see point 1 and its
> > example.
> >
> > 6) Nobody wants the non-BPF approach. As it was broken nobody used the
> > previous feature, maintaining it for no users is overhead. Let me know
> > if someone is using this option (I doubt it given points 1 and 2) and
> > they wouldn't be better served by BPF. People building perf today have
> > to explicitly opt-out of wanting BPF in their tooling. I posted this
> > change a month ago and nobody has jumped up saying please don't remove
> > the old approach.
> >
> > 7) The interaction with this feature and changes in behavior, say
> > ignoring events that fail to open, is non-obvious and not testable as
> > testing would be broken as the functionality itself is broken. Having
> > the broken feature hanging around and being untestable means that we
> > slow progress on new features, testing and other improvements.
> >
> > Yes, we could try to fix all of that but why? Nobody has cared or
> > tried in 13 years and I would like the tech debt off our plate with a
> > better approach in its place.
>
> Thanks for writing this up. I agree BPF approach is better but it has
> its own limitation - basically it requires root. And I know a few of
> people who don't use BPF. :) And maybe we need to check if user passes
> filters to the event already.
I thought you were working on making the BPF filters pin-able? So root
could enable the filter but then users have access to it. You have to
be root to be looking at other users in any case.
> Also, I admit that I don't know who actually uses this. But I can say
> sometimes people uses tools in a creative way. Anyway, I can imagine
> an use case that system is in a steady state and each user has dedicated
> jobs. Then scanning /proc would work well.
Another one for Google's tree then.
Thanks,
Ian
Powered by blists - more mailing lists