Message-ID: <e7409ef9-0a80-9874-ef60-0fab0abb9711@intel.com>
Date: Tue, 18 Apr 2023 10:03:44 +0300
From: Adrian Hunter <adrian.hunter@...el.com>
To: Ian Rogers <irogers@...gle.com>,
Peter Zijlstra <peterz@...radead.org>
Cc: Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...nel.org>,
Namhyung Kim <namhyung@...nel.org>,
linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFC 0/5] perf: Add ioctl to emit sideband events
On 17/04/23 19:37, Ian Rogers wrote:
> On Mon, Apr 17, 2023 at 4:02 AM Peter Zijlstra <peterz@...radead.org> wrote:
>>
>> On Fri, Apr 14, 2023 at 11:22:55AM +0300, Adrian Hunter wrote:
>>> Hi
>>>
>>> Here is a stab at adding an ioctl for sideband events.
>>>
>>> This is to overcome races when reading the same information
>>> from /proc.
>>
>> What races? Are you talking about reading old state from /proc, the
>> kernel then delivering a sideband event for the new state, and you then
>> writing the old state out?
>>
>> Surely that's something perf tool can fix without kernel changes?
>
> So my reading is that during event synthesis there are races between
> reading the different /proc files. There is also still, I believe, a
> race in perf record/top with uid filtering, which this reminds me of.
> The uid filtering race is that we scan /proc to find the processes
> (pids) for a uid and then synthesize the maps for each of those pids,
> but if a pid starts or exits in the meantime we either error out or
> don't sample that pid. I believe the error-out behavior is easy to hit
> 100% of the time, making uid mode of limited use.
>
> This may be for something other than synthesis, but for synthesis a
> few points are:
> - as servers get bigger and consequently more jobs get consolidated
> on them, synthesis becomes slow (hence --num-thread-synthesize) and the
> synthesized events dominate the perf.data file - perhaps >90% of the
> file size, much of it for processes with no samples in them.
Note also that, for hardware tracing, it generally isn't possible to know
during tracing which processes will turn out to have no samples, and
figuring it out afterwards and working backwards may not be feasible.
> Another issue here is that all those file descriptors don't come for
> free in the kernel.
> - BPF has buildid+offset stack traces that remove the need for
> synthesis at the cost of more expensive stack generation. I believe this
> is unpopular because adding it as a variant for every kind of event
> would be hard, but perhaps we can pick some low-hanging fruit like the
> instructions and cycles events.
> - I believe Jiri looked at doing synthesis with BPF. Perhaps we could
> do something similar to off-cpu and --tail-synthesize, where more work
> happens at the tail end of the perf session. Off-cpu records data in BPF
> maps that it then synthesizes into samples.
>
> There is also a long-standing problem of not recording munmap (or
> mremap), which causes plenty of issues. Perhaps if we had fewer mmap
> events in the perf.data file we could add these.
>
> Thanks,
> Ian
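To make the uid-filtering race concrete, here is a minimal C sketch. It is not perf's code: it assumes a Linux /proc, treats the owner of /proc/<pid> as the process's uid (a simplification), and uses hypothetical helpers (count_maps(), scan_uid()) where counting lines of /proc/<pid>/maps stands in for synthesizing mmap events. A pid that exits between readdir() and the later reads is skipped and counted instead of failing the whole scan.

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* A /proc entry names a process if it is all decimal digits. */
static int is_pid_name(const char *name)
{
	if (*name == '\0')
		return 0;
	for (; *name; name++)
		if (!isdigit((unsigned char)*name))
			return 0;
	return 1;
}

/*
 * Hypothetical stand-in for synthesizing one pid's mmap events: it just
 * counts the lines of /proc/<pid>/maps. Returns -1 if the process has
 * vanished (ENOENT/ESRCH), -2 on any other error.
 */
static long count_maps(pid_t pid)
{
	char path[64];
	FILE *f;
	long lines = 0;
	int c;

	assert(pid > 0);
	snprintf(path, sizeof(path), "/proc/%d/maps", (int)pid);
	f = fopen(path, "r");
	if (!f)
		return (errno == ENOENT || errno == ESRCH) ? -1 : -2;
	while ((c = fgetc(f)) != EOF)
		if (c == '\n')
			lines++;
	fclose(f);
	return lines;
}

/*
 * Hypothetical scan: find processes whose /proc/<pid> directory is owned
 * by uid and "synthesize" their maps. A pid that exits after readdir()
 * (or whose maps cannot be read) is counted in *nr_skipped rather than
 * aborting the scan. Returns the number of pids processed, or -1 if
 * /proc cannot be opened.
 */
static int scan_uid(uid_t uid, long *nr_maps, int *nr_skipped)
{
	DIR *proc = opendir("/proc");
	struct dirent *de;
	int nr_pids = 0;

	if (!proc)
		return -1;
	*nr_maps = 0;
	*nr_skipped = 0;
	while ((de = readdir(proc)) != NULL) {
		char path[300];
		struct stat st;
		long n;

		if (!is_pid_name(de->d_name))
			continue;
		snprintf(path, sizeof(path), "/proc/%s", de->d_name);
		if (stat(path, &st) != 0) {
			(*nr_skipped)++;	/* exited after readdir() */
			continue;
		}
		if (st.st_uid != uid)
			continue;
		n = count_maps((pid_t)atoi(de->d_name));
		if (n < 0) {
			(*nr_skipped)++;	/* exited, or maps unreadable */
			continue;
		}
		*nr_maps += n;
		nr_pids++;
	}
	closedir(proc);
	return nr_pids;
}
```

Skipping only turns the error-out into a silently missing pid: a process that starts after the scan, or mmaps after its maps were read, is still missed, which is the kind of gap the proposed ioctl is aimed at.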