Message-ID: <CAP-5=fVqCUYrMDac520c1AvwGXmBRstGTZEeY8VeC=0hoCBrEg@mail.gmail.com>
Date: Thu, 13 Feb 2025 10:20:23 -0800
From: Ian Rogers <irogers@...gle.com>
To: Krzysztof Łopatowski <krzysztof.m.lopatowski@...il.com>
Cc: Arnaldo Carvalho de Melo <acme@...nel.org>, Peter Zijlstra <peterz@...radead.org>,
linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: perf: Question about machine__create_extra_kernel_maps and
trampoline symbols

On Thu, Feb 13, 2025 at 10:17 AM Krzysztof Łopatowski
<krzysztof.m.lopatowski@...il.com> wrote:
>
> Hi Ian,
>
> > We do have a kallsyms parsing benchmark:
>
> Yes, I've looked at `perf bench internals kallsyms-parse`. On my system it
> reports:
>
>   Average kallsyms__parse took: 99,994 ms (+- 0,199 ms)
>
> However, this benchmark only measures the raw parsing speed of the kallsyms
> file, without any of the symbol processing that happens in real usage.
>
> > I was curious to know if the regression is also visible there?
>
> You can call it a regression if you mean from 2018 ;-)
> I gave measurements at the top to give a sense of scale and show it's not
> an already solved problem.
>
> The core issue is that we're calling 'kallsyms__parse' multiple times, when
> we could likely consolidate these calls since most of the overhead comes
> from reading and parsing, not from processing the symbols.
>
> Notably, the third call I mentioned (in machine__create_extra_kernel_maps)
> accounts for about half of the total kallsyms parsing time, yet appears to
> have no effect on my test system. This is why I'm questioning whether we
> need to keep this functionality.
>
> Ultimately, I believe we should explore ways to avoid reading /proc/kallsyms
> altogether, given how expensive this operation is.

Agreed. We had similarly expensive operations in event parsing, and those
have now largely been made lazy, so you can craft your command line to
avoid paying for all of them. I can't answer your question, but adding
the symbol processing to the benchmark seems like it would have value.
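
To make the consolidation idea from the quoted text concrete, here is a
rough sketch in Python rather than perf's C. It is not how perf is
implemented: parse_kallsyms(), single_pass(), the consumer callbacks and
the sample data are all hypothetical, and the "_entry_trampoline" symbol
name is only an assumption for illustration. It reads kallsyms-formatted
text once and hands every symbol to several consumers, instead of
re-reading and re-parsing the input once per consumer.

```python
import io

# Synthetic data in the /proc/kallsyms layout:
#   <hex address> <type> <name> [optional module]
# These entries are made up for the example, not taken from a real system.
SAMPLE = """\
ffffffff81000000 T _text
ffffffff81000040 T startup_64
ffffffff81e00000 T _entry_trampoline
ffffffffc0000000 t helper_fn\t[example_mod]
"""


def parse_kallsyms(stream, consumers):
    """Read the stream once and pass each symbol to every consumer."""
    count = 0
    for line in stream:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        addr = int(fields[0], 16)
        sym_type, name = fields[1], fields[2]
        module = fields[3].strip("[]") if len(fields) > 3 else None
        for consume in consumers:
            consume(addr, sym_type, name, module)
        count += 1
    return count


def single_pass(text):
    """Run two hypothetical consumers over one parse of the input."""
    symbols = {}      # consumer 1: name -> address table
    trampolines = []  # consumer 2: symbols that look like trampolines

    def collect(addr, sym_type, name, module):
        symbols[name] = addr

    def find_trampoline(addr, sym_type, name, module):
        if "trampoline" in name:
            trampolines.append((name, addr))

    count = parse_kallsyms(io.StringIO(text), [collect, find_trampoline])
    return count, symbols, trampolines


if __name__ == "__main__":
    count, symbols, trampolines = single_pass(SAMPLE)
    print(f"parsed {count} symbols in one pass")
    print("trampolines:", [(n, hex(a)) for n, a in trampolines])
```

A benchmark built along these lines could time parse_kallsyms() with an
empty consumer list against the same call with real consumers attached,
which would separate the raw parsing cost from the symbol processing cost.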
Thanks,
Ian