Message-ID: <CAOQCU67xtf4ndP2fo6fFxgsb7q_6uUooHQK4mb+Xi4fZR_ir0g@mail.gmail.com>
Date: Thu, 13 Feb 2025 19:17:00 +0100
From: Krzysztof Łopatowski <krzysztof.m.lopatowski@...il.com>
To: Ian Rogers <irogers@...gle.com>
Cc: Arnaldo Carvalho de Melo <acme@...nel.org>, Peter Zijlstra <peterz@...radead.org>,
linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: perf: Question about machine__create_extra_kernel_maps and
trampoline symbols
Hi Ian,
> We do have a kallsyms parsing benchmark:
Yes, I've looked at `perf bench internals kallsyms-parse`. On my machine it reports:
Average kallsyms__parse took: 99,994 ms (+- 0,199 ms)
However, this benchmark only measures the raw parsing speed of the kallsyms
file, without any of the symbol processing that happens in real usage.
> I was curious to know if the regression is also visible there?
You can call it a regression if you mean from 2018 ;-)
I gave measurements at the top to give a sense of scale and show it's not
an already solved problem.
The core issue is that we're calling 'kallsyms__parse' multiple times, when
we could likely consolidate these calls since most of the overhead comes
from reading and parsing, not from processing the symbols.
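To illustrate the consolidation idea, here is a minimal sketch in Python rather than perf's C. It is not perf code: `parse_kallsyms`, `parse_once`, the two consumers and the sample symbols are all hypothetical names made up for this example. It only assumes the usual `address type name [module]` line layout of /proc/kallsyms. The point is that the text is read and split once, and each interested party gets a callback per symbol.

```python
from typing import Callable, Iterable, Iterator, List, NamedTuple


class Sym(NamedTuple):
    addr: int
    kind: str
    name: str
    module: str  # empty string for core kernel symbols


def parse_kallsyms(lines: Iterable[str]) -> Iterator[Sym]:
    """Yield one Sym per well-formed 'addr type name [module]' line."""
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        module = fields[3].strip("[]") if len(fields) > 3 else ""
        yield Sym(int(fields[0], 16), fields[1], fields[2], module)


def parse_once(lines: Iterable[str],
               consumers: List[Callable[[Sym], None]]) -> int:
    """Single pass: every consumer sees every symbol; returns the symbol count."""
    count = 0
    for sym in parse_kallsyms(lines):
        for consume in consumers:
            consume(sym)
        count += 1
    return count


# Made-up sample data in kallsyms layout (not taken from a real system).
SAMPLE = """\
ffffffff81000000 T _stext
ffffffff81001000 T example_trampoline
ffffffffc0000000 t mod_func\t[example_mod]
"""

starts = {}
trampolines = []


def note_text_start(sym: Sym) -> None:
    # Hypothetical consumer: remember where kernel text begins.
    if sym.name == "_stext":
        starts["_stext"] = sym.addr


def note_trampoline(sym: Sym) -> None:
    # Hypothetical consumer: collect anything that looks like a trampoline.
    if "trampoline" in sym.name:
        trampolines.append(sym.name)


n = parse_once(SAMPLE.splitlines(), [note_text_start, note_trampoline])
```

With this shape, adding another consumer costs one extra callback per symbol rather than another full read and parse of the file.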
Notably, the third call I mentioned (in machine__create_extra_kernel_maps)
accounts for about half of the total kallsyms parsing time, yet appears to
have no effect on my test system. This is why I'm questioning whether we
need to keep this functionality.
Ultimately, I believe we should explore ways to avoid reading /proc/kallsyms
altogether, given how expensive this operation is.
Best regards,
Krzysztof