[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ef5862c3-4a42-4b2a-a2ec-625e0c9806e9@intel.com>
Date: Mon, 20 Oct 2025 09:19:22 +0800
From: "Li, Tianyou" <tianyou.li@...el.com>
To: Namhyung Kim <namhyung@...nel.org>
CC: James Clark <james.clark@...aro.org>, Peter Zijlstra
<peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>, "Arnaldo Carvalho de
Melo" <acme@...nel.org>, Mark Rutland <mark.rutland@....com>, "Alexander
Shishkin" <alexander.shishkin@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>,
Ian Rogers <irogers@...gle.com>, Adrian Hunter <adrian.hunter@...el.com>,
"Kan Liang" <kan.liang@...ux.intel.com>, Ravi Bangoria
<ravi.bangoria@....com>, <wangyang.guo@...el.com>, <pan.deng@...el.com>,
<zhiguo.zhou@...el.com>, <jiebin.sun@...el.com>, <thomas.falcon@...el.com>,
<dapeng1.mi@...el.com>, <linux-perf-users@...r.kernel.org>,
<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2] perf tools annotate: fix a crash when annotate the
same symbol with 's' and 'T'
Hi Namhyung,
On 10/19/2025 11:31 AM, Namhyung Kim wrote:
> Hello,
>
> On Fri, Oct 17, 2025 at 12:04:42AM +0800, Li, Tianyou wrote:
>> On 10/16/2025 11:18 PM, James Clark wrote:
>>>
>>> On 16/10/2025 4:04 pm, Li, Tianyou wrote:
>>>> On 10/16/2025 9:06 PM, James Clark wrote:
>>>>>
>>>>> On 16/10/2025 4:36 am, Li, Tianyou wrote:
>>>>>> Hi James,
>>>>>>
>>>>>> Thanks for your time to review. Please see my comments inlined.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Tianyou
>>>>>>
>>>>>> On 10/16/2025 1:30 AM, James Clark wrote:
>>>>>>>
>>>>>>> On 15/10/2025 6:20 pm, Tianyou Li wrote:
>>>>>>>> When perf report with annotation for a symbol, press
>>>>>>>> 's' and 'T', then exit
>>>>>>>> the annotate browser. Once annotate the same symbol,
>>>>>>>> the annotate browser
>>>>>>>> will crash.
>>>>>>>>
>>>>>>>> The browser.arch was required to be correctly updated when data type
>>>>>>>> feature was enabled by 'T'. Usually it was
>>>>>>>> initialized by symbol__annotate2
>>>>>>>> function. If a symbol has already been correctly
>>>>>>>> annotated at the first
>>>>>>>> time, it should not call the symbol__annotate2
>>>>>>>> function again, thus the
>>>>>>>> browser.arch will not get initialized. Then at the
>>>>>>>> second time to show the
>>>>>>>> annotate browser, the data type needs to be
>>>>>>>> displayed but the browser.arch
>>>>>>>> is empty.
>>>>>>>>
>>>>>>>> Stack trace as below:
>>>>>>>>
>>>>>>>> Perf: Segmentation fault
>>>>>>>> -------- backtrace --------
>>>>>>>> #0 0x55d365 in ui__signal_backtrace setup.c:0
>>>>>>>> #1 0x7f5ff1a3e930 in __restore_rt libc.so.6[3e930]
>>>>>>>> #2 0x570f08 in arch__is perf[570f08]
>>>>>>>> #3 0x562186 in annotate_get_insn_location perf[562186]
>>>>>>>> #4 0x562626 in __hist_entry__get_data_type annotate.c:0
>>>>>>>> #5 0x56476d in annotation_line__write perf[56476d]
>>>>>>>> #6 0x54e2db in annotate_browser__write annotate.c:0
>>>>>>>> #7 0x54d061 in ui_browser__list_head_refresh perf[54d061]
>>>>>>>> #8 0x54dc9e in annotate_browser__refresh annotate.c:0
>>>>>>>> #9 0x54c03d in __ui_browser__refresh browser.c:0
>>>>>>>> #10 0x54ccf8 in ui_browser__run perf[54ccf8]
>>>>>>>> #11 0x54eb92 in __hist_entry__tui_annotate perf[54eb92]
>>>>>>>> #12 0x552293 in do_annotate hists.c:0
>>>>>>>> #13 0x55941c in evsel__hists_browse hists.c:0
>>>>>>>> #14 0x55b00f in evlist__tui_browse_hists perf[55b00f]
>>>>>>>> #15 0x42ff02 in cmd_report perf[42ff02]
>>>>>>>> #16 0x494008 in run_builtin perf.c:0
>>>>>>>> #17 0x494305 in handle_internal_command perf.c:0
>>>>>>>> #18 0x410547 in main perf[410547]
>>>>>>>> #19 0x7f5ff1a295d0 in __libc_start_call_main libc.so.6[295d0]
>>>>>>>> #20 0x7f5ff1a29680 in
>>>>>>>> __libc_start_main@@GLIBC_2.34 libc.so.6[29680]
>>>>>>>> #21 0x410b75 in _start perf[410b75]
>>>>>>>>
>>>>>>>> Fixes: 1d4374afd000 ("perf annotate: Add 'T' hot key
>>>>>>>> to toggle data type display")
>>>>>>>> Reviewed-by: James Clark <james.clark@...aro.org>
>>>>>>>> Signed-off-by: Tianyou Li <tianyou.li@...el.com>
>>>>>>>> ---
>>>>>>>> tools/perf/ui/browsers/annotate.c | 3 +++
>>>>>>>> tools/perf/util/annotate.c | 2 +-
>>>>>>>> tools/perf/util/annotate.h | 2 ++
>>>>>>>> 3 files changed, 6 insertions(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/tools/perf/ui/browsers/annotate.c
>>>>>>>> b/tools/perf/ui/ browsers/annotate.c
>>>>>>>> index 8fe699f98542..3b27ef1e8490 100644
>>>>>>>> --- a/tools/perf/ui/browsers/annotate.c
>>>>>>>> +++ b/tools/perf/ui/browsers/annotate.c
>>>>>>>> @@ -1161,6 +1161,9 @@ int
>>>>>>>> __hist_entry__tui_annotate(struct hist_entry *he,
>>>>>>>> struct map_symbol *ms,
>>>>>>>> if (!annotation__has_source(notes))
>>>>>>>> ui__warning("Annotation has no source code.");
>>>>>>>> }
>>>>>>>> + } else if (evsel__get_arch(evsel, &browser.arch)) {
>>>>>>>> + ui__error("Couldn't get architecture for
>>>>>>>> event '%s'", evsel- >name);
>>>>>>>> + return -1;
>>>>>>>> }
>>>>>>> symbol_annotate() only fails for negative return values
>>>>>>> of evsel__get_arch(), but evsel__get_arch() has at least
>>>>>>> two positive error return values.
>>>>>>>
>>>>>>> If symbol_annotate() is wrong and it should be != 0 like
>>>>>>> you have, then maybe symbol_annotate() should be fixed
>>>>>>> in another commit in the same patchset as this one.
>>>>>>> Otherwise you have two calls to the same thing right
>>>>>>> next to each other that handle errors differently.
>>>>>>
>>>>>> Thanks James. I will give a try on handling the error
>>>>>> message with symbol__strerror_disassemble. I am conservative
>>>>>> to change the code in symbol_annotate, agreed it should be
>>>>>> considered in another patch. Would like to focus this
>>>>>> particular issue and get it fixed properly. Thanks.
>>>>>>
>>>>>>
>>>>> Looks like there was a misunderstanding. I'm not saying that the
>>>>> error is _reported_ differently, it's that the condition that
>>>>> triggers the error is different.
>>>>>
>>>>> symbol__annotate():
>>>>>
>>>>> err = evsel__get_arch(evsel, &arch);
>>>>> if (err < 0)
>>>>> return err;
>>>>>
>>>>> You added:
>>>>>
>>>>> if (evsel__get_arch(evsel, &browser.arch))
>>>>> ...
>>>>>
>>>>> evsel__get_arch() returns positive error values (and maybe also
>>>>> negative?), so "< 0" behaves differently to "!= 0".
>>>>>
>>>>> You either have to assume that "< 0" is correct and not change
>>>>> it, but then you have to also check the return value in the same
>>>>> way. Or if by doing "!= 0" you're implying that
>>>>> symbol__annotate() is wrong to do "< 0", then you should fix it
>>>>> now to not leave __hist_entry__tui_annotate() doing the same
>>>>> thing two different ways at different times.
>>>>>
>>>> Thanks James. I looked at the code of symbol__annotate, and noticed
>>>> the if (err<0) statement. I did not mean to change the code in
>>>> symbol__annotate because I did not understand why it handled the
>>>> error code that way. The positive return value of evsel__get_arch
>>>> indicates some error happens, eg in arm__annotate_init, so I use the
>>>> symbol__strerror_disassemble function to handle both positive and
>>>> negative error code.
>>>>
>>>> I do agree we should check the error code of evsel__get_arch, but I
>>>> am hesitate to touch the code which I am not sure the consequences.
>>>> I agree it may deserve another patch but not in this patchset if we
>>>> have clear answers on why "<0" is not correct, or we have a case to
>>>> break the current code as a evidence. Thanks.
>>>>
>>>>
>>>> Regards,
>>>>
>>>> Tianyou
>>>>
>>> It may take a little bit of effort to follow the code and look at the
>>> git blame to see what happened, but it's really not going to be that
>>> hard.
>> Truly appreciated for your instant response, and the suggestions about
>> 'Fixes' tag, return value handling etc. I do check the git history about the
>> code "<0", I still did not quite understand the reason of handling it in
>> that way.
> Looks like I just overlooked the error handling when I factored out the
> function. Please feel free to update symbol__annotate() to check != 0.
Thanks for the confirmation. Will send the patch v3 soon.
>>
>>> You're basically suggesting to add code that (when expanded) does this:
>>>
>>> if (first_run) {
>>> if (do_important_thing() < 0)
>>> return err;
>>> } else { // second run
>>> if (do_important_thing() != 0)
>>> return err;
>>> }
>>>
>>> It's not going to help anyone who looks at it in the future. It's going
>>> to make future refactors of evsel__get_arch() more difficult, and
>>> without knowing why it's like that, it's possibly introducing another
>>> bug.
>>>
>> I am suggesting to focus on the 'else' part. If that part of code is
>> correct, then we might need to consider another patch for the "<0" code. I
>> am eager for the answer as well.
>>
>>
>>> It surely has to be consistent otherwise it doesn't make sense. And if
>>> you sent a patch that did "< 0" I would still say "but it can return
>>> positive errors, so the new code isn't right".
>>> I did suggest in the beginning to not check the error at all and add a
>>> comment saying it must succeed at that point because it's already done
>>> once before, but that's not very defensive and it doesn't fix the other
>>> possible bug.
>>>
>> Yes. I am not so sure 'must succeed' could be a right assumption, or for
>> safety it's better to check the error code.
> Agreed. Anyway I can confirm that this patch fixed the crash.
>
> Tested-by: Namhyung Kim <namhyung@...nel.org>
Thanks. Will include and send the patch v3 soon.
> Thanks,
> Namhyung
>
Powered by blists - more mailing lists