lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAM9d7ch98dRk85DYF-okD4_VsU_A+EkkVOkW1-1X7GMLao0mwg@mail.gmail.com>
Date:   Tue, 3 Jan 2023 08:45:34 -0800
From:   Namhyung Kim <namhyung@...nel.org>
To:     Arnaldo Carvalho de Melo <acme@...nel.org>
Cc:     Thomas Richter <tmricht@...ux.ibm.com>,
        linux-kernel@...r.kernel.org, linux-perf-users@...r.kernel.org,
        svens@...ux.ibm.com, gor@...ux.ibm.com, sumanthk@...ux.ibm.com,
        hca@...ux.ibm.com
Subject: Re: [PATCH] perf lock: Fix core dump in command perf lock contention

Hi Arnaldo,

On Mon, Jan 2, 2023 at 7:33 AM Arnaldo Carvalho de Melo <acme@...nel.org> wrote:
>
> Em Fri, Dec 30, 2022 at 11:26:27AM +0100, Thomas Richter escreveu:
> > The test case perf lock contention dumps core on s390. Run the following
> > commands:
> >  # ./perf lock record -- ./perf bench sched messaging
> >  # Running 'sched/messaging' benchmark:
> >  # 20 sender and receiver processes per group
> >  # 10 groups == 400 processes run
> >
> >      Total time: 2.799 [sec]
> >  [ perf record: Woken up 1 times to write data ]
> >  [ perf record: Captured and wrote 0.073 MB perf.data (100 samples) ]
> >  #
> >  # ./perf lock contention
> >  Segmentation fault (core dumped)
> >  #
> >
> > The function call stack is lengthy, here are the top 5 functions:
> >  # gdb ./perf core.24048
> >  GNU gdb (GDB) Fedora Linux 12.1-6.fc37
> >  Copyright (C) 2022 Free Software Foundation, Inc.
> >  Core was generated by `./perf lock contention'.
> >  Program terminated with signal SIGSEGV, Segmentation fault.
> >  #0  0x00000000011dd25c in machine__is_lock_function (machine=0x3029e28,
> >          addr=1789230) at util/machine.c:3356
> >          3356 machine->sched.text_end = kmap->unmap_ip(kmap, sym->start);
> >
> >  (gdb) where
> >   #0  0x00000000011dd25c in machine__is_lock_function (machine=0x3029e28,\
> >          addr=1789230) at util/machine.c:3356
> >   #1  0x000000000109f244 in callchain_id (evsel=0x30313e0,\
> >          sample=0x3ffea4f77d0) at builtin-lock.c:957
> >   #2  0x000000000109e094 in get_key_by_aggr_mode (key=0x3ffea4f7290,\
> >          addr=27758136, evsel=0x30313e0, sample=0x3ffea4f77d0) \
> >          at builtin-lock.c:586
> >   #3  0x000000000109f4d0 in report_lock_contention_begin_event \
> >          (evsel=0x30313e0, sample=0x3ffea4f77d0)
> >          at builtin-lock.c:1004
> >   #4  0x00000000010a00ae in evsel__process_contention_begin \
> >          (evsel=0x30313e0, sample=0x3ffea4f77d0)
> >          at builtin-lock.c:1254
> >   #5  0x00000000010a0e14 in process_sample_event (tool=0x3ffea4f8480, \
> >          event=0x3ff85601ef8, sample=0x3ffea4f77d0,
> >          evsel=0x30313e0, machine=0x3029e28) at builtin-lock.c:1464
> >           sample=0x3ffea4f77d0, evsel=0x30313e0, machine=0x3029e28) \
> >        at util/session.c:1523
> >   .....
> >
> > The issue is in function machine__is_lock_function() in file
> > ./util/machine.c lines 3355:
> >    /* should not fail from here */
> >    sym = machine__find_kernel_symbol_by_name(machine, "__sched_text_end",
> >                                            &kmap);
> >    machine->sched.text_end = kmap->unmap_ip(kmap, sym->start)
> >
> > On s390 the symbol __sched_text_end is *NOT* in the symbol list and the
> > resulting pointer sym is set to NULL. The sym->start is then a NULL pointer
> > access and generates the core dump.
> >
> > The reason why __sched_text_end is not in the symbol list on s390 is
> > simple:
> > When the symbol list is created at perf start up with function calls
> >   dso__load
> >   +--> dso__load_vmlinux_path
> >        +--> dso__load_vmlinux
> >             +--> dso__load_sym
> >                +--> dso__load_sym_internal (reads kernel symbols)
> >                +--> symbols__fixup_end
> >                +--> symbols__fixup_duplicate
> >
> > The issue is in function symbols__fixup_duplicate(). It deletes all
> > symbols with have the same address. On s390
> >  # nm -g  ~/linux/vmlinux| fgrep c68390
> >  0000000000c68390 T __cpuidle_text_start
> >  0000000000c68390 T __sched_text_end
> >  #
> > two symbols have identical addresses and __sched_text_end is considered
> > duplicate (in ascending sort order) and removed from the symbol list.
> > Therefore it is missing and an invalid pointer reference occurs.
> > The code checks for symbol __sched_text_start and when it exists assumes
> > symbol __sched_text_end is also in the symbol table. However this is
> > not the case on s390.
> >
> > Same situation exists for symbol __lock_text_start:
> > 0000000000c68770 T __cpuidle_text_end
> > 0000000000c68770 T __lock_text_start
> > This symbol is also removed from the symbol table but used in function
> > machine__is_lock_function().
> >
> > To fix this and keep duplicate symbols in the symbol table, set
> > symbol_conf.allow_aliases to true. This prevents the removal of duplicate
> > symbols in function symbols__fixup_duplicate().
> >
> > Output After:
> >  # ./perf lock contention
> >  contended total wait  max wait  avg wait    type   caller
> >
> >         48   124.39 ms 123.99 ms   2.59 ms rwsem:W unlink_anon_vmas+0x24a
> >         47    83.68 ms  83.26 ms   1.78 ms rwsem:W free_pgtables+0x132
> >          5    41.22 us  10.55 us   8.24 us rwsem:W free_pgtables+0x140
> >          4    40.12 us  20.55 us  10.03 us rwsem:W copy_process+0x1ac8
> >  #
> >
> > Fixes: cc2367eebb0c ("machine: Adopt is_lock_function() from builtin-lock.c")
>
> Humm, is that really the cset that introduces the problem? It just moves
> things around, the cset that introduced the is_lock_function() function,
> that assumed that __sched_text_end was always available was:
>
> commit 0d2997f750d1de394231bc22768dab94a5b5db2f
> Author: Namhyung Kim <namhyung@...nel.org>
> Date:   Wed Jun 15 09:32:22 2022 -0700
>
>     perf lock: Look up callchain for the contended locks
>
> ---
>
> Right? Namhyung? Can you spot any problem in enabling duplicates as a
> fix?

Yep, I think that's the cset introduced the problem.
I'm fine with the fix.

Acked-by: Namhyung Kim <namhyung@...nel.org>

Thanks,
Namhyung

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ