lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 30 Dec 2022 11:26:27 +0100
From:   Thomas Richter <tmricht@...ux.ibm.com>
To:     linux-kernel@...r.kernel.org, linux-perf-users@...r.kernel.org,
        acme@...nel.org
Cc:     svens@...ux.ibm.com, gor@...ux.ibm.com, sumanthk@...ux.ibm.com,
        hca@...ux.ibm.com, Thomas Richter <tmricht@...ux.ibm.com>
Subject: [PATCH] perf lock: Fix core dump in command perf lock contention

The test case perf lock contention dumps core on s390. Run the following
commands:
 # ./perf lock record -- ./perf bench sched messaging
 # Running 'sched/messaging' benchmark:
 # 20 sender and receiver processes per group
 # 10 groups == 400 processes run

     Total time: 2.799 [sec]
 [ perf record: Woken up 1 times to write data ]
 [ perf record: Captured and wrote 0.073 MB perf.data (100 samples) ]
 #
 # ./perf lock contention
 Segmentation fault (core dumped)
 #

The function call stack is lengthy, here are the top 5 functions:
 # gdb ./perf core.24048
 GNU gdb (GDB) Fedora Linux 12.1-6.fc37
 Copyright (C) 2022 Free Software Foundation, Inc.
 Core was generated by `./perf lock contention'.
 Program terminated with signal SIGSEGV, Segmentation fault.
 #0  0x00000000011dd25c in machine__is_lock_function (machine=0x3029e28,
         addr=1789230) at util/machine.c:3356
         3356 machine->sched.text_end = kmap->unmap_ip(kmap, sym->start);

 (gdb) where
  #0  0x00000000011dd25c in machine__is_lock_function (machine=0x3029e28,\
         addr=1789230) at util/machine.c:3356
  #1  0x000000000109f244 in callchain_id (evsel=0x30313e0,\
         sample=0x3ffea4f77d0) at builtin-lock.c:957
  #2  0x000000000109e094 in get_key_by_aggr_mode (key=0x3ffea4f7290,\
         addr=27758136, evsel=0x30313e0, sample=0x3ffea4f77d0) \
         at builtin-lock.c:586
  #3  0x000000000109f4d0 in report_lock_contention_begin_event \
         (evsel=0x30313e0, sample=0x3ffea4f77d0)
         at builtin-lock.c:1004
  #4  0x00000000010a00ae in evsel__process_contention_begin \
         (evsel=0x30313e0, sample=0x3ffea4f77d0)
         at builtin-lock.c:1254
  #5  0x00000000010a0e14 in process_sample_event (tool=0x3ffea4f8480, \
         event=0x3ff85601ef8, sample=0x3ffea4f77d0,
         evsel=0x30313e0, machine=0x3029e28) at builtin-lock.c:1464
          sample=0x3ffea4f77d0, evsel=0x30313e0, machine=0x3029e28) \
	 at util/session.c:1523
  .....

The issue is in function machine__is_lock_function() in file
./util/machine.c lines 3355:
   /* should not fail from here */
   sym = machine__find_kernel_symbol_by_name(machine, "__sched_text_end",
		                             &kmap);
   machine->sched.text_end = kmap->unmap_ip(kmap, sym->start)

On s390 the symbol __sched_text_end is *NOT* in the symbol list and the
resulting pointer sym is set to NULL. The sym->start is then a NULL pointer
access and generates the core dump.

The reason why __sched_text_end is not in the symbol list on s390 is
simple:
When the symbol list is created at perf start up with function calls
  dso__load
  +--> dso__load_vmlinux_path
       +--> dso__load_vmlinux
            +--> dso__load_sym
	         +--> dso__load_sym_internal (reads kernel symbols)
		 +--> symbols__fixup_end
		 +--> symbols__fixup_duplicate

The issue is in function symbols__fixup_duplicate(). It deletes all
symbols with have the same address. On s390
 # nm -g  ~/linux/vmlinux| fgrep c68390
 0000000000c68390 T __cpuidle_text_start
 0000000000c68390 T __sched_text_end
 #
two symbols have identical addresses and __sched_text_end is considered
duplicate (in ascending sort order) and removed from the symbol list.
Therefore it is missing and an invalid pointer reference occurs.
The code checks for symbol __sched_text_start and when it exists assumes
symbol __sched_text_end is also in the symbol table. However this is
not the case on s390.

Same situation exists for symbol __lock_text_start:
0000000000c68770 T __cpuidle_text_end
0000000000c68770 T __lock_text_start
This symbol is also removed from the symbol table but used in function
machine__is_lock_function().

To fix this and keep duplicate symbols in the symbol table, set
symbol_conf.allow_aliases to true. This prevents the removal of duplicate
symbols in function symbols__fixup_duplicate().

Output After:
 # ./perf lock contention
 contended total wait  max wait  avg wait    type   caller

        48   124.39 ms 123.99 ms   2.59 ms rwsem:W unlink_anon_vmas+0x24a
        47    83.68 ms  83.26 ms   1.78 ms rwsem:W free_pgtables+0x132
         5    41.22 us  10.55 us   8.24 us rwsem:W free_pgtables+0x140
         4    40.12 us  20.55 us  10.03 us rwsem:W copy_process+0x1ac8
 #

Fixes: cc2367eebb0c ("machine: Adopt is_lock_function() from builtin-lock.c")
Cc: Arnaldo Carvalho de Melo <acme@...hat.com>
Signed-off-by: Thomas Richter <tmricht@...ux.ibm.com>
---
 tools/perf/builtin-lock.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c
index 718b82bfcdff..506c2fe42d52 100644
--- a/tools/perf/builtin-lock.c
+++ b/tools/perf/builtin-lock.c
@@ -1670,6 +1670,7 @@ static int __cmd_report(bool display_info)
 
 	/* for lock function check */
 	symbol_conf.sort_by_name = true;
+	symbol_conf.allow_aliases = true;
 	symbol__init(&session->header.env);
 
 	if (!data.is_pipe) {
@@ -1757,6 +1758,7 @@ static int __cmd_contention(int argc, const char **argv)
 
 	/* for lock function check */
 	symbol_conf.sort_by_name = true;
+	symbol_conf.allow_aliases = true;
 	symbol__init(&session->header.env);
 
 	if (use_bpf) {
-- 
2.38.1

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ