lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Tue, 4 Jun 2024 11:48:09 -0700
From: Ian Rogers <irogers@...gle.com>
To: Arnaldo Carvalho de Melo <acme@...nel.org>
Cc: Milian Wolff <milian.wolff@...b.com>, linux-perf-users@...r.kernel.org, 
	Namhyung Kim <namhyung@...nel.org>, Arnaldo Carvalho de Melo <acme@...el.org>, 
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: perf 6.9-1 (archlinux) crashes during recording of cycles + raw_syscalls

On Tue, Jun 4, 2024 at 7:12 AM Arnaldo Carvalho de Melo <acme@...nel.org> wrote:
>
> On Tue, Jun 04, 2024 at 10:50:00AM -0300, Arnaldo Carvalho de Melo wrote:
> > On Tue, Jun 04, 2024 at 01:44:18PM +0200, Milian Wolff wrote:
> > > the following command crashes perf for me, is this still an issue upstream and
> > > should I report it to arch for backporting? I cannot build the official perf/
> > > core branch, so I am afraid I cannot test it myself there:
>
> > > ```
> > > sudo /usr/bin/perf record -z --call-graph dwarf -e cycles -e
> > > raw_syscalls:sys_enter ls
> > > ...
> > > [ perf record: Woken up 3 times to write data ]
> > > malloc(): invalid next size (unsorted)
> > > Aborted
> > > ```
> > > Backtrace with GDB + debuginfod:
> > > ```
> > > malloc(): invalid next size (unsorted)
>
> > I reproduced this all the way back to 6.8, trying to bisect now, thanks
> > for the report,
>
> Can you please try with the attached and perhaps provide your Tested-by?
>
> Thanks,
>
> - Arnaldo
>
> From ab355e2c6b4cf641a9fff7af38059cf69ac712d5 Mon Sep 17 00:00:00 2001
> From: Arnaldo Carvalho de Melo <acme@...hat.com>
> Date: Tue, 4 Jun 2024 11:00:22 -0300
> Subject: [PATCH 1/1] Revert "perf record: Reduce memory for recording
>  PERF_RECORD_LOST_SAMPLES event"
>
> This reverts commit 7d1405c71df21f6c394b8a885aa8a133f749fa22.

I think we should try to fight back reverts when possible. Reverts are
removing something somebody poured time and attention into. When a
regression has occurred then I think we should add the regression case
as a test.

> This causes segfaults in some cases, as reported by Milian:
>
>   ```
>   sudo /usr/bin/perf record -z --call-graph dwarf -e cycles -e
>   raw_syscalls:sys_enter ls
>   ...
>   [ perf record: Woken up 3 times to write data ]
>   malloc(): invalid next size (unsorted)
>   Aborted
>   ```
>
>   Backtrace with GDB + debuginfod:
>
>   ```
>   malloc(): invalid next size (unsorted)
>
>   Thread 1 "perf" received signal SIGABRT, Aborted.
>   __pthread_kill_implementation (threadid=<optimized out>, signo=signo@...ry=6,
>   no_tid=no_tid@...ry=0) at pthread_kill.c:44
>   Downloading source file /usr/src/debug/glibc/glibc/nptl/pthread_kill.c
>   44            return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO
>   (ret) : 0;
>   (gdb) bt
>   #0  __pthread_kill_implementation (threadid=<optimized out>,
>   signo=signo@...ry=6, no_tid=no_tid@...ry=0) at pthread_kill.c:44
>   #1  0x00007ffff6ea8eb3 in __pthread_kill_internal (threadid=<optimized out>,
>   signo=6) at pthread_kill.c:78
>   #2  0x00007ffff6e50a30 in __GI_raise (sig=sig@...ry=6) at ../sysdeps/posix/
>   raise.c:26
>   #3  0x00007ffff6e384c3 in __GI_abort () at abort.c:79
>   #4  0x00007ffff6e39354 in __libc_message_impl (fmt=fmt@...ry=0x7ffff6fc22ea
>   "%s\n") at ../sysdeps/posix/libc_fatal.c:132
>   #5  0x00007ffff6eb3085 in malloc_printerr (str=str@...ry=0x7ffff6fc5850
>   "malloc(): invalid next size (unsorted)") at malloc.c:5772
>   #6  0x00007ffff6eb657c in _int_malloc (av=av@...ry=0x7ffff6ff6ac0
>   <main_arena>, bytes=bytes@...ry=368) at malloc.c:4081
>   #7  0x00007ffff6eb877e in __libc_calloc (n=<optimized out>,
>   elem_size=<optimized out>) at malloc.c:3754
>   #8  0x000055555569bdb6 in perf_session.do_write_header ()
>   #9  0x00005555555a373a in __cmd_record.constprop.0 ()
>   #10 0x00005555555a6846 in cmd_record ()
>   #11 0x000055555564db7f in run_builtin ()
>   #12 0x000055555558ed77 in main ()
>   ```
>
>   Valgrind memcheck:
>   ```
>   ==45136== Invalid write of size 8
>   ==45136==    at 0x2B38A5: perf_event__synthesize_id_sample (in /usr/bin/perf)

The write that is happening here is because an empty sample except for
the id value is written out:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/synthetic-events.c?h=perf-tools-next#n1805
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/builtin-record.c?h=perf-tools-next#n1910
It'd been missed that the sample wasn't quite empty and needed space.

Rather than making the space 64kb again, can we not use
perf_event__sample_event_size or some max constant?

Thanks,
Ian

>   ==45136==    by 0x157069: __cmd_record.constprop.0 (in /usr/bin/perf)
>   ==45136==    by 0x15A845: cmd_record (in /usr/bin/perf)
>   ==45136==    by 0x201B7E: run_builtin (in /usr/bin/perf)
>   ==45136==    by 0x142D76: main (in /usr/bin/perf)
>   ==45136==  Address 0x6a866a8 is 0 bytes after a block of size 40 alloc'd
>   ==45136==    at 0x4849BF3: calloc (vg_replace_malloc.c:1675)
>   ==45136==    by 0x3574AB: zalloc (in /usr/bin/perf)
>   ==45136==    by 0x1570E0: __cmd_record.constprop.0 (in /usr/bin/perf)
>   ==45136==    by 0x15A845: cmd_record (in /usr/bin/perf)
>   ==45136==    by 0x201B7E: run_builtin (in /usr/bin/perf)
>   ==45136==    by 0x142D76: main (in /usr/bin/perf)
>   ==45136==
>   ==45136== Syscall param write(buf) points to unaddressable byte(s)
>   ==45136==    at 0x575953D: __libc_write (write.c:26)
>   ==45136==    by 0x575953D: write (write.c:24)
>   ==45136==    by 0x35761F: ion (in /usr/bin/perf)
>   ==45136==    by 0x357778: writen (in /usr/bin/perf)
>   ==45136==    by 0x1548F7: record__write (in /usr/bin/perf)
>   ==45136==    by 0x15708A: __cmd_record.constprop.0 (in /usr/bin/perf)
>   ==45136==    by 0x15A845: cmd_record (in /usr/bin/perf)
>   ==45136==    by 0x201B7E: run_builtin (in /usr/bin/perf)
>   ==45136==    by 0x142D76: main (in /usr/bin/perf)
>   ==45136==  Address 0x6a866a8 is 0 bytes after a block of size 40 alloc'd
>   ==45136==    at 0x4849BF3: calloc (vg_replace_malloc.c:1675)
>   ==45136==    by 0x3574AB: zalloc (in /usr/bin/perf)
>   ==45136==    by 0x1570E0: __cmd_record.constprop.0 (in /usr/bin/perf)
>   ==45136==    by 0x15A845: cmd_record (in /usr/bin/perf)
>   ==45136==    by 0x201B7E: run_builtin (in /usr/bin/perf)
>   ==45136==    by 0x142D76: main (in /usr/bin/perf)
>   ==45136==
>  -----
>
> Closes: https://lore.kernel.org/linux-perf-users/23879991.0LEYPuXRzz@milian-workstation/
> Reported-by: Milian Wolff <milian.wolff@...b.com>
> Cc: Adrian Hunter <adrian.hunter@...el.com>
> Cc: Ian Rogers <irogers@...gle.com>
> Cc: Jiri Olsa <jolsa@...nel.org>
> Cc: Kan Liang <kan.liang@...ux.intel.com>
> Cc: Namhyung Kim <namhyung@...nel.org>
> Cc: stable@...nel.org # 6.8+
> Signed-off-by: Arnaldo Carvalho de Melo <acme@...hat.com>
> ---
>  tools/perf/builtin-record.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 66a3de8ac66186b0..0a8ba1323d64be6b 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -1956,8 +1956,7 @@ static void record__read_lost_samples(struct record *rec)
>
>                                 if (count.lost) {
>                                         if (!lost) {
> -                                               lost = zalloc(sizeof(*lost) +
> -                                                             session->machines.host.id_hdr_size);
> +                                               lost = zalloc(PERF_SAMPLE_MAX_SIZE);
>                                                 if (!lost) {
>                                                         pr_debug("Memory allocation failed\n");
>                                                         return;
> @@ -1973,8 +1972,7 @@ static void record__read_lost_samples(struct record *rec)
>                 lost_count = perf_bpf_filter__lost_count(evsel);
>                 if (lost_count) {
>                         if (!lost) {
> -                               lost = zalloc(sizeof(*lost) +
> -                                             session->machines.host.id_hdr_size);
> +                               lost = zalloc(PERF_SAMPLE_MAX_SIZE);
>                                 if (!lost) {
>                                         pr_debug("Memory allocation failed\n");
>                                         return;
> --
> 2.45.1
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ