lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAP-5=fWnuQrrBoTn6Rrn6vM_xQ2fCoc9i-AitD7abTcNi-4o1Q@mail.gmail.com>
Date: Tue, 10 Sep 2024 10:41:31 -0700
From: Ian Rogers <irogers@...gle.com>
To: arnaldo.melo@...il.com
Cc: Alan Maguire <alan.maguire@...cle.com>, Adrian Hunter <adrian.hunter@...el.com>, 
	Howard Chu <howardchu95@...il.com>, Jiri Olsa <jolsa@...nel.org>, 
	Kan Liang <kan.liang@...ux.intel.com>, Namhyung Kim <namhyung@...nel.org>, 
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>, linux-perf-users@...r.kernel.org
Subject: Re: [PATCH 1/1] perf trace: If a syscall arg is marked as 'const',
 assume it is coming _from_ userspace

On Tue, Sep 10, 2024 at 10:05 AM <arnaldo.melo@...il.com> wrote:
>
> We need to decide where to copy syscall arg contents, if at the
> syscalls:sys_entry hook, meaning is something that is coming from
> user to kernel space, or if it is a response, i.e. if it is something
> the _kernel_ is filling in and thus going to userspace.
>
> Since we have 'const' used in those syscalls, and unsure about this
> being consistent, doing:
>
>   root@...ber:~# echo $(grep const /sys/kernel/tracing/events/syscalls/sys_enter_*/format  | grep struct | cut -c47- | cut -d'/' -f1)
>   clock_nanosleep clock_settime epoll_pwait2 futex io_pgetevents landlock_create_ruleset listmount mq_getsetattr mq_notify mq_timedreceive mq_timedsend preadv2 preadv prlimit64 process_madvise process_vm_readv process_vm_readv process_vm_writev process_vm_writev pwritev2 pwritev readv rt_sigaction rt_sigtimedwait semtimedop statmount timerfd_settime timer_settime vmsplice writev
>   root@...ber:~#
>
> Seems to indicate that we can use that for the ones that have the
> 'const' to mark it as coming from user space, do it.
>
> Most notable/frequent syscall that now gets BTF pretty printed in a
> system wide 'perf trace' session is:
>
>   root@...ber:~# perf trace
>      21.160 (         ): MediaSu~isor #/1028597 futex(uaddr: 0x7f49e1dfe964, op: WAIT_BITSET|PRIVATE_FLAG, utime: (struct __kernel_timespec){.tv_sec = (__kernel_time64_t)50290,.tv_nsec = (long long int)810362837,}, val3: MATCH_ANY) ...
>       21.166 ( 0.000 ms): RemVidChild/6995 futex(uaddr: 0x7f49fcc7fa00, op: WAKE|PRIVATE_FLAG, val: 1)           = 0
>       21.169 ( 0.001 ms): RemVidChild/6995 sendmsg(fd: 25<socket:[78915]>, msg: 0x7f49e9af9da0, flags: DONTWAIT) = 280
>       21.172 ( 0.289 ms): RemVidChild/6995 futex(uaddr: 0x7f49fcc7fa58, op: WAIT_BITSET|PRIVATE_FLAG|CLOCK_REALTIME, val3: MATCH_ANY) = 0
>       21.463 ( 0.000 ms): RemVidChild/6995 futex(uaddr: 0x7f49fcc7fa00, op: WAKE|PRIVATE_FLAG, val: 1)           = 0
>       21.467 ( 0.001 ms): RemVidChild/6995 futex(uaddr: 0x7f49e28bb964, op: WAKE|PRIVATE_FLAG, val: 1)           = 1
>       21.160 ( 0.314 ms): MediaSu~isor #/1028597  ... [continued]: futex())                                            = 0
>       21.469 (         ): RemVidChild/6995 futex(uaddr: 0x7f49fcc7fa5c, op: WAIT_BITSET|PRIVATE_FLAG|CLOCK_REALTIME, val3: MATCH_ANY) ...
>       21.475 ( 0.000 ms): MediaSu~isor #/1028597 futex(uaddr: 0x7f49d0223040, op: WAKE|PRIVATE_FLAG, val: 1)           = 0
>       21.478 ( 0.001 ms): MediaSu~isor #/1028597 futex(uaddr: 0x7f49e26ac964, op: WAKE|PRIVATE_FLAG, val: 1)           = 1
>   ^Croot@...ber:~#
>   root@...ber:~# cat /sys/kernel/tracing/events/syscalls/sys_enter_futex/format
>   name: sys_enter_futex
>   ID: 454
>   format:
>         field:unsigned short common_type;       offset:0;       size:2; signed:0;
>         field:unsigned char common_flags;       offset:2;       size:1; signed:0;
>         field:unsigned char common_preempt_count;       offset:3;       size:1; signed:0;
>         field:int common_pid;   offset:4;       size:4; signed:1;
>
>         field:int __syscall_nr; offset:8;       size:4; signed:1;
>         field:u32 * uaddr;      offset:16;      size:8; signed:0;
>         field:int op;   offset:24;      size:8; signed:0;
>         field:u32 val;  offset:32;      size:8; signed:0;
>         field:const struct __kernel_timespec * utime;   offset:40;      size:8; signed:0;
>         field:u32 * uaddr2;     offset:48;      size:8; signed:0;
>         field:u32 val3; offset:56;      size:8; signed:0;
>
>   print fmt: "uaddr: 0x%08lx, op: 0x%08lx, val: 0x%08lx, utime: 0x%08lx, uaddr2: 0x%08lx, val3: 0x%08lx", ((unsigned long)(REC->uaddr)), ((unsigned long)(REC->op)), ((unsigned long)(REC->val)), ((unsigned long)(REC->utime)), ((unsigned long)(REC->uaddr2)), ((unsigned long)(REC->val3))
>   root@...ber:~#
>
> Suggested-by: Ian Rogers <irogers@...gle.com>
> Cc: Adrian Hunter <adrian.hunter@...el.com>
> Cc: Alan Maguire <alan.maguire@...cle.com>
> Cc: Howard Chu <howardchu95@...il.com>
> Cc: Jiri Olsa <jolsa@...nel.org>
> Cc: Kan Liang <kan.liang@...ux.intel.com>
> Cc: Namhyung Kim <namhyung@...nel.org>
> Signed-off-by: Arnaldo Carvalho de Melo <acme@...hat.com>

Reviewed-by: Ian Rogers <irogers@...gle.com>

Thanks,
Ian

> ---
>  tools/perf/builtin-trace.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
> index 83eb15b72333edd9..e26baf1256d5bb56 100644
> --- a/tools/perf/builtin-trace.c
> +++ b/tools/perf/builtin-trace.c
> @@ -2001,11 +2001,14 @@ syscall_arg_fmt__init_array(struct syscall_arg_fmt *arg, struct tep_format_field
>
>                 len = strlen(field->name);
>
> +               // As far as heuristics (or intention) goes this seems to hold true, and makes sense!
> +               if ((field->flags & TEP_FIELD_IS_POINTER) && strstarts(field->type, "const "))
> +                       arg->from_user = true;
> +
>                 if (strcmp(field->type, "const char *") == 0 &&
>                     ((len >= 4 && strcmp(field->name + len - 4, "name") == 0) ||
>                      strstr(field->name, "path") != NULL)) {
>                         arg->scnprintf = SCA_FILENAME;
> -                       arg->from_user = true;
>                 } else if ((field->flags & TEP_FIELD_IS_POINTER) || strstr(field->name, "addr"))
>                         arg->scnprintf = SCA_PTR;
>                 else if (strcmp(field->type, "pid_t") == 0)
> --
> 2.46.0
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ