Message-ID: <Z4AZiTNhI9qKGYh3@x1>
Date: Thu, 9 Jan 2025 15:46:33 -0300
From: Arnaldo Carvalho de Melo <acme@...nel.org>
To: Namhyung Kim <namhyung@...nel.org>
Cc: Ian Rogers <irogers@...gle.com>, Kan Liang <kan.liang@...ux.intel.com>,
	Jiri Olsa <jolsa@...nel.org>,
	Adrian Hunter <adrian.hunter@...el.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...nel.org>, LKML <linux-kernel@...r.kernel.org>,
	linux-perf-users@...r.kernel.org, bpf@...r.kernel.org,
	Howard Chu <howardchu95@...il.com>
Subject: Re: [PATCH] perf trace: Fix unaligned access for augmented args

On Thu, Jan 02, 2025 at 12:12:47PM -0800, Namhyung Kim wrote:
> Some compiler versions reported unaligned accesses in perf trace when
> the undefined-behavior sanitizer is on.  I found that it uses the raw data
> in the sample directly, assuming it's properly aligned.
> 
> Unlike other sample fields, the raw data is not 8-byte aligned because
> there's a size field (u32) before the actual data.  So I added a static
> buffer in syscall__augmented_args() and return it instead.  This is not
> ideal but should work well as perf trace is single-threaded.
> 
> A better approach would be aligning the raw data by adding 4 bytes of
> padding before the augmented args, but I'm afraid it'd break backward
> compatibility.
 
You mean for 'perf trace record' files?

Older tools will not be able to process new files, while old files will
remain processable by new tools if we insert a u32 with zeroes before
the size field.  That way, if the first u32 is not zero, we do as you do
below and incur the cost of copying to that intermediary buffer;
otherwise we read the real size in the next u32 and don't incur the cost
of copying.
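
A rough sketch of that check, just to make the idea concrete. Everything
here is hypothetical: raw_payload(), the scratch buffer and the record
layout ([u32 0][u32 size][payload] for new files, [u32 size][payload] for
old ones) are assumptions for illustration, not existing perf code. It also
assumes the record itself starts on an 8-byte boundary and that an old-format
record never carries a zero size, since a zero first word is what marks the
new layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 8-byte aligned scratch buffer, used only for old-format records. */
static uint64_t scratch[128];

/*
 * Hypothetical helper: 'rec' points at the start of a raw record and is
 * assumed to be 8-byte aligned.
 *
 * New layout: [u32 0][u32 size][payload]  -> payload sits at offset 8.
 * Old layout: [u32 size][payload]         -> payload sits at offset 4.
 *
 * Returns an 8-byte aligned pointer to the payload, or NULL if an old-format
 * payload does not fit in the scratch buffer. *copied is set to 1 when the
 * payload had to be copied, 0 when it is returned in place.
 */
static const void *raw_payload(const unsigned char *rec, uint32_t *size, int *copied)
{
	uint32_t first;

	/* memcpy avoids making any alignment assumption about the header read. */
	memcpy(&first, rec, sizeof(first));

	if (first == 0) {
		/* New layout: the real size is in the second u32, no copy needed. */
		memcpy(size, rec + sizeof(uint32_t), sizeof(*size));
		*copied = 0;
		return rec + 2 * sizeof(uint32_t);
	}

	/* Old layout: payload is only 4-byte aligned, so copy it out. */
	if (first > sizeof(scratch))
		return NULL;

	memcpy(scratch, rec + sizeof(uint32_t), first);
	*size = first;
	*copied = 1;
	return scratch;
}
```

With something like this, new files would take the in-place path and only
old files would pay for the memcpy().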

Your fix below works, though it incurs the copying cost all the time,
which is OK for now; as a follow-up patch we can see if the approach I
described above works.

Applying.

- Arnaldo

> Closes: https://lore.kernel.org/r/Z2STgyD1p456Qqhg@google.com
> Cc: Howard Chu <howardchu95@...il.com>
> Signed-off-by: Namhyung Kim <namhyung@...nel.org>
> ---
>  tools/perf/builtin-trace.c | 21 +++++++++++++++++----
>  1 file changed, 17 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
> index e70e634fbfaf33f5..3f06411514c5b58a 100644
> --- a/tools/perf/builtin-trace.c
> +++ b/tools/perf/builtin-trace.c
> @@ -2582,7 +2582,6 @@ static int trace__fprintf_sample(struct trace *trace, struct evsel *evsel,
>  
>  static void *syscall__augmented_args(struct syscall *sc, struct perf_sample *sample, int *augmented_args_size, int raw_augmented_args_size)
>  {
> -	void *augmented_args = NULL;
>  	/*
>  	 * For now with BPF raw_augmented we hook into raw_syscalls:sys_enter
>  	 * and there we get all 6 syscall args plus the tracepoint common fields
> @@ -2600,10 +2599,24 @@ static void *syscall__augmented_args(struct syscall *sc, struct perf_sample *sam
>  	int args_size = raw_augmented_args_size ?: sc->args_size;
>  
>  	*augmented_args_size = sample->raw_size - args_size;
> -	if (*augmented_args_size > 0)
> -		augmented_args = sample->raw_data + args_size;
> +	if (*augmented_args_size > 0) {
> +		static uintptr_t argbuf[1024]; /* assuming single-threaded */
> +
> +		if ((size_t)(*augmented_args_size) > sizeof(argbuf))
> +			return NULL;
> +
> +		/*
> +		 * The perf ring-buffer is 8-byte aligned but sample->raw_data
> +		 * is not because it's preceded by u32 size.  Later, beautifier
> +		 * will use the augmented args with stricter alignments like in
> +		 * some struct.  To make sure it's aligned, let's copy the args
> +		 * into a static buffer as it's single-threaded for now.
> +		 */
> +		memcpy(argbuf, sample->raw_data + args_size, *augmented_args_size);
>  
> -	return augmented_args;
> +		return argbuf;
> +	}
> +	return NULL;
>  }
>  
>  static void syscall__exit(struct syscall *sc)
> -- 
> 2.47.1.613.gc27f4b7a9f-goog
