lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d47346fc-51b4-4af5-a014-0bd6f3b7bae0@intel.com>
Date: Mon, 6 May 2024 08:43:01 +0300
From: Adrian Hunter <adrian.hunter@...el.com>
To: Leo Yan <leo.yan@....com>, Arnaldo Carvalho de Melo <acme@...nel.org>,
 Ian Rogers <irogers@...gle.com>, Namhyung Kim <namhyung@...nel.org>,
 Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
 Mark Rutland <mark.rutland@....com>,
 Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
 Jiri Olsa <jolsa@...nel.org>, Athira Rajeev <atrajeev@...ux.vnet.ibm.com>,
 James Clark <james.clark@....com>, linux-perf-users@...r.kernel.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH] perf maps: Process kcore maps in order

On 5/05/24 23:28, Leo Yan wrote:
> On Arm64, after enabling the 'DEBUG=1' build option, the tool exits
> abnormally:
> 
>   # ./perf report --itrace=Zi10ibT
>   # perf: util/maps.c:42: check_invariants: Assertion `map__end(prev) <= map__start(map) || map__start(prev) == map__start(map)' failed.
>     Aborted
> 
> The details for causing this error are described in below.
> 
> Firstly, machine__get_running_kernel_start() calculates the delta
> between the '_stext' symbol and the '_edata' symbol for the kernel map,
> alongside with eBPF maps:
> 
>   DSO              | Start address      | End address
>   -----------------+--------------------+--------------------
>   kernel.kallsyms    0xffff800080000000   0xffff800082229200
>   bpf_prog           0xffff800082545aac   0xffff800082545c94
>   ...
> 
> Then, the perf tool retrieves kcore maps:
> 
>   Kcore maps       | Start address      | End address
>   -----------------+--------------------+--------------------
>   kcore_text         0xffff800080000000   0xffff8000822f0000
>   vmalloc            0xffff800080000000   0xfffffdffbf800000
>   ...
> 
> Finally, the function dso__load_kcore() extends the kernel maps based on
> the retrieved kcore info. Since it uses the inverted order for
> processing the kcore maps, it extends maps for the vmalloc region prior
> to the 'kcore_text' map:
> 
>   DSO              | Start address      | End address
>   -----------------+--------------------+--------------------
>   kernel.kallsyms    0xffff800080000000   0xffff800082229200
>   kernel.kallsyms    0xffff800082229200   0xffff800082545aac -> Extended for vmalloc region
>   bpf_prog           0xffff800082545aac   0xffff800082545c94
>   ...
> 
>   DSO              | Start address      | End address
>   -----------------+--------------------+--------------------
>   kernel.kallsyms    0xffff800080000000   0xffff8000822f0000 -> Updated for kcore_text map
>   kernel.kallsyms    0xffff800082229200   0xffff800082545aac
>   bpf_prog           0xffff800082545aac   0xffff800082545c94
>   ...
> 
> As result, the two maps [0xffff800080000000..0xffff8000822f0000) and
> [0xffff800082229200..0xffff800082545aac) are overlapping and triggers
> failure.
> 
> The current code processes kcore maps in inverted order. To fix it, this
> patch adds kcore maps in the tail of list, afterwards these maps will be
> processed in the order. Therefore, the kernel text section can be
> processed prior to handling the vmalloc region, which avoids using the
> inaccurate kernel text size when extending maps with the vmalloc region.
> 
> Signed-off-by: Leo Yan <leo.yan@....com>
> ---
>  tools/perf/util/symbol.c | 19 ++++++++++++++++++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
> index 9ebdb8e13c0b..e15d70845488 100644
> --- a/tools/perf/util/symbol.c
> +++ b/tools/perf/util/symbol.c
> @@ -1266,7 +1266,24 @@ static int kcore_mapfn(u64 start, u64 len, u64 pgoff, void *data)
>  	map__set_end(list_node->map, map__start(list_node->map) + len);
>  	map__set_pgoff(list_node->map, pgoff);
>  
> -	list_add(&list_node->node, &md->maps);
> +	/*
> +	 * Kcore maps are ordered with:
> +	 *   [_text.._end): Kernel text section
> +	 *   [VMALLOC_START..VMALLOC_END): vmalloc
> +	 *   ...
> +	 *
> +	 * On Arm64, the '_text' and 'VMALLOC_START' are the same values
> +	 * but VMALLOC_END (~124TiB) is much bigger then the text end
> +	 * address. So '_text' region is the subset of the vmalloc region.
> +	 *
> +	 * Afterwards, when dso__load_kcore() adjusts kernel maps, we must
> +	 * process the kernel text size prior to handling vmalloc region.
> +	 * This can avoid to using any inaccurate kernel text size when
> +	 * extending maps with vmalloc region. For this reason, here it
> +	 * always adds kcore maps to the tail of list to make sure the
> +	 * sequential handling is in order.
> +	 */
> +	list_add_tail(&list_node->node, &md->maps);

This seems reasonable, but I wonder if it might be robust
and future proof to also process the main map first
e.g. totally untested:

diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 9ebdb8e13c0b..63bce45a5abb 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1365,16 +1365,15 @@ static int dso__load_kcore(struct dso *dso, struct map *map,
 	if (!replacement_map)
 		replacement_map = list_entry(md.maps.next, struct map_list_node, node)->map;
 
-	/* Add new maps */
+	/* Add replacement_map */
 	while (!list_empty(&md.maps)) {
 		struct map_list_node *new_node = list_entry(md.maps.next, struct map_list_node, node);
 		struct map *new_map = new_node->map;
 
-		list_del_init(&new_node->node);
-
 		if (RC_CHK_EQUAL(new_map, replacement_map)) {
 			struct map *map_ref;
 
+			list_del_init(&new_node->node);
 			map__set_start(map, map__start(new_map));
 			map__set_end(map, map__end(new_map));
 			map__set_pgoff(map, map__pgoff(new_map));
@@ -1385,20 +1384,29 @@ static int dso__load_kcore(struct dso *dso, struct map *map,
 			err = maps__insert(kmaps, map_ref);
 			map__put(map_ref);
 			map__put(new_map);
+			free(new_node);
 			if (err)
 				goto out_err;
-		} else {
-			/*
-			 * Merge kcore map into existing maps,
-			 * and ensure that current maps (eBPF)
-			 * stay intact.
-			 */
-			if (maps__merge_in(kmaps, new_map)) {
-				err = -EINVAL;
-				goto out_err;
-			}
 		}
+	}
+
+	/* Add new maps */
+	while (!list_empty(&md.maps)) {
+		struct map_list_node *new_node = list_entry(md.maps.next, struct map_list_node, node);
+		struct map *new_map = new_node->map;
+
+		list_del_init(&new_node->node);
+
+		/*
+		 * Merge kcore map into existing maps,
+		 * and ensure that current maps (eBPF)
+		 * stay intact.
+		 */
+		if (maps__merge_in(kmaps, new_map))
+			err = -EINVAL;
 		free(new_node);
+		if (err)
+			goto out_err;
 	}
 
 	if (machine__is(machine, "x86_64")) {



Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ