lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <63c6162b-f336-46bf-9764-a5ae0327e1c8@intel.com>
Date: Mon, 6 May 2024 08:46:18 +0300
From: Adrian Hunter <adrian.hunter@...el.com>
To: Leo Yan <leo.yan@....com>, Arnaldo Carvalho de Melo <acme@...nel.org>,
 Ian Rogers <irogers@...gle.com>, Namhyung Kim <namhyung@...nel.org>,
 Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
 Mark Rutland <mark.rutland@....com>,
 Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
 Jiri Olsa <jolsa@...nel.org>, Athira Rajeev <atrajeev@...ux.vnet.ibm.com>,
 James Clark <james.clark@....com>, linux-perf-users@...r.kernel.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH] perf maps: Process kcore maps in order

On 6/05/24 08:43, Adrian Hunter wrote:
> On 5/05/24 23:28, Leo Yan wrote:
>> On Arm64, after enabling the 'DEBUG=1' build option, the tool exits
>> abnormally:
>>
>>   # ./perf report --itrace=Zi10ibT
>>   # perf: util/maps.c:42: check_invariants: Assertion `map__end(prev) <= map__start(map) || map__start(prev) == map__start(map)' failed.
>>     Aborted
>>
>> The details for causing this error are described in below.
>>
>> Firstly, machine__get_running_kernel_start() calculates the delta
>> between the '_stext' symbol and the '_edata' symbol for the kernel map,
>> alongside with eBPF maps:
>>
>>   DSO              | Start address      | End address
>>   -----------------+--------------------+--------------------
>>   kernel.kallsyms    0xffff800080000000   0xffff800082229200
>>   bpf_prog           0xffff800082545aac   0xffff800082545c94
>>   ...
>>
>> Then, the perf tool retrieves kcore maps:
>>
>>   Kcore maps       | Start address      | End address
>>   -----------------+--------------------+--------------------
>>   kcore_text         0xffff800080000000   0xffff8000822f0000
>>   vmalloc            0xffff800080000000   0xfffffdffbf800000
>>   ...
>>
>> Finally, the function dso__load_kcore() extends the kernel maps based on
>> the retrieved kcore info. Since it uses the inverted order for
>> processing the kcore maps, it extends maps for the vmalloc region prior
>> to the 'kcore_text' map:
>>
>>   DSO              | Start address      | End address
>>   -----------------+--------------------+--------------------
>>   kernel.kallsyms    0xffff800080000000   0xffff800082229200
>>   kernel.kallsyms    0xffff800082229200   0xffff800082545aac -> Extended for vmalloc region
>>   bpf_prog           0xffff800082545aac   0xffff800082545c94
>>   ...
>>
>>   DSO              | Start address      | End address
>>   -----------------+--------------------+--------------------
>>   kernel.kallsyms    0xffff800080000000   0xffff8000822f0000 -> Updated for kcore_text map
>>   kernel.kallsyms    0xffff800082229200   0xffff800082545aac
>>   bpf_prog           0xffff800082545aac   0xffff800082545c94
>>   ...
>>
>> As result, the two maps [0xffff800080000000..0xffff8000822f0000) and
>> [0xffff800082229200..0xffff800082545aac) are overlapping and triggers
>> failure.
>>
>> The current code processes kcore maps in inverted order. To fix it, this
>> patch adds kcore maps in the tail of list, afterwards these maps will be
>> processed in the order. Therefore, the kernel text section can be
>> processed prior to handling the vmalloc region, which avoids using the
>> inaccurate kernel text size when extending maps with the vmalloc region.
>>
>> Signed-off-by: Leo Yan <leo.yan@....com>
>> ---
>>  tools/perf/util/symbol.c | 19 ++++++++++++++++++-
>>  1 file changed, 18 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
>> index 9ebdb8e13c0b..e15d70845488 100644
>> --- a/tools/perf/util/symbol.c
>> +++ b/tools/perf/util/symbol.c
>> @@ -1266,7 +1266,24 @@ static int kcore_mapfn(u64 start, u64 len, u64 pgoff, void *data)
>>  	map__set_end(list_node->map, map__start(list_node->map) + len);
>>  	map__set_pgoff(list_node->map, pgoff);
>>  
>> -	list_add(&list_node->node, &md->maps);
>> +	/*
>> +	 * Kcore maps are ordered with:
>> +	 *   [_text.._end): Kernel text section
>> +	 *   [VMALLOC_START..VMALLOC_END): vmalloc
>> +	 *   ...
>> +	 *
>> +	 * On Arm64, the '_text' and 'VMALLOC_START' are the same values
>> +	 * but VMALLOC_END (~124TiB) is much bigger then the text end
>> +	 * address. So '_text' region is the subset of the vmalloc region.
>> +	 *
>> +	 * Afterwards, when dso__load_kcore() adjusts kernel maps, we must
>> +	 * process the kernel text size prior to handling vmalloc region.
>> +	 * This can avoid to using any inaccurate kernel text size when
>> +	 * extending maps with vmalloc region. For this reason, here it
>> +	 * always adds kcore maps to the tail of list to make sure the
>> +	 * sequential handling is in order.
>> +	 */
>> +	list_add_tail(&list_node->node, &md->maps);
> 
> This seems reasonable, but I wonder if it might be robust
> and future proof to also process the main map first
> e.g. totally untested:
> 
> diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
> index 9ebdb8e13c0b..63bce45a5abb 100644
> --- a/tools/perf/util/symbol.c
> +++ b/tools/perf/util/symbol.c
> @@ -1365,16 +1365,15 @@ static int dso__load_kcore(struct dso *dso, struct map *map,
>  	if (!replacement_map)
>  		replacement_map = list_entry(md.maps.next, struct map_list_node, node)->map;
>  
> -	/* Add new maps */
> +	/* Add replacement_map */
>  	while (!list_empty(&md.maps)) {

That will need to be:

	struct map_list_node *new_node;
..
	list_for_each_entry(new_node, &md.maps, node) {


>  		struct map_list_node *new_node = list_entry(md.maps.next, struct map_list_node, node);
>  		struct map *new_map = new_node->map;
>  
> -		list_del_init(&new_node->node);
> -
>  		if (RC_CHK_EQUAL(new_map, replacement_map)) {
>  			struct map *map_ref;
>  
> +			list_del_init(&new_node->node);
>  			map__set_start(map, map__start(new_map));
>  			map__set_end(map, map__end(new_map));
>  			map__set_pgoff(map, map__pgoff(new_map));
> @@ -1385,20 +1384,29 @@ static int dso__load_kcore(struct dso *dso, struct map *map,
>  			err = maps__insert(kmaps, map_ref);
>  			map__put(map_ref);
>  			map__put(new_map);
> +			free(new_node);
>  			if (err)
>  				goto out_err;
> -		} else {
> -			/*
> -			 * Merge kcore map into existing maps,
> -			 * and ensure that current maps (eBPF)
> -			 * stay intact.
> -			 */
> -			if (maps__merge_in(kmaps, new_map)) {
> -				err = -EINVAL;
> -				goto out_err;
> -			}
>  		}
> +	}
> +
> +	/* Add new maps */
> +	while (!list_empty(&md.maps)) {
> +		struct map_list_node *new_node = list_entry(md.maps.next, struct map_list_node, node);
> +		struct map *new_map = new_node->map;
> +
> +		list_del_init(&new_node->node);
> +
> +		/*
> +		 * Merge kcore map into existing maps,
> +		 * and ensure that current maps (eBPF)
> +		 * stay intact.
> +		 */
> +		if (maps__merge_in(kmaps, new_map))
> +			err = -EINVAL;
>  		free(new_node);
> +		if (err)
> +			goto out_err;
>  	}
>  
>  	if (machine__is(machine, "x86_64")) {
> 
> 


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ