lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <201a4a5fe2e8429f94482ebabf1f90c1@huawei.com>
Date:   Mon, 9 Nov 2020 15:59:07 +0000
From:   Shiju Jose <shiju.jose@...wei.com>
To:     James Morse <james.morse@....com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "bp@...en8.de" <bp@...en8.de>,
        "tony.luck@...el.com" <tony.luck@...el.com>,
        "rjw@...ysocki.net" <rjw@...ysocki.net>,
        "lenb@...nel.org" <lenb@...nel.org>,
        "rrichter@...vell.com" <rrichter@...vell.com>
CC:     "linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
        "linux-acpi@...r.kernel.org" <linux-acpi@...r.kernel.org>,
        Linuxarm <linuxarm@...wei.com>,
        Jonathan Cameron <jonathan.cameron@...wei.com>
Subject: RE: [RFC PATCH 1/4] ACPI: PPTT: Fix for a high level cache node
 detected in the low level

Hi James,

Thanks for the feedback.

>-----Original Message-----
>From: James Morse [mailto:james.morse@....com]
>Sent: 06 November 2020 19:34
>To: Shiju Jose <shiju.jose@...wei.com>; linux-kernel@...r.kernel.org;
>bp@...en8.de; tony.luck@...el.com; rjw@...ysocki.net; lenb@...nel.org;
>rrichter@...vell.com
>Cc: linux-edac@...r.kernel.org; linux-acpi@...r.kernel.org; Linuxarm
><linuxarm@...wei.com>; Jonathan Cameron
><jonathan.cameron@...wei.com>
>Subject: Re: [RFC PATCH 1/4] ACPI: PPTT: Fix for a high level cache node
>detected in the low level
>
>Hi Shiju, Jonathan,
>
>On 05/11/2020 17:42, Shiju Jose wrote:
>> From: Jonathan Cameron <jonathan.cameron@...wei.com>
>>
>> According to the following sections of the PPTT definition in the ACPI
>> specification(V6.3), a high level cache node( For example L2 cache)
>> could be represented simultaneously both in the private resource of a
>> CPU node and via the next_level_of_cache pointer of a low level cache
>> node.
>> 1. Section 5.2.29.1 Processor hierarchy node structure (Type 0) "Each
>> processor node includes a list of resources that are private to that
>> node. Resources are described in other PPTT structures such as Type 1
>> cache structures. The processor node’s private resource list includes
>> a reference to each of the structures that represent private resources
>> to a given processor node. For example, an SoC level processor node
>> might contain two references, one pointing to a Level 3 cache resource
>> and another pointing to an ID structure."
>>
>> 2. Section 5.2.29.2 Cache Type Structure - Type 1
>>    Figure 5-26 Cache Type Structure - Type 1 Example
>
>'fix' in the subject makes me twitch ... is there a user-space visible bug
>because of this?
>
>
>> For the use case of creating EDAC device blocks for the CPU caches, we
>> need to search for cache node types in all levels using
>> acpi_find_cache_node(), as a platform independent solution to
>
>I'm nervous to base the edac user-space view of caches on something other
>than what is described in /sys/devices/system/cpu/cpu0/cache. These things
>have to match, otherwise user-space can't work out which cpu's L2's it should
>add to get the value for the physical cache.
With this fix the /sys/devices/system/edac/cpu/cpu0/cacheN match with  /sys/devices/system/cpu/cpu0/cache/indexN.
and thus user-space could extract cpu list for the shared caches.

>
>Getting the data from somewhere else risks making this more complicated.
>
>Using the PPTT means this won't work on "HPE Server"s that use ghes_edac
>too. I don't think we should have any arm64 specific behaviour here.
>
>
>> retrieve the cache info from the ACPI PPTT. The reason is that
>> cacheinfo in the drivers/base/cacheinfo.c would not be populated in
>> this stage.
>
>Because both ghes_init() and cacheinfo_sysfs_init() are device_initcall()?
>
>Couldn't we fix this by making ghes_init(), device_initcall_sync() (with a
>comment saying what it depends on)

I checked by making ghes_init(), device_initcall_sync(). Then the
ghes_probe() and ghes_edac_register() are getting
called after detect_cache_attributes() of base/cacheinfo.c function is called, 
where per_cpu_cacheinfo is allocated and populated for the cpu online. 
Thus cacheinfo data is available for the online CPUs.

>
>
>I agree this means dealing with cpuhp as the cacheinfo data is only available
>for online CPUs.

Does this require the EDAC device instance for a cpu become online/offline to be added/deleted
on cpuhp notify functions in the ghes_edac because the cache structure among CPU cores would vary?
If so, I think edac device does not support dynamic addition/deletion of a device instance because
edac_device_alloc_ctl_info() pre-allocates memory for the internal edac dev structures for the number of 
instances(number of CPUs) and number of blocks(number of Caches) passed?

>
>
>> In this case, we found acpi_find_cache_node() mistakenly detecting
>> high level cache as low level cache, when the cache node is in the
>> processor node’s private resource list.
>>
>> To fix this issue add duplication check in the
>> acpi_find_cache_level(), for a cache node found in the private
>> resource of a CPU node with all the next level of caches present in the other
>cache nodes.
>
>I'm not overly familiar with the PPTT, is it possible this issue is visible in
>/sys/devices/system/cpu/cpu0/cache?
>
>
>Thanks,
>
>James
>
>
>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c index
>> 4ae93350b70d..de1dd605d3ad 100644
>> --- a/drivers/acpi/pptt.c
>> +++ b/drivers/acpi/pptt.c
>> @@ -132,21 +132,80 @@ static unsigned int acpi_pptt_walk_cache(struct
>acpi_table_header *table_hdr,
>>  	return local_level;
>>  }
>>
>> +/**
>> + * acpi_pptt_walk_check_duplicate() - Find the cache resource to
>> +check,
>> + * is a duplication in the next_level_of_cache pointer of other cache.
>> + * @table_hdr: Pointer to the head of the PPTT table
>> + * @res: cache resource in the PPTT we want to walk
>> + * @res_check: cache resource in the PPTT we want to check for
>duplication.
>> + *
>> + * Given both PPTT resource, verify that they are cache nodes, then
>> +walk
>> + * down each level of cache @res, and check for the duplication.
>> + *
>> + * Return: true if duplication found, false otherwise.
>> + */
>> +static bool acpi_pptt_walk_check_duplicate(struct acpi_table_header
>*table_hdr,
>> +					   struct acpi_subtable_header *res,
>> +					   struct acpi_subtable_header
>*res_check) {
>> +	struct acpi_pptt_cache *cache;
>> +	struct acpi_pptt_cache *check;
>> +
>> +	if (res->type != ACPI_PPTT_TYPE_CACHE ||
>> +	    res_check->type != ACPI_PPTT_TYPE_CACHE)
>> +		return false;
>> +
>> +	cache = (struct acpi_pptt_cache *)res;
>> +	check = (struct acpi_pptt_cache *)res_check;
>> +	while (cache) {
>> +		if (cache == check)
>> +			return true;
>> +		cache = fetch_pptt_cache(table_hdr, cache-
>>next_level_of_cache);
>> +	}
>> +
>> +	return false;
>> +}
>> +
>>  static struct acpi_pptt_cache *
>>  acpi_find_cache_level(struct acpi_table_header *table_hdr,
>>  		      struct acpi_pptt_processor *cpu_node,
>>  		      unsigned int *starting_level, unsigned int level,
>>  		      int type)
>>  {
>> -	struct acpi_subtable_header *res;
>> +	struct acpi_subtable_header *res, *res2;
>>  	unsigned int number_of_levels = *starting_level;
>>  	int resource = 0;
>> +	int resource2 = 0;
>> +	bool duplicate = false;
>>  	struct acpi_pptt_cache *ret = NULL;
>>  	unsigned int local_level;
>>
>>  	/* walk down from processor node */
>>  	while ((res = acpi_get_pptt_resource(table_hdr, cpu_node,
>resource))) {
>>  		resource++;
>> +		/*
>> +		 * PPTT definition in the ACPI specification allows a high level
>cache
>> +		 * node would be represented simultaneously both in the
>private resource
>> +		 * of a CPU node and via the next_level_of_cache pointer of
>another cache node,
>> +		 * within the same CPU hierarchy. This resulting
>acpi_find_cache_level()
>> +		 * mistakenly detects a higher level cache node in the low
>level as well.
>> +		 *
>> +		 * Check a cache node in the private resource of the CPU node
>for any
>> +		 * duplication.
>> +		 */
>> +		resource2 = 0;
>> +		duplicate = false;
>> +		while ((res2 = acpi_get_pptt_resource(table_hdr, cpu_node,
>resource2))) {
>> +			resource2++;
>> +			if (res2 == res)
>> +				continue;
>> +			if (acpi_pptt_walk_check_duplicate(table_hdr, res2,
>res)) {
>> +				duplicate = true;
>> +				break;
>> +			}
>> +		}
>> +		if (duplicate)
>> +			continue;
>>
>>  		local_level = acpi_pptt_walk_cache(table_hdr,
>*starting_level,
>>  						   res, &ret, level, type);
>>

Thanks,
Shiju

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ