linux-kernel - Re: [PATCH 05/33] ACPI / PPTT: Find cache level by cache-id

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <cc4881e8-5d90-4992-8cbf-650ea2efa5ca@arm.com>
Date: Thu, 28 Aug 2025 16:58:05 +0100
From: James Morse <james.morse@....com>
To: Dave Martin <Dave.Martin@....com>
Cc: linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
 linux-acpi@...r.kernel.org, devicetree@...r.kernel.org,
 shameerali.kolothum.thodi@...wei.com,
 D Scott Phillips OS <scott@...amperecomputing.com>,
 carl@...amperecomputing.com, lcherian@...vell.com,
 bobo.shaobowang@...wei.com, tan.shaopeng@...itsu.com,
 baolin.wang@...ux.alibaba.com, Jamie Iles <quic_jiles@...cinc.com>,
 Xin Hao <xhao@...ux.alibaba.com>, peternewman@...gle.com,
 dfustini@...libre.com, amitsinght@...vell.com,
 David Hildenbrand <david@...hat.com>, Rex Nie <rex.nie@...uarmicro.com>,
 Koba Ko <kobak@...dia.com>, Shanker Donthineni <sdonthineni@...dia.com>,
 fenghuay@...dia.com, baisheng.gao@...soc.com,
 Jonathan Cameron <jonathan.cameron@...wei.com>, Rob Herring
 <robh@...nel.org>, Rohit Mathew <rohit.mathew@....com>,
 Rafael Wysocki <rafael@...nel.org>, Len Brown <lenb@...nel.org>,
 Lorenzo Pieralisi <lpieralisi@...nel.org>, Hanjun Guo
 <guohanjun@...wei.com>, Sudeep Holla <sudeep.holla@....com>,
 Krzysztof Kozlowski <krzk+dt@...nel.org>, Conor Dooley
 <conor+dt@...nel.org>, Catalin Marinas <catalin.marinas@....com>,
 Will Deacon <will@...nel.org>,
 Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
 Danilo Krummrich <dakr@...nel.org>
Subject: Re: [PATCH 05/33] ACPI / PPTT: Find cache level by cache-id

Hi Dave,

On 27/08/2025 11:50, Dave Martin wrote:
> Hi,
> 
> On Fri, Aug 22, 2025 at 03:29:46PM +0000, James Morse wrote:
>> The MPAM table identifies caches by id. The MPAM driver also wants to know
>> the cache level to determine if the platform is of the shape that can be
>> managed via resctrl. Cacheinfo has this information, but only for CPUs that
>> are online.
>>
>> Waiting for all CPUs to come online is a problem for platforms where
>> CPUs are brought online late by user-space.
>>
>> Add a helper that walks every possible cache, until it finds the one
>> identified by cache-id, then return the level.
>> Add a cleanup based free-ing mechanism for acpi_get_table().

> Does this mean that the early secondaries must be spread out across the
> whole topology so that everything can be probed?
>
> (i.e., a random subset is no good?)

For the mpam driver - it needs to see each cache with mpam hardware, which means a CPU
associated with each cache needs to be online. Random is fine - provided you get lucky.


> If so, is this documented somewhere, such as in booting.rst?

booting.rst is for the bootloader.
Late secondaries is a bit of a niche sport, I've only seen it commonly done in VMs.
Most platforms so far have their MPAM controls on a global L3, so this requirement doesn't
make much of a difference.

The concern is that if resctrl gets probed after user-space has started, whatever
user-space service is supposed to set it up will have concluded its not supported. Working
with cache-ids for offline CPUs means you don't have to bring all the CPUs online - only
enough so that every piece of hardware is reachable.


> Maybe this is not a new requirement -- it's not an area that I'm very
> familiar with.

Hard to say - its a potentially surprising side effect of glomming OS accessible registers
onto the side of hardware that can be automatically powered off. (PSCI CPU_SUSPEND).

I did try getting cacheinfo to populate all the CPUs at boot, regardless of whether they
were online. Apparently that doesn't work for PowerPC where the properties of CPUs can
change while they are offline. (presumably due to RAS or a firmware update)


>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>> index 8f9b9508acba..660457644a5b 100644
>> --- a/drivers/acpi/pptt.c
>> +++ b/drivers/acpi/pptt.c
>> @@ -907,3 +907,67 @@ int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
>>  	return find_acpi_cpu_topology_tag(cpu, PPTT_ABORT_PACKAGE,
>>  					  ACPI_PPTT_ACPI_IDENTICAL);
>>  }
>> +
>> +/**
>> + * find_acpi_cache_level_from_id() - Get the level of the specified cache
>> + * @cache_id: The id field of the unified cache
>> + *
>> + * Determine the level relative to any CPU for the unified cache identified by
>> + * cache_id. This allows the property to be found even if the CPUs are offline.
>> + *
>> + * The returned level can be used to group unified caches that are peers.
>> + *
>> + * The PPTT table must be rev 3 or later,
>> + *
>> + * If one CPUs L2 is shared with another as L3, this function will return
>> + * an unpredictable value.
>> + *
>> + * Return: -ENOENT if the PPTT doesn't exist, or the cache cannot be found.
> 
> Nit: doesn't exist or its revision is too old.

... its not old, but there is no published spec for that revision... unsupported?


>> + * Otherwise returns a value which represents the level of the specified cache.
>> + */
>> +int find_acpi_cache_level_from_id(u32 cache_id)
>> +{
>> +	u32 acpi_cpu_id;
>> +	int level, cpu, num_levels;
>> +	struct acpi_pptt_cache *cache;
>> +	struct acpi_pptt_cache_v1 *cache_v1;
>> +	struct acpi_pptt_processor *cpu_node;
>> +	struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_PPTT, 0);

> acpi_get_pptt() ? (See comment on patch 3.)

Yup,

> Comments there also suggest that the acpi_put_table() may be
> unnecessary, at least on some paths.
> 
> I haven't tried to understand the ins and outs of this.

It's grabbing one reference and using it for everything, because it needs to 'map' the
table in atomic context due to cpuhp, but can't.
Given how frequently its used, there is no problem just leaving it mapped.


>> +
>> +	if (IS_ERR(table))
>> +		return PTR_ERR(table);
>> +
>> +	if (table->revision < 3)
>> +		return -ENOENT;
>> +
>> +	/*
>> +	 * If we found the cache first, we'd still need to walk from each CPU
>> +	 * to find the level...
>> +	 */

> ^ Possibly confusing comment?  The cache id is the starting point for
> calling this function.  Is there a world in which we are at this point
> without first having found the cache node?
> 
> (If the comment is just a restatement of part of the kerneldoc
> description, maybe just drop it.)

It's describing the alternate world where the table is searched to find the cache first,
but then we'd still need to walk the table another NR_CPUs times, which can't be avoided.
I'll drop it - it was justifying why its done this way round...


>> +	for_each_possible_cpu(cpu) {
>> +		acpi_cpu_id = get_acpi_id_for_cpu(cpu);
>> +		cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
>> +		if (!cpu_node)
>> +			return -ENOENT;
>> +		num_levels = acpi_count_levels(table, cpu_node, NULL);
> 
> Is the initial call to acpi_count_levels() really needed here?
> 
> It feels a bit like we end up enumerating the whole topology two or
> three times here; once to count how many levels there are, and then
> again to examine the nodes, and once more inside acpi_find_cache_node().
> 
> Why can't we just walk until we run out of levels?

This is looking for a unified cache - and we don't know where those start.
We could walk the first 100 caches, and stop once we start getting unified caches, then
they stop again ... but this seemed simpler.


> I may be missing some details of how these functions interact -- if
> this is only run at probe time, compact, well-factored code is
> more important than making things as fast as possible.


>> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
>> index f97a9ff678cc..30c10b1dcdb2 100644
>> --- a/include/linux/acpi.h
>> +++ b/include/linux/acpi.h
> 
> [...]
> 
>> @@ -221,6 +222,17 @@ void acpi_reserve_initial_tables (void);
>>  void acpi_table_init_complete (void);
>>  int acpi_table_init (void);
>>  
>> +static inline struct acpi_table_header *acpi_get_table_ret(char *signature, u32 instance)
>> +{
>> +	struct acpi_table_header *table;
>> +	int status = acpi_get_table(signature, instance, &table);
>> +
>> +	if (ACPI_FAILURE(status))
>> +		return ERR_PTR(-ENOENT);
>> +	return table;
>> +}

> This feels like something that ought to exist already.  If not, why
> not?  If so, are there open-coded versions of this spread around the
> ACPI tree that should be ported to use it?


It's a cleanup idiom helper that lets the compiler do this automagically - but its moot as
its not going to be needed in the pptt because of the acpi_get_pptt() thing.


Thanks,

James