linux-kernel - Re: [PATCH v5 17/40] x86/resctrl: Rewrite and move the for_each_*_rdt

Message-ID: <e6570a61-1608-4baa-9f61-fb87f39a76f6@arm.com>
Date: Fri, 18 Oct 2024 18:07:16 +0100
From: James Morse <james.morse@....com>
To: Reinette Chatre <reinette.chatre@...el.com>,
 Tony Luck <tony.luck@...el.com>
Cc: x86@...nel.org, linux-kernel@...r.kernel.org,
 Fenghua Yu <fenghua.yu@...el.com>, Thomas Gleixner <tglx@...utronix.de>,
 Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
 H Peter Anvin <hpa@...or.com>, Babu Moger <Babu.Moger@....com>,
 shameerali.kolothum.thodi@...wei.com,
 D Scott Phillips OS <scott@...amperecomputing.com>,
 carl@...amperecomputing.com, lcherian@...vell.com,
 bobo.shaobowang@...wei.com, tan.shaopeng@...itsu.com,
 baolin.wang@...ux.alibaba.com, Jamie Iles <quic_jiles@...cinc.com>,
 Xin Hao <xhao@...ux.alibaba.com>, peternewman@...gle.com,
 dfustini@...libre.com, amitsinght@...vell.com,
 David Hildenbrand <david@...hat.com>, Rex Nie <rex.nie@...uarmicro.com>,
 Dave Martin <dave.martin@....com>, Shaopeng Tan <tan.shaopeng@...fujitsu.com>
Subject: Re: [PATCH v5 17/40] x86/resctrl: Rewrite and move the
 for_each_*_rdt_resource() walkers

Hi Tony, Reinette,

On 08/10/2024 17:40, Reinette Chatre wrote:
> On 10/7/24 5:00 PM, Tony Luck wrote:
>> On Fri, Oct 04, 2024 at 06:03:24PM +0000, James Morse wrote:
>>> The for_each_*_rdt_resource() helpers walk the architecture's array
>>> of structures, using the resctrl visible part as an iterator. These
>>> became over-complex when the structures were split into a
>>> filesystem and architecture-specific struct. This approach avoided
>>> the need to touch every call site, and was done before there was a
>>> helper to retrieve a resource by rid.
>>>
>>> Once the filesystem parts of resctrl are moved to /fs/, both the
>>> architecture's resource array, and the definition of those structures
>>> are no longer accessible. To support resctrl, each architecture would
>>> have to provide equally complex macros.
>>>
>>> Rewrite the macro to make use of resctrl_arch_get_resource(), and
>>> move these to the core header so existing x86 arch code continues
>>> to use them.
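
For anyone following along, the shape of such a walker can be sketched in
user-space C as below. This is a minimal illustration, not the code from the
patch: the struct is cut down to three fields, arch_resources[] and its
sentinel slot stand in for whatever the architecture keeps privately, and
count_resources() is a made-up helper that exists only to exercise the macros.

```c
#include <stdbool.h>

/* Simplified stand-ins for the kernel definitions (assumption). */
enum resctrl_res_level {
	RDT_RESOURCE_L3,
	RDT_RESOURCE_L2,
	RDT_RESOURCE_MBA,
	RDT_RESOURCE_SMBA,
	RDT_NUM_RESOURCES,	/* must be last */
};

struct rdt_resource {
	int rid;
	bool alloc_capable;
	bool mon_capable;
};

/* Hypothetical arch-private array, with one extra sentinel entry. */
static struct rdt_resource arch_resources[RDT_NUM_RESOURCES + 1] = {
	[RDT_RESOURCE_L3]   = { .rid = RDT_RESOURCE_L3,   .alloc_capable = true, .mon_capable = true },
	[RDT_RESOURCE_L2]   = { .rid = RDT_RESOURCE_L2,   .alloc_capable = true },
	[RDT_RESOURCE_MBA]  = { .rid = RDT_RESOURCE_MBA,  .alloc_capable = true },
	[RDT_RESOURCE_SMBA] = { .rid = RDT_RESOURCE_SMBA },
	[RDT_NUM_RESOURCES] = { .rid = RDT_NUM_RESOURCES },
};

/* Never returns NULL: out-of-range ids map to the sentinel entry. */
struct rdt_resource *resctrl_arch_get_resource(int rid)
{
	if (rid < 0 || rid > RDT_NUM_RESOURCES)
		rid = RDT_NUM_RESOURCES;
	return &arch_resources[rid];
}

/* The walker only needs the accessor and the rid, not the array itself. */
#define for_each_rdt_resource(_r)					\
	for ((_r) = resctrl_arch_get_resource(0);			\
	     (_r)->rid < RDT_NUM_RESOURCES;				\
	     (_r) = resctrl_arch_get_resource((_r)->rid + 1))

#define for_each_alloc_capable_rdt_resource(_r)				\
	for_each_rdt_resource(_r)					\
		if ((_r)->alloc_capable)

/* Hypothetical helper: count the resources seen by each walker. */
int count_resources(bool alloc_only)
{
	struct rdt_resource *r;
	int n = 0;

	if (alloc_only) {
		for_each_alloc_capable_rdt_resource(r)
			n++;
	} else {
		for_each_rdt_resource(r)
			n++;
	}
	return n;
}
```

With the table above, count_resources(false) visits all four entries and
count_resources(true) skips the one that is not alloc_capable. The point of
the sketch is only that the macros no longer need to see the array or the
architecture's wrapper struct.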

>> Apologies if this comment was suggested against earlier versions
>> of this series.
>>
>> Did you consider replacing rdt_resources_all[] with a list (in the filesystem
>> code) instead of an array (in the architecture code)?

I didn't consider this, but it would be a more natural fit for the secret for loops that
are all over the resctrl code.


>> List would start empty. Architecture init code would enumerate features
>> and add entries to the list for those that exist and are to be enabled.

That saves the 'can't return NULL' wart - but that was intended to be temporary - and only
a headache for !x86 architectures.


>> The "for_each" macros then walk the list (variants for all entries,
>> for "alloc_capable" and for "mon_capable"). Note that only enabled
>> entries appear on the lists.
>>
>> There are currently a bunch of places in filesystem code that
>> do:
>> 	r = resctrl_arch_get_resource(RDT_RESOURCE_MBA);
>> or
>> 	r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
>>
>> those could become:
>>
>> 	r = resctrl_arch_get_mba_resource();
>>
>> 	r = resctrl_arch_get_l3_resource();

Where these walk this list instead of 'knowing' the offset.
(just in case I'm missing a trick here)
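
To check we mean the same thing, here is a rough user-space sketch of how I
read the list-based idea. Everything in it is an assumption for illustration:
the hand-rolled singly linked list (the kernel would use list_head), the
res_kind tag, and resctrl_register_resource() and find_resource() are made-up
names. Note the lookup still needs something to tell MBA from L3, and it can
now return NULL.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tag, needed so a lookup can tell the entries apart. */
enum res_kind { RES_L3, RES_L2, RES_MBA };

struct rdt_resource {
	enum res_kind kind;
	bool alloc_capable;
	bool mon_capable;
	struct rdt_resource *next;
};

/* The list starts empty; only enabled resources are ever added. */
static struct rdt_resource *resource_list;

/* Hypothetical: called by arch init code for each feature it enumerates. */
void resctrl_register_resource(struct rdt_resource *r)
{
	r->next = resource_list;
	resource_list = r;
}

#define for_each_rdt_resource(_r)					\
	for ((_r) = resource_list; (_r); (_r) = (_r)->next)

/* Walk the list instead of 'knowing' an array offset. */
static struct rdt_resource *find_resource(enum res_kind kind)
{
	struct rdt_resource *r;

	for_each_rdt_resource(r)
		if (r->kind == kind)
			return r;
	return NULL;	/* not present on this platform */
}

struct rdt_resource *resctrl_arch_get_mba_resource(void)
{
	return find_resource(RES_MBA);
}

struct rdt_resource *resctrl_arch_get_l3_resource(void)
{
	return find_resource(RES_L3);
}
```

Every caller of the two accessors would then have to cope with NULL, which is
the trade against the 'can't return NULL' behaviour of the array version.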


>> Then the whole "enum resctrl_res_level" and ->rid field in
>> struct rdt_resource could go away?

I think level is still going to be useful for cache resources - that is something we
expose via the sysfs cpu cache/indexX stuff too. I'd like resctrl to generate the names of
resources - just to ensure they are the same on every architecture.

The rid is an existing field just to make the array searching work.


>> Remaining uses look like
>> distinguishing MBA from SMBA. Perhaps better done with a
>> flags word?
>>
>> Advantage of doing this would be to avoid the generic
>> enum resctrl_res_level having to be a superset of all
>> features across all architectures.

Ah, I see this as an advantage - it's much harder for an architecture to add a new type of
control or resource than it is to provide compatibility with one that is already there.
This in turn is better for user-space.

MPAM's bandwidth controls don't have the same control format as Intel RDT - but it's much
better for everyone if I convert the values to hide the differences instead of trying to
shoehorn in ARM_MB as a new resource, only to find another architecture grows something
similar.
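
As a rough illustration of the kind of conversion I mean (not the MPAM
driver's code): assume user-space writes a percentage, and the hardware wants
a fixed-point fraction that is 'width' bits wide, with all-ones meaning 100%.
The function names, the rounding, and the 1..16 bit width limit below are all
assumptions made for this sketch.

```c
#include <stdint.h>

/*
 * Hypothetical: convert a percentage (0-100) to a 'width'-bit fixed-point
 * fraction, rounding to nearest. Returns 0 for an unsupported width.
 */
uint32_t percent_to_fixed_point(unsigned int percent, unsigned int width)
{
	uint32_t max;

	if (width == 0 || width > 16)
		return 0;
	if (percent > 100)
		percent = 100;

	max = (1u << width) - 1;	/* all-ones represents 100% */
	return (uint32_t)(((uint64_t)percent * max + 50) / 100);
}

/* Hypothetical: the reverse direction, for reading the value back. */
unsigned int fixed_point_to_percent(uint32_t val, unsigned int width)
{
	uint32_t max;

	if (width == 0 || width > 16)
		return 0;

	max = (1u << width) - 1;
	if (val > max)
		val = max;
	return (unsigned int)(((uint64_t)val * 100 + max / 2) / max);
}
```

With enough bits the round trip gives back the percentage that was written,
so user-space sees the same format on every architecture and only the
architecture code knows about the hardware encoding.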

The difficult bit is making sure new resources/controls are as generic as possible,
meaning other architectures can adopt them. (L3's bitmap is a good example).

(and I agree there will always be platform specific things each camp has)

>> E.g. ARM might want to add L4/L5 resources,

/me shudders.

I've seen folk wanting to add the 'system cache' - which sits where the L3 should be, but
behaves differently. (And ACPI's "Memory Side Caches", which gives me a hilarious TLA
collision to navigate.)
I've argued neither of these are L<n> caches because they aren't visible to user-space in
/sys/devices/system/cpu/cpu0/cache ...

[..]

> Ideally resctrl fs would remain as an interface that a user can use to interact
> with all architectures without knowing architecture specific details. Platform
> differences can be exposed by resctrl in a generic way to support this.
> I am afraid that allowing architectures to diverge would require resctrl fs users
> to additionally know which platform they are running on.
> 
>> If this v5 series is close to being applied then I don't
>> want to derail with a re-write at this late stage.
>> All of this could be done as a cleanup after this series
>> has been applied.
> 
> Due to the already significant size of this work I think it would be easier
> if the number of functional changes is minimal. Specifically, only those functional
> changes that are required to accomplish the goal of moving the code.

Yup - hence the need for !alloc_capable && !mon_capable resources behind
resctrl_arch_get_resource() - this is keeping the behaviour of the existing code.


> Considering that one goal of this proposal is to support architectural
> flexibility I do think it would be easier to understand its impact if it
> is implemented on top of the arch/fs split.

Makes sense to me.


Thanks,

James