lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <0c644070-e4be-43c1-acb3-30cd030e20e1@os.amperecomputing.com>
Date: Thu, 13 Feb 2025 13:14:55 -0800
From: Yang Shi <yang@...amperecomputing.com>
To: Ryan Roberts <ryan.roberts@....com>, catalin.marinas@....com,
 will@...nel.org
Cc: cl@...two.org, scott@...amperecomputing.com,
 linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [v2 PATCH 1/2] arm64: cpufeature: detect FEAT_BBM level 2




On 2/11/25 3:55 AM, Ryan Roberts wrote:

Hi Ryan,

Thanks for taking time to review the patches.

> On 03/01/2025 01:17, Yang Shi wrote:
>> FEAT_BBM level 2 allow changing block size of a translation with relaxed
>> TLB flushing.  But it may incur TLB conflict abort.  We can handle the
>> abort in kernel, however it is hard to guarantee the recuesive TLB
> nit: recuesive -> recursive ?

Yes, it is a typo. Will fix in the next version.

>> conflct will never happen in the handling itself.
>>
>> Some implementations can handle TLB conflict gracefully without fault
>> handler in kernel so FEAT_BBM level 2 can be enabled on those
>> implementations safely.
>>
>> Look up MIDR to filter out those CPUs.  AmpereOne is one of them.
>>
>> Suggested-by: Will Deacon<will@...nel.org>
>> Signed-off-by: Yang Shi<yang@...amperecomputing.com>
>> ---
>>   arch/arm64/include/asm/cpufeature.h | 19 +++++++++++++++++++
>>   arch/arm64/kernel/cpufeature.c      | 11 +++++++++++
>>   arch/arm64/tools/cpucaps            |  1 +
>>   3 files changed, 31 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
>> index 8b4e5a3cd24c..33ca9db42741 100644
>> --- a/arch/arm64/include/asm/cpufeature.h
>> +++ b/arch/arm64/include/asm/cpufeature.h
>> @@ -866,6 +866,25 @@ static __always_inline bool system_supports_mpam_hcr(void)
>>   	return alternative_has_cap_unlikely(ARM64_MPAM_HCR);
>>   }
>>   
>> +static inline bool system_supports_bbmlv2(void)
> nit: Arm language internally is starting to refer to FEAT_BBML1 / FEAT_BBML2 and
> I believe this will soon make it's way to the Arm ARM. So probably better to
> refer to bbml2 rather than bbmlv2 throughout.

Sure.

>> +{
>> +	return cpus_have_final_boot_cap(ARM64_HAS_BBMLV2);
>> +}
>> +
>> +static inline bool bbmlv2_available(void)
> This function has no need to be in the header. system_supports_bbmlv2() is what
> users should use. Suggest moving to has_bbmlv2() in cpufeature.c.

bbmlv2_available() will be called by map_mem() in patch 2, but map_mem() 
is called before CPU feature is finalized. I saw you suggest collapse 
the page table in the below comment, if it works we don't need this 
function anymore. But I have more questions about that.

>> +{
>> +	static const struct midr_range support_bbmlv2[] = {
>> +		MIDR_ALL_VERSIONS(MIDR_AMPERE1),
>> +		MIDR_ALL_VERSIONS(MIDR_AMPERE1A),
>> +		{}
>> +	};
>> +
>> +	if (is_midr_in_range_list(read_cpuid_id(), support_bbmlv2))
>> +		return true;
>> +
>> +	return false;
>> +}
>> +
>>   int do_emulate_mrs(struct pt_regs *regs, u32 sys_reg, u32 rt);
>>   bool try_emulate_mrs(struct pt_regs *regs, u32 isn);
>>   
>> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
>> index 6ce71f444ed8..a60d5fa04828 100644
>> --- a/arch/arm64/kernel/cpufeature.c
>> +++ b/arch/arm64/kernel/cpufeature.c
>> @@ -1889,6 +1889,11 @@ static bool has_lpa2(const struct arm64_cpu_capabilities *entry, int scope)
>>   }
>>   #endif
>>   
>> +static bool has_bbmlv2(const struct arm64_cpu_capabilities *entry, int scope)
>> +{
>> +	return bbmlv2_available();
>> +}
>> +
>>   #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
>>   #define KPTI_NG_TEMP_VA		(-(1UL << PMD_SHIFT))
>>   
>> @@ -2990,6 +2995,12 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
>>   		ARM64_CPUID_FIELDS(ID_AA64PFR1_EL1, GCS, IMP)
>>   	},
>>   #endif
>> +	{
>> +		.desc = "BBM Level 2",
>> +		.capability = ARM64_HAS_BBMLV2,
>> +		.type = ARM64_CPUCAP_BOOT_CPU_FEATURE,
> I'm wondering if this will potentially lead to problems for assymetric
> collections of CPUs (e.g. big.LITTLE)? I can imagine that little CPUs might not
> support BBML2. In this case if you boot on a big CPU that does have BBML2, you
> will require the feature and refuse to online the secondary little CPUs.

Yes. This is the behavior of this patch.

> Perhaps this really needs to be a system feature, where it is only enabled if
> all CPUs in the system support it? I'm guessing that will make painting the
> linear map harder; I guess you will need to initially set it up with PTE
> mappings, then repaint as block mappings if BBML2 is determined to be supported
> if that's not already what you are doing.

Actually I thought about this before I posted the RFC patches to 
upstream. There are a couple of options, but I can't tell which one is 
the preferred and whether it is really that important to handle 
asymmetric systems gracefully or not, so I did it in the simplest way: 
just fail online the conflict cores. I also noticed some features behave 
similarly, for example, MPAM. And this RFC patch is mainly aimed to get 
some feedback from the community about whether it is worth it and the 
direction is right or not. So I tried to make it as simple as possible 
(for example, I didn't add CONT_PTE support in patch 2 either).

If I understand correctly, system feature needs to read the "sanitized" 
register value per the comment in cpufeature.c, but we read MIDR here. 
So it actually just uses the current CPU's (likely boot CPU) MIDR if it 
is s system feature, right? If we really want to handle such asymmetric 
systems gracefully, we need:
     - read all cores' MIDR then determine whether BBML2 should be 
advertised or not
     - update a flag or bitmap to tell us whether it is asymmetric or not
     - take actions based on the flag or bitmap (i.e. collapse page 
table or do nothing)

But system feature is not checked on the secondary cores. The 
check_local_cpu_capabilities() called by secondary_start_kernel() just 
checks SCOPE_LOCAL_CPU features if I read the code correctly. So local 
cpu feature may be better? The local cpu feature maintains a cpumask, it 
can tell us whether BBML2 is asymmetric or not.

In addition I'm also thinking about whether collapse is the best way or 
not. We should be able to have large block mapping in the first place if 
the boot CPU has BBML2, then split the page table if it is asymmetric. 
I'm supposed we need to stop machine anyway even though we do collapse. 
The split need to be called on the boot CPU. We already have split 
logic, we can reuse it anyway (maybe need some minor tweak to fit). It 
sounds simpler than collapse. And the asymmetric systems may be not that 
many in real world? I know there are a lot of big.LITTLE SoCs in the 
wild, but those big cores may typically not support BBML2. If so we can 
save boot up time for the most cases.

The other concern is about cpu hotplug. For example, if all the booting 
cores have BBML2, but the hot plugged cores don't, shall we split the 
page table when the cores are hot added, and collapse the page table 
when the cores are hot removed?

I'm not sure whether the extra logic to support asymmetric systems is 
worth it or not. Maybe we should start from the symmetric systems, then 
add more graceful handle to asymmetric systems later if it turns out to 
be a real problem? And unfortunately I don't have the appropriate 
hardware to test the code. Maybe you or someone else from ARM has the 
right hardware?

Thanks,
Yang

> Thanks,
> Ryan
>
>> +		.matches = has_bbmlv2,
>> +	},
>>   	{},
>>   };
>>   
>> diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
>> index eb17f59e543c..287bdede53f5 100644
>> --- a/arch/arm64/tools/cpucaps
>> +++ b/arch/arm64/tools/cpucaps
>> @@ -14,6 +14,7 @@ HAS_ADDRESS_AUTH_ARCH_QARMA5
>>   HAS_ADDRESS_AUTH_IMP_DEF
>>   HAS_AMU_EXTN
>>   HAS_ARMv8_4_TTL
>> +HAS_BBMLV2
>>   HAS_CACHE_DIC
>>   HAS_CACHE_IDC
>>   HAS_CNP


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ