linux-kernel - Re: [v3 PATCH 0/6] arm64: support FEAT_BBM level 2 and large block mapping when rodata=full

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <1cc04c6b-ba0f-4e7f-ab85-46c364c66300@arm.com>
Date: Fri, 30 May 2025 08:17:13 +0100
From: Ryan Roberts <ryan.roberts@....com>
To: Yang Shi <yang@...amperecomputing.com>, will@...nel.org,
 catalin.marinas@....com, Miko.Lenczewski@....com,
 scott@...amperecomputing.com, cl@...two.org
Cc: linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
 Dev Jain <dev.jain@....com>
Subject: Re: [v3 PATCH 0/6] arm64: support FEAT_BBM level 2 and large block
 mapping when rodata=full

On 29/05/2025 20:52, Yang Shi wrote:
>>>> I just had another conversation about this internally, and there is another
>>>> concern; we obviously don't want to modify the pgtables while other CPUs that
>>>> don't support BBML2 could be accessing them. Even in stop_machine() this may be
>>>> possible if the CPU stacks and task structure (for example) are allocated
>>>> out of
>>>> the linear map.
>>>>
>>>> So we need to be careful to follow the pattern used by kpti; all secondary CPUs
>>>> need to switch to the idmap (which is installed in TTBR0) then install the
>>>> reserved map in TTBR1, then wait for CPU 0 to repaint the linear map, then have
>>>> the secondary CPUs switch TTBR1 back to swapper then switch back out of idmap.
>>> So the below code should be ok?
>>>
>>> cpu_install_idmap()
>>> Busy loop to wait for cpu 0 done
>>> cpu_uninstall_idmap()
>> Once you have installed the idmap, you'll need to call a function by its PA so
>> you are actually executing out of the idmap. And you will need to be in assembly
>> so you don't need the stack, and you'll need to switch TTBR1 to the reserved
>> pgtable, so that the CPU has no access to the swapper pgtable (which CPU 0 is
>> able to modify).
>>
>> You may well be able to reuse __idmap_kpti_secondary in proc.S, or lightly
>> refactor it to work for both the existing idmap_kpti_install_ng_mappings case,
>> and your case.
> 
> I'm wondering whether we really need idmap for repainting. I think repainting is
> different from kpti. We just split linear map which is *not* used by kernel
> itself, the mappings for kernel itself is intact, we don't touch it at all. So
> as long as CPU 0 will not repaint the linear map until all other CPUs busy
> looping in stop_machine fn, then we are fine.

But *how* are the other CPUs busy looping? Are they polling a variable? Where
does that variable live? The docs say that a high priority thread is run for
each CPU. So there at least needs to be a task struct and a stack. There are
some Kconfigs where the stack comes from the linear map, so if the variable that
is polls is on its stack (or even on CPU 0's stack then that's a problem. If the
scheduler runs and accesses the task struct which may be allocated from the
linear map (e.g. via kmalloc), that's a problem.

The point is that you have to understand all the details of stop_machine() to be
confident that it is never accessing the linear map. And even if you can prove
that today, there is nothing stopping from the implementation changing in future.

But then you have non-architectural memory accesses too (i.e. speculative
accesses). It's possible that the CPU does a speculative load, which causes the
TLB to do a translation and cache a TLB entry to the linear map. Then CPU 0
changes the pgtable and you have broken the BBM requirements from the secondary
CPU's perspective.

So personally I think the only truely safe way to solve this is to switch the
secondary CPUs to the idmap, then install the reserved map in TTBR1. That way,
the secondary CPUs can't see the swapper pgtable at all and CPU 0 is free to do
what it likes.

> 
> We can have two flags to control it. The first one should be a cpu mask, all
> secondary CPUs set its own mask bit to tell CPU 0 it is in stop machine fn
> (ready for repainting). The other flag is used by CPU 0 to tell all secondary
> CPUs repainting is done, please resume. We need have the two flags in kernel
> data section instead of stack.
> 
> The code of fn is in kernel text section, the flags are in kernel data section.
> I don't see how come fn (just doing simple busy loop) on secondary CPUs need to
> access linear map while repainting the linear map. After repainting the TLB will
> be flushed before letting secondary CPUs resume, so any access to linear map
> address after that point should be safe too.
> 
> Does it sound reasonable to you? Did I miss something?

I think the potential for speculative access is the problem. Personally, I would
follow the pattern laid out by kpti. Then you can more easily defend it by
pointing to an established pattern.

Thanks,
Ryan

> 
> Thanks,
> Yang
> 
>>
>> Thanks,
>> Ryan
>>
>>>> Given CPU 0 supports BBML2, I think it can just update the linear map live,
>>>> without needing to do the idmap dance?
>>> Yes, I think so too.
>>>
>>> Thanks,
>>> Yang
>>>
>>>> Thanks,
>>>> Ryan
>>>>
>>>>
>>>>> Thanks,
>>>>> Ryan
>>>>>
>