linux-kernel - Re: [v3 PATCH 0/6] arm64: support FEAT_BBM level 2 and large block mapping when rodata=full

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <12f987c5-0e7f-4930-945f-bf7a2e73f5c2@os.amperecomputing.com>
Date: Fri, 30 May 2025 14:21:11 -0700
From: Yang Shi <yang@...amperecomputing.com>
To: Ryan Roberts <ryan.roberts@....com>, will@...nel.org,
 catalin.marinas@....com, Miko.Lenczewski@....com,
 scott@...amperecomputing.com, cl@...two.org
Cc: linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
 Dev Jain <dev.jain@....com>
Subject: Re: [v3 PATCH 0/6] arm64: support FEAT_BBM level 2 and large block
 mapping when rodata=full



On 5/30/25 12:17 AM, Ryan Roberts wrote:
> On 29/05/2025 20:52, Yang Shi wrote:
>>>>> I just had another conversation about this internally, and there is another
>>>>> concern; we obviously don't want to modify the pgtables while other CPUs that
>>>>> don't support BBML2 could be accessing them. Even in stop_machine() this may be
>>>>> possible if the CPU stacks and task structure (for example) are allocated
>>>>> out of
>>>>> the linear map.
>>>>>
>>>>> So we need to be careful to follow the pattern used by kpti; all secondary CPUs
>>>>> need to switch to the idmap (which is installed in TTBR0) then install the
>>>>> reserved map in TTBR1, then wait for CPU 0 to repaint the linear map, then have
>>>>> the secondary CPUs switch TTBR1 back to swapper then switch back out of idmap.
>>>> So the below code should be ok?
>>>>
>>>> cpu_install_idmap()
>>>> Busy loop to wait for cpu 0 done
>>>> cpu_uninstall_idmap()
>>> Once you have installed the idmap, you'll need to call a function by its PA so
>>> you are actually executing out of the idmap. And you will need to be in assembly
>>> so you don't need the stack, and you'll need to switch TTBR1 to the reserved
>>> pgtable, so that the CPU has no access to the swapper pgtable (which CPU 0 is
>>> able to modify).
>>>
>>> You may well be able to reuse __idmap_kpti_secondary in proc.S, or lightly
>>> refactor it to work for both the existing idmap_kpti_install_ng_mappings case,
>>> and your case.
>> I'm wondering whether we really need idmap for repainting. I think repainting is
>> different from kpti. We just split linear map which is *not* used by kernel
>> itself, the mappings for kernel itself is intact, we don't touch it at all. So
>> as long as CPU 0 will not repaint the linear map until all other CPUs busy
>> looping in stop_machine fn, then we are fine.
> But *how* are the other CPUs busy looping? Are they polling a variable? Where
> does that variable live? The docs say that a high priority thread is run for
> each CPU. So there at least needs to be a task struct and a stack. There are
> some Kconfigs where the stack comes from the linear map, so if the variable that
> is polls is on its stack (or even on CPU 0's stack then that's a problem. If the
> scheduler runs and accesses the task struct which may be allocated from the
> linear map (e.g. via kmalloc), that's a problem.
>
> The point is that you have to understand all the details of stop_machine() to be
> confident that it is never accessing the linear map. And even if you can prove
> that today, there is nothing stopping from the implementation changing in future.
>
> But then you have non-architectural memory accesses too (i.e. speculative
> accesses). It's possible that the CPU does a speculative load, which causes the
> TLB to do a translation and cache a TLB entry to the linear map. Then CPU 0
> changes the pgtable and you have broken the BBM requirements from the secondary
> CPU's perspective.
>
> So personally I think the only truely safe way to solve this is to switch the
> secondary CPUs to the idmap, then install the reserved map in TTBR1. That way,
> the secondary CPUs can't see the swapper pgtable at all and CPU 0 is free to do
> what it likes.

OK, I agree it is safer to run the busy loop (wait for repainting done 
on boot CPU) in idmap address space.

IIUC I should just need map the flag polled by the secondary CPU in 
idmap so that both CPU 0 and secondary CPUs can access it. And have the 
wait function in .idmap.text section. I may not reuse kpti code because 
it is much simpler than kpti.

Thanks,
Yang

>
>> We can have two flags to control it. The first one should be a cpu mask, all
>> secondary CPUs set its own mask bit to tell CPU 0 it is in stop machine fn
>> (ready for repainting). The other flag is used by CPU 0 to tell all secondary
>> CPUs repainting is done, please resume. We need have the two flags in kernel
>> data section instead of stack.
>>
>> The code of fn is in kernel text section, the flags are in kernel data section.
>> I don't see how come fn (just doing simple busy loop) on secondary CPUs need to
>> access linear map while repainting the linear map. After repainting the TLB will
>> be flushed before letting secondary CPUs resume, so any access to linear map
>> address after that point should be safe too.
>>
>> Does it sound reasonable to you? Did I miss something?
> I think the potential for speculative access is the problem. Personally, I would
> follow the pattern laid out by kpti. Then you can more easily defend it by
> pointing to an established pattern.
>
> Thanks,
> Ryan
>
>> Thanks,
>> Yang
>>
>>> Thanks,
>>> Ryan
>>>
>>>>> Given CPU 0 supports BBML2, I think it can just update the linear map live,
>>>>> without needing to do the idmap dance?
>>>> Yes, I think so too.
>>>>
>>>> Thanks,
>>>> Yang
>>>>
>>>>> Thanks,
>>>>> Ryan
>>>>>
>>>>>
>>>>>> Thanks,
>>>>>> Ryan
>>>>>>