[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <047c0f7b-e46d-4a3f-8bc0-ce007eac36a7@arm.com>
Date: Wed, 17 Sep 2025 19:58:48 +0100
From: Ryan Roberts <ryan.roberts@....com>
To: Yang Shi <yang@...amperecomputing.com>, Dev Jain <dev.jain@....com>,
Catalin Marinas <catalin.marinas@....com>, Will Deacon <will@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
David Hildenbrand <david@...hat.com>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
Ard Biesheuvel <ardb@...nel.org>, scott@...amperecomputing.com, cl@...two.org
Cc: linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
linux-mm@...ck.org
Subject: Re: [PATCH v7 0/6] arm64: support FEAT_BBM level 2 and large block
mapping when rodata=full
On 17/09/2025 18:21, Yang Shi wrote:
>
>
> On 9/17/25 9:28 AM, Ryan Roberts wrote:
>> Hi Yang,
>>
>> Sorry for the slow reply; I'm just getting back to this...
>>
>> On 11/09/2025 23:03, Yang Shi wrote:
>>> Hi Ryan & Catalin,
>>>
>>> Any more concerns about this?
>> I've been trying to convince myself that your assertion that all users that set
>> the VM_FLUSH_RESET_PERMS also call set_memory_*() for the entire range that was
>> returned my vmalloc. I agree that if that is the contract and everyone is
>> following it, then there is no problem here.
>>
>> But I haven't been able to convince myself...
>>
>> Some examples (these might intersect with examples you previously raised):
>>
>> 1. bpf_dispatcher_change_prog() -> bpf_jit_alloc_exec() -> execmem_alloc() ->
>> sets VM_FLUSH_RESET_PERMS. But I don't see it calling set_memory_*() for
>> rw_image.
>
> Yes, it doesn't call set_memory_*(). I spotted this in the earlier email. But it
> is actually RW, so it should be ok to miss the call. The later
> set_direct_map_invalid call in vfree() may fail, but set_direct_map_default call
> will set RW permission back. But I think it doesn't have to use execmem_alloc(),
> the plain vmalloc() should be good enough.
>
>>
>> 2. module_memory_alloc() -> execmem_alloc_rw() -> execmem_alloc() -> sets
>> VM_FLUSH_RESET_PERMS (note that execmem_force_rw() is nop for arm64).
>> set_memory_*() is not called until much later on in module_set_memory(). Another
>> error in the meantime could cause the memory to be vfreed before that point.
>
> IIUC, execmem_alloc_rw() is used to allocate memory for modules' text section
> and data section. The code will set mod->mem[type].is_rox according to the type
> of the section. It is true for text, false for data. Then set_memory_rox() will
> be called later if it is true *after* insns are copied to the memory. So it is
> still RW before that point.
>
>>
>> 3. When set_vm_flush_reset_perms() is set for the range, it is called before
>> set_memory_*() which might then fail to split prior to vfree.
>
> Yes, all call sites check the return value and bail out if set_memory_*() failed
> if I don't miss anything.
>
>>
>> But I guess as long as set_memory_*() is never successfully called for a
>> *sub-range* of the vmalloc'ed region, then for all of the above issues, the
>> memory must still be RW at vfree-time, so this issue should be benign... I think?
>
> Yes, it is true.
So to summarise, all freshly vmalloc'ed memory starts as RW. set_memory_*() may
only be called if VM_FLUSH_RESET_PERMS has already been set. If set_memory_*()
is called at all, the first call MUST be for the whole range.
If those requirements are all met, then if VM_FLUSH_RESET_PERMS was set but
set_memory_*() was never called, the worst that can happen is for both the
set_direct_map_invalid() and set_direct_map_default() calls to fail due to not
enough memory. But that is safe because the memory was always RW. If
set_memory_*() was called for the whole range and failed, it's the same as if it
was never called. If it was called for the whole range and succeeded, then the
split must have happened already and set_direct_map_invalid() and
set_direct_map_default() will therefore definitely succeed.
The only way this could be a problem is if someone vmallocs a range then
performs a set_memory_*() on a sub-region without having first done it for the
whole region. But we have not found any evidence that there are any users that
do that.
In fact, by that logic, I think alloc_insn_page() must also be safe; it only
allocates 1 page, so if set_memory_*() is subsequently called for it, it must by
definition be covering the whole allocation; 1 page is the smallest amount that
can be protected.
So I agree we are safe.
>
>>
>> In summary this all looks horribly fragile. But I *think* it works. It would be
>> good to clean it all up and have some clearly documented rules regardless. But I
>> think that could be a follow up series.
>
> Yeah, absolutely agreed.
>
>>
>>> Shall we move forward with v8?
>> Yes; Do you wnat me to post that or would you prefer to do it? I'm happy to do
>> it; there are a few other tidy ups in pageattr.c I want to make which I spotted.
>
> I actually just had v8 ready in my tree. I removed pageattr_pgd_entry and
> pageattr_pud_entry in pageattr.c and fixed pmd_leaf/pud_leaf as you suggested.
> Is it the cleanup you are supposed to do?
I was also going to fix up the comment in change_memory_common() which is now stale.
> And I also rebased it on top of
> Shijie's series (https://git.kernel.org/pub/scm/linux/kernel/git/arm64/
> linux.git/commit/?id=bfbbb0d3215f) which has been picked up by Will.
>
>>
>>> We can include the
>>> fix to kprobes in v8 or I can send it separately, either is fine to me.
>> Post it on list, and I'll also incorporate into the series.
>
> I can include it in v8 series.
>
>>
>>> Hopefully we can make v6.18.
>> It's probably getting a bit late now. Anyway, I'll aim to get v8 out tomorrow or
>> Friday and we will see what Will thinks.
>
> Thank you. I can post v8 today.
OK great - I'll leave it all to you then - thanks!
>
> Thanks,
> Yang
>
>>
>> Thanks,
>> Ryan
>>
>>> Thanks,
>>> Yang
>>>
>
Powered by blists - more mailing lists