Message-ID: <0d660898-241b-450f-a3f0-3d09a1c97ae0@arm.com>
Date: Wed, 12 Nov 2025 11:29:48 +0530
From: Dev Jain <dev.jain@....com>
To: Yang Shi <yang@...amperecomputing.com>, Will Deacon <will@...nel.org>
Cc: catalin.marinas@....com, ryan.roberts@....com, rppt@...nel.org,
shijie@...amperecomputing.com, linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org, stable@...r.kernel.org
Subject: Re: [PATCH] arm64/pageattr: Propagate return value from
__change_memory_common
On 11/11/25 10:07 am, Dev Jain wrote:
>
> On 11/11/25 9:47 am, Yang Shi wrote:
>>
>>
>> On 11/10/25 7:39 PM, Dev Jain wrote:
>>>
>>> On 05/11/25 9:27 am, Dev Jain wrote:
>>>>
>>>> On 04/11/25 6:26 pm, Will Deacon wrote:
>>>>> On Tue, Nov 04, 2025 at 09:06:12AM +0530, Dev Jain wrote:
>>>>>> On 04/11/25 12:15 am, Yang Shi wrote:
>>>>>>> On 11/3/25 7:16 AM, Will Deacon wrote:
>>>>>>>> On Mon, Nov 03, 2025 at 11:43:06AM +0530, Dev Jain wrote:
>>>>>>>>> Post a166563e7ec3 ("arm64: mm: support large block mapping when
>>>>>>>>> rodata=full"), __change_memory_common has a real chance of failing
>>>>>>>>> due to split failure. Before that commit, this line was introduced in
>>>>>>>>> c55191e96caa, still having a chance of failing if it needs to
>>>>>>>>> allocate pagetable memory in apply_to_page_range, although that has
>>>>>>>>> never been observed to be true. In general, we should always
>>>>>>>>> propagate the return value to the caller.
>>>>>>>>>
>>>>>>>>> Cc: stable@...r.kernel.org
>>>>>>>>> Fixes: c55191e96caa ("arm64: mm: apply r/o permissions of VM areas
>>>>>>>>> to its linear alias as well")
>>>>>>>>> Signed-off-by: Dev Jain <dev.jain@....com>
>>>>>>>>> ---
>>>>>>>>> Based on Linux 6.18-rc4.
>>>>>>>>>
>>>>>>>>> arch/arm64/mm/pageattr.c | 5 ++++-
>>>>>>>>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>>>>>>>>
>>>>>>>>> diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
>>>>>>>>> index 5135f2d66958..b4ea86cd3a71 100644
>>>>>>>>> --- a/arch/arm64/mm/pageattr.c
>>>>>>>>> +++ b/arch/arm64/mm/pageattr.c
>>>>>>>>> @@ -148,6 +148,7 @@ static int change_memory_common(unsigned long addr, int numpages,
>>>>>>>>>      unsigned long size = PAGE_SIZE * numpages;
>>>>>>>>>      unsigned long end = start + size;
>>>>>>>>>      struct vm_struct *area;
>>>>>>>>> +    int ret;
>>>>>>>>>      int i;
>>>>>>>>>
>>>>>>>>>      if (!PAGE_ALIGNED(addr)) {
>>>>>>>>> @@ -185,8 +186,10 @@ static int change_memory_common(unsigned long addr, int numpages,
>>>>>>>>>      if (rodata_full && (pgprot_val(set_mask) == PTE_RDONLY ||
>>>>>>>>>                          pgprot_val(clear_mask) == PTE_RDONLY)) {
>>>>>>>>>          for (i = 0; i < area->nr_pages; i++) {
>>>>>>>>> -            __change_memory_common((u64)page_address(area->pages[i]),
>>>>>>>>> +            ret = __change_memory_common((u64)page_address(area->pages[i]),
>>>>>>>>>                                           PAGE_SIZE, set_mask, clear_mask);
>>>>>>>>> +            if (ret)
>>>>>>>>> +                return ret;
>>>>>>>> Hmm, this means we can return failure half-way through the operation.
>>>>>>>> Is that something callers are expecting to handle? If so, how can
>>>>>>>> they tell how far we got?
>>>>>>> IIUC the callers don't have to know whether it is half-way or not,
>>>>>>> because the callers will change the permission back (e.g. to RW) for
>>>>>>> the whole range when freeing memory.
>>>>>> Yes, it is the caller's responsibility to set the VM_FLUSH_RESET_PERMS
>>>>>> flag. Upon vfree(), it will change the direct map permissions back to
>>>>>> RW.
>>>>> Ok, but vfree() ends up using update_range_prot() to do that, and if we
>>>>> need to worry about that failing (as per your commit message), then
>>>>> we're in trouble because the calls to set_area_direct_map() are
>>>>> unchecked.
>>>>>
>>>>> In other words, this patch is either not necessary or it is incomplete.
>>>>
>>>> Here is the relevant email, in the discussion between Ryan and Yang:
>>>>
>>>> https://lore.kernel.org/all/fe52a1d8-5211-4962-afc8-c3f9caf64119@os.amperecomputing.com/
>>>>
>>>>
>>>> We had concluded that all callers of set_memory_ro() or set_memory_rox()
>>>> (which require the linear map permissions to be changed back to default
>>>> upon vfree()) will call it for the entire region (vm_struct). So, when
>>>> we do the set_direct_map_invalid_noflush, it is guaranteed that the
>>>> region has already been split, and therefore the call cannot fail.
>>>>
>>>> https://lore.kernel.org/all/f8898c87-8f49-4ef2-86ae-b60bcf67658c@os.amperecomputing.com/
>>>>
>>>>
>>>> This email notes that there is some code doing set_memory_rw() and
>>>> unnecessarily setting the VM_FLUSH_RESET_PERMS flag, but in that case we
>>>> don't care about the set_direct_map_invalid_noflush call failing,
>>>> because the protections are already RW.
>>>>
>>>> We had also observed, though, that all of this is fragile and depends on
>>>> the caller doing the correct thing. The real solution would be to
>>>> somehow get rid of the BBM-style invalidation. Ryan had proposed some
>>>> methods in that email thread.
>>>>
>>>> One solution I had thought of: observe that we are doing overkill by
>>>> setting the linear map to invalid and then back to default for the
>>>> *entire* region. What we can do instead is iterate over the linear map
>>>> alias of the vm_struct *area and change permissions back to RW only for
>>>> the pages which are *not* RW. Those mappings are guaranteed to be split,
>>>> because they were changed from RW to not-RW in the first place.
>>>
>>> @Yang and Ryan,
>>>
>>> I saw Yang's patch here:
>>> https://lore.kernel.org/all/20251023204428.477531-1-yang@os.amperecomputing.com/
>>>
>>> and realized that currently we are splitting away the linear map alias
>>> of the *entire* region.
>>>
>>> Shouldn't this then imply that set_direct_map_invalid_noflush will never
>>> fail, since even a set_memory_rox() call on a single page will split the
>>> linear map for the entire region, and thus there is no fragility here of
>>> the kind we were discussing? I may be forgetting something; this linear
>>> map stuff is confusing enough already.
>>
>> It still may fail due to page table allocation failure when doing the
>> split. But it is still fine. We may run into 3 cases:
>>
>> 1. set_memory_rox succeeds in splitting the whole range, in which case
>> set_direct_map_invalid_noflush() will succeed too.
>> 2. set_memory_rox fails to split, e.g. it changes only part of the
>> range's permissions due to page table allocation failure; then
>> set_direct_map_invalid_noflush() may:
>>    a. successfully change the permissions back to default up to where
>> set_memory_rox failed, since that subrange has been successfully split.
>> This is ok, since the remaining range was never changed to ro by
>> set_memory_rox at all.
>>    b. successfully change the permissions back to default for the whole
>> range (for example, if memory pressure has eased by the time
>> set_direct_map_invalid_noflush() is called). This is definitely fine as
>> well.
>
> Correct. What I meant to imply here is: will your patch break this? If
> set_memory_* is applied on x till y, your patch changes the linear map
> alias only from x till y, whereas set_direct_map_invalid_noflush operates
> on 0 till size - 1, where 0 <= x <= y <= size - 1. So it may encounter
> an -ENOMEM on the [0, x) range while invalidating, and that is *not* okay
> because we must reset [0, x) back to default?
Okay, I realize this is wrong: please read the last bit as "and this is
okay because [0, x) is default already", and therefore your patch is
correct.
Let me send a patch documenting this so we don't get confused in the
future; it is easy to forget all of this.
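A documentation comment along these lines might capture the invariant being discussed (a sketch of possible wording only, not the actual patch):

```c
/*
 * Sketch of the invariant concluded in this thread:
 *
 * vfree() calls set_direct_map_invalid_noflush() /
 * set_direct_map_default_noflush() on the whole area, while
 * set_memory_ro()/set_memory_rox() may have changed (and hence split)
 * only part of its linear alias. A split failure while resetting a
 * subrange that was never made RO is benign: that subrange is still at
 * its default RW permissions, so there is nothing to reset there.
 */
```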
>
>
>>
>> Hopefully I don't miss anything.
>>
>> Thanks,
>> Yang
>>
>>
>>>
>>>
>>>>
>>>>>
>>>>> Will
>>>>
>>
>