Message-ID: <cab45bd6-8108-4a6f-816a-3f7b70a2902f@os.amperecomputing.com>
Date: Wed, 25 Jun 2025 13:40:27 -0700
From: Yang Shi <yang@...amperecomputing.com>
To: Ryan Roberts <ryan.roberts@....com>, Mike Rapoport <rppt@...nel.org>,
Dev Jain <dev.jain@....com>
Cc: akpm@...ux-foundation.org, david@...hat.com, catalin.marinas@....com,
will@...nel.org, lorenzo.stoakes@...cle.com, Liam.Howlett@...cle.com,
vbabka@...e.cz, surenb@...gle.com, mhocko@...e.com, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, suzuki.poulose@....com, steven.price@....com,
gshan@...hat.com, linux-arm-kernel@...ts.infradead.org,
anshuman.khandual@....com
Subject: Re: [PATCH v3 1/2] arm64: pageattr: Use pagewalk API to change memory
permissions
On 6/25/25 4:04 AM, Ryan Roberts wrote:
> On 15/06/2025 08:32, Mike Rapoport wrote:
>> On Fri, Jun 13, 2025 at 07:13:51PM +0530, Dev Jain wrote:
>>> -/*
>>> - * This function assumes that the range is mapped with PAGE_SIZE pages.
>>> - */
>>> -static int __change_memory_common(unsigned long start, unsigned long size,
>>> +static int ___change_memory_common(unsigned long start, unsigned long size,
>>> pgprot_t set_mask, pgprot_t clear_mask)
>>> {
>>> struct page_change_data data;
>>> @@ -61,9 +140,28 @@ static int __change_memory_common(unsigned long start, unsigned long size,
>>> data.set_mask = set_mask;
>>> data.clear_mask = clear_mask;
>>>
>>> - ret = apply_to_page_range(&init_mm, start, size, change_page_range,
>>> - &data);
>>> + arch_enter_lazy_mmu_mode();
>>> +
>>> + /*
>>> + * The caller must ensure that the range we are operating on does not
>>> + * partially overlap a block mapping. Any such case should either not
>>> + * exist, or must be eliminated by splitting the mapping - which for
>>> + * kernel mappings can be done only on BBML2 systems.
>>> + *
>>> + */
>>> + ret = walk_kernel_page_table_range_lockless(start, start + size,
>>> + &pageattr_ops, NULL, &data);
>> x86 has a cpa_lock for set_memory/set_direct_map to ensure that there's no
>> concurrency in kernel page table updates. I think arm64 has to have such
>> a lock as well.
> We don't have a lock today, using apply_to_page_range(); we are expecting that
> the caller has exclusive ownership of the portion of virtual memory - i.e. the
> vmalloc region or linear map. So I don't think this patch changes that requirement?
>
> Where it does get a bit more hairy is when we introduce the support for
> splitting. In that case, 2 non-overlapping areas of virtual memory may share a
> large leaf mapping that needs to be split. But I've been discussing that with
> Yang Shi at [1] and I think we can handle that locklessly too.
If the split is serialized by a lock, changing permissions can be
lockless. But if the split is lockless too, changing permissions gets
tricky, particularly for CONT mappings. The implementation in my split
patch assumes that the whole range has the cont bit cleared if the first
PTE in the range has it cleared, because the lock guarantees that two
concurrent splits are serialized.

But a lockless split may trigger the race below:

CPU A is splitting the page table while CPU B is changing the permission
of one PTE in the same table. Clearing the cont bit is a
read-modify-write (RMW), and so is changing the permission, but neither
of them is atomic.

CPU A                                CPU B
read the PTE                         read the PTE
clear the cont bit for the PTE
                                     change the PTE permission from RW to RO
                                     store the new PTE
store the new PTE  <- it will overwrite the PTE value stored by CPU B
                      and result in misprogrammed cont PTEs

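To make the interleaving concrete, here is a minimal user-space sketch that replays it on a single thread. It is not kernel code: the `DEMO_` names and bit positions are illustrative assumptions, and the "CPUs" are just two local copies of the same PTE value.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit positions only; stand-ins for the real PTE bits. */
#define DEMO_PTE_CONT   (UINT64_C(1) << 52)
#define DEMO_PTE_RDONLY (UINT64_C(1) << 7)

/*
 * Replay the interleaving described above, with both CPUs using a plain
 * load / modify / store sequence on the same PTE.
 * Returns the final PTE value.
 */
static uint64_t demo_replay_lost_update(uint64_t pte)
{
	/* The scenario starts from a PTE that is part of a cont mapping. */
	assert(pte & DEMO_PTE_CONT);

	uint64_t a = pte;		/* CPU A: read the PTE */
	uint64_t b = pte;		/* CPU B: read the PTE */

	a &= ~DEMO_PTE_CONT;		/* CPU A: clear the cont bit */
	b |= DEMO_PTE_RDONLY;		/* CPU B: change RW to RO */

	pte = b;			/* CPU B: store the new PTE */
	pte = a;			/* CPU A: store, overwriting CPU B's value */

	return pte;
}
```

In this replay the final value has the cont bit cleared but has lost CPU B's read-only update, because CPU A computed its new value from a stale read.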
Off the top of my head, we need to do one of the following to avoid the
race:

1. Serialize the split with a lock.
2. Make the page table RMW atomic in both the split and the permission
   change.
3. Check every PTE in the range for the cont bit, instead of only the
   first PTE, and clear the cont bit wherever it is still set.
4. Retry the permission change if the cont bit has not been cleared. But
   we need to distinguish this from changing the permission of a whole
   CONT PTE range, because we keep the cont bit in that case.

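As a rough illustration of option 2, the sketch below does the RMW with a compare-exchange loop using C11 atomics in user space. It is only a sketch under assumptions: `demo_pte_update()` and the `DEMO_` bit positions are hypothetical, and it ignores TLB maintenance and any architectural rules for changing entries of a contiguous range.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Illustrative bit positions only; stand-ins for the real PTE bits. */
#define DEMO_PTE_CONT   (UINT64_C(1) << 52)
#define DEMO_PTE_RDONLY (UINT64_C(1) << 7)

/*
 * Atomically apply clear/set masks to one PTE. If another CPU changed the
 * PTE between our load and our store, the compare-exchange fails, reloads
 * the current value into 'old', and the new value is recomputed from it.
 * Returns the value that was stored.
 */
static uint64_t demo_pte_update(_Atomic uint64_t *ptep,
				uint64_t clear, uint64_t set)
{
	uint64_t old = atomic_load_explicit(ptep, memory_order_relaxed);
	uint64_t new;

	assert(!(clear & set));	/* the two masks must not overlap */

	do {
		new = (old & ~clear) | set;
	} while (!atomic_compare_exchange_weak_explicit(ptep, &old, new,
							memory_order_relaxed,
							memory_order_relaxed));

	return new;
}
```

With this helper, the split would call `demo_pte_update(ptep, DEMO_PTE_CONT, 0)` and the permission change would call `demo_pte_update(ptep, 0, DEMO_PTE_RDONLY)`. Whichever order they land in, neither update is lost, since each one is computed from the value actually present in the PTE at the time of the store.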
Thanks,
Yang
>
> Perhaps I'm misunderstanding something?
>
> [1] https://lore.kernel.org/all/f036acea-1bd1-48a7-8600-75ddd504b8db@arm.com/
>
> Thanks,
> Ryan
>
>>> + arch_leave_lazy_mmu_mode();
>>> +
>>> + return ret;
>>> +}