Message-ID: <fb86e753-95c0-41bd-b8f6-ebc810cd8a94@arm.com>
Date: Thu, 19 Jun 2025 13:25:27 +0530
From: Anshuman Khandual <anshuman.khandual@....com>
To: Will Deacon <will@...nel.org>
Cc: linux-arm-kernel@...ts.infradead.org, stable@...r.kernel.org,
 Catalin Marinas <catalin.marinas@....com>,
 Ryan Roberts <ryan.roberts@....com>, linux-kernel@...r.kernel.org,
 Dev Jain <dev.jain@....com>
Subject: Re: [PATCH] arm64/ptdump: Ensure memory hotplug is prevented during
 ptdump_check_wx()



On 18/06/25 5:06 PM, Will Deacon wrote:
> On Fri, Jun 13, 2025 at 10:39:02AM +0530, Anshuman Khandual wrote:
>>
>>
>> On 12/06/25 8:28 PM, Will Deacon wrote:
>>> On Mon, Jun 09, 2025 at 05:12:14AM +0100, Anshuman Khandual wrote:
>>>> The arm64 page table dump code can race with concurrent modification of the
>>>> kernel page tables. When leaf entries are modified concurrently, the dump
>>>> code may log stale or inconsistent information for a VA range, but this is
>>>> otherwise not harmful.
>>>>
>>>> When intermediate levels of table are freed, the dump code will continue to
>>>> use memory which has been freed and potentially reallocated for another
>>>> purpose. In such cases, the dump code may dereference bogus addresses,
>>>> leading to a number of potential problems.
>>>>
>>>> This problem was fixed for ptdump_show() earlier via commit bf2b59f60ee1
>>>> ("arm64/mm: Hold memory hotplug lock while walking for kernel page table
>>>> dump"), but the same fix was missed for ptdump_check_wx(), which faces the
>>>> same race condition. Let's just take the memory hotplug lock while
>>>> executing ptdump_check_wx().
>>>
>>> How do other architectures (e.g. x86) handle this? I don't see any usage
>>> of {get,put}_online_mems() over there. Should this be moved into the core
>>> code?
>>
>> Memory hot remove on arm64 unmaps the kernel linear and vmemmap mappings
>> while also freeing page table pages when they become empty. This might not
>> be true for all other architectures, some of which may just unmap the
>> affected kernel regions without tearing down the kernel page tables.
> 
> ... that sounds like something we should be able to give a definitive
> answer to?

Agreed.

arch_remove_memory() is the primary arch callback which does the unmapping
and also tears down the required kernel page table regions, i.e. the linear
and vmemmap mappings. These are the call paths that reach platform specific
memory removal via arch_remove_memory():

A) ZONE_DEVICE

devm_memremap_pages()
    devm_memremap_pages_release()
        devm_memunmap_pages()
            memunmap_pages()
                arch_remove_memory()

B) Normal DRAM

echo 1 > /sys/devices/system/memory/memoryX/offline

memory_subsys_offline()
    device_offline()
        memory_offline()
            offline_memory_block()
                remove_memory()
                    __remove_memory()
                        arch_remove_memory()

Currently there are six platforms which enable ARCH_ENABLE_MEMORY_HOTREMOVE
and thus implement arch_remove_memory(). The core memory hot removal process
does not set any expectations on these callbacks, so platforms are free to
implement the unmap and page table tear-down operations as they deem
necessary.

ARCH_ENABLE_MEMORY_HOTREMOVE - arm64, loongarch, powerpc, riscv, s390, x86
ARCH_HAS_PTDUMP              - arm64, powerpc, riscv, s390, x86

In summary, all the platforms that support both memory hot remove and ptdump
do try to free the unmapped regions of the page table when possible. Hence
they are indeed exposed to a possible race with the ptdump walk.

But as mentioned earlier, the callback arch_remove_memory() does not have to
tear down the page tables. Unless there are objections from other platforms,
the standard memory hotplug lock could indeed be taken during all generic
ptdump walk paths.

arm64
=====
    arch_remove_memory()
        __remove_pages()
            sparse_remove_section()
                section_deactivate()
                    depopulate_section_memmap()
                    free_map_bootmem()
                            vmemmap_free()              /* vmemmap mapping */
                            unmap_hotplug_range()   /* Unmap */
                            free_empty_tables()     /* Tear down */

        __remove_pgd_mapping()                      /* linear mapping */
            unmap_hotplug_range()                   /* Unmap */
            free_empty_tables()                     /* Tear down */

powerpc
=======
    arch_remove_memory()
        __remove_pages()
            sparse_remove_section()
                section_deactivate()
                    depopulate_section_memmap()
                        vmemmap_free()
                            __vmemmap_free()            /* Hash */
                            radix__vmemmap_free()       /* Radix */

        arch_remove_linear_mapping()
            remove_section_mapping()
                hash__remove_section_mapping()          /* Hash */
                radix__remove_section_mapping()         /* Radix */
    
riscv
=====
    arch_remove_memory()
        __remove_pages()
            sparse_remove_section()
                section_deactivate()
                    depopulate_section_memmap()
                        vmemmap_free()
                            remove_pgd_mapping()

        remove_linear_mapping()
            remove_pgd_mapping()
    
remove_pgd_mapping() recursively calls remove_pxd_mapping() and
free_pxd_table() when applicable.

s390
====
    arch_remove_memory()
        __remove_pages()
            sparse_remove_section()
                section_deactivate()
                    depopulate_section_memmap()
                        vmemmap_free()
                            remove_pagetable()
                                modify_pagetable()

        vmem_remove_mapping()
            vmem_remove_range()
                remove_pagetable()
                    modify_pagetable()

modify_pagetable() on s390 does try to tear down the page table
when possible.
 
x86
===
    arch_remove_memory()
        __remove_pages()
            sparse_remove_section()
                section_deactivate()
                    depopulate_section_memmap()
                    free_map_bootmem()
                        vmemmap_free()              /* vmemmap mapping */
                            remove_pagetable()

        kernel_physical_mapping_remove()            /* linear Mapping */
            remove_pagetable()

remove_pagetable() on x86 calls remove_pxd_table() followed by
free_pxd_table(), which does tear down the page table as well, and is
hence exposed to a race with PTDUMP, which scans the entire kernel
page table.
