Message-ID: <a8775e0b-a206-3ec8-7499-a3c3cfd782e2@redhat.com>
Date:   Wed, 26 Jul 2023 09:50:19 +0200
From:   David Hildenbrand <david@...hat.com>
To:     mawupeng <mawupeng1@...wei.com>, anshuman.khandual@....com,
        will@...nel.org
Cc:     catalin.marinas@....com, akpm@...ux-foundation.org,
        sudaraja@...eaurora.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, wangkefeng.wang@...wei.com,
        linux-arm-kernel@...ts.infradead.org, mark.rutland@....com
Subject: Re: [RFC PATCH] arm64: mm: Fix kernel page tables incorrectly deleted
 during memory removal

On 26.07.23 08:20, mawupeng wrote:
> 
> 
> On 2023/7/24 14:11, David Hildenbrand wrote:
>> On 24.07.23 07:54, Anshuman Khandual wrote:
>>>
>>>
>>> On 7/24/23 06:55, mawupeng wrote:
>>>>
>>>> On 2023/7/21 18:36, Will Deacon wrote:
>>>>> On Mon, Jul 17, 2023 at 07:51:50PM +0800, Wupeng Ma wrote:
>>>>>> From: Ma Wupeng <mawupeng1@...wei.com>
>>>>>>
>>>>>> During our tests, we found that kernel page tables may be unexpectedly
>>>>>> cleared with rodata off. The root cause is that the kernel page tables
>>>>>> are initialized with PUD size (1G block mapping) while offlining works
>>>>>> in memory block size (MIN_MEMORY_BLOCK_SIZE, 128M). E.g., if 2G of memory
>>>>>> is hot-added, when offlining a memory block, the call trace is shown below,
>>
>> Is someone adding memory in 2 GiB granularity and then removing parts of it in 128 MiB granularity? That would be against what we support using the add_memory() / offline_and_remove_memory() API and that driver should be fixed instead.
> 
> Yes, this kind of situation.
> 
> The problem occurs in the following scenarios:
> 1. Use mem=xxG to reserve memory.
> 2. Use add_memory to online memory.
> 3. Offline part of the memory via offline_and_remove_memory.
> 
> During my research, I found that ACPI memory removal uses memory_subsys_offline to offline
> memory sections. This does not delete page table entries, so it does not trigger this kind of problem.
> 
> So I understand what you are talking about.
> 1. A 3rd-party driver shouldn't use add_memory/offline_and_remove_memory to online/offline memory.
>     If it has to, this must be handled by the driver.
> 2. memory_subsys_offline is preferred for this kind of thing.

No, my point is that

1) If you use add_memory() and offline_and_remove_memory() in the *same
    granularity*, it has to work; otherwise it has to be fixed.

2) If you use add_memory() and offline_and_remove_memory() in different
    granularity (especially add_memory() in bigger granularity), then
    change your code to do add_memory() in the same granularity.
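
To illustrate the granularity rule in 1) and 2), here is a rough user-space
sketch, not kernel code. It only models the bookkeeping: a 2G range added as
one unit cannot be matched by a 128M removal, while the same range added
block by block can. The names added_unit, split_into_blocks() and
removal_matches_add() are made up for this example, and the 128M value is
taken from the report above; a real driver would call add_memory() once per
unit and later offline_and_remove_memory() on exactly such a unit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed from the report: MIN_MEMORY_BLOCK_SIZE is 128M. */
#define MODEL_BLOCK_SIZE (128ULL << 20)

/* Hypothetical descriptor for one add_memory() call: [start, start + size). */
struct added_unit {
	uint64_t start;
	uint64_t size;
};

/*
 * Split [start, start + size) into block-sized units, one per intended
 * add_memory() call. Returns the number of units written to out, or 0 if
 * the range is empty, not block aligned, or does not fit into out.
 */
static size_t split_into_blocks(uint64_t start, uint64_t size,
				struct added_unit *out, size_t max)
{
	size_t n, i;

	if (size == 0 || start % MODEL_BLOCK_SIZE || size % MODEL_BLOCK_SIZE)
		return 0;

	n = (size_t)(size / MODEL_BLOCK_SIZE);
	if (n > max)
		return 0;

	for (i = 0; i < n; i++) {
		out[i].start = start + i * MODEL_BLOCK_SIZE;
		out[i].size = MODEL_BLOCK_SIZE;
	}
	return n;
}

/* True if [start, start + size) is exactly one previously added unit. */
static bool removal_matches_add(const struct added_unit *units, size_t n,
				uint64_t start, uint64_t size)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (units[i].start == start && units[i].size == size)
			return true;
	}
	return false;
}
```

With a single 2G unit, removal_matches_add() rejects a 128M removal request;
after split_into_blocks() the same request matches one of the 16 units.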


If you run into 1), then we populated a PUD for boot memory that also
covers not-yet-populated physical memory ranges that are later populated
by add_memory(). If that's the case, then we can fix it by one of:

a) Not doing that. Use PMD tables instead for that piece of memory.

b) Detecting that that PUD still covers memory and refusing to remove
    that PUD.

c) Refusing to hot-add memory in this situation at that location. We
    have mhp_get_pluggable_range() -> arch_get_mappable_range() to
    kind of handle something like that.
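
The check behind b) could look roughly like the following. This is a
user-space model under stated assumptions, not the arm64 page table code:
a PUD block mapping is taken to be 1G as in the report, present memory is
given as a plain array of ranges, and mem_range and pud_still_in_use() are
names invented here. The idea is only that a PUD may be torn down when no
present memory inside its 1G window lies outside the range being removed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed from the report: a PUD block mapping covers 1G. */
#define MODEL_PUD_SIZE (1ULL << 30)

/* Hypothetical physical range [start, end). */
struct mem_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Return true if the PUD-sized window containing pud_addr still covers
 * present memory that is not part of the range rm being removed. In that
 * case the PUD entry must not be cleared.
 */
static bool pud_still_in_use(uint64_t pud_addr, struct mem_range rm,
			     const struct mem_range *present, size_t n)
{
	uint64_t pud_start = pud_addr & ~(MODEL_PUD_SIZE - 1);
	uint64_t pud_end = pud_start + MODEL_PUD_SIZE;
	size_t i;

	for (i = 0; i < n; i++) {
		/* Clamp this present range to the PUD window. */
		uint64_t s = present[i].start > pud_start ? present[i].start : pud_start;
		uint64_t e = present[i].end < pud_end ? present[i].end : pud_end;

		if (s >= e)
			continue;	/* no overlap with this PUD */

		/* Some present memory lies before or after the removed range. */
		if (s < rm.start || e > rm.end)
			return true;
	}
	return false;
}
```

For 2G of present memory starting at 4G, removing one 128M block leaves
the first PUD in use, while removing the whole first 1G does not.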

-- 
Cheers,

David / dhildenb
