linux-kernel - Re: [PATCH V6 RESEND 0/3] arm64/mm: Enable memory hot remove

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20190723105636.GA5004@lakrids.cambridge.arm.com>
Date:   Tue, 23 Jul 2019 11:56:36 +0100
From:   Mark Rutland <mark.rutland@....com>
To:     Anshuman Khandual <anshuman.khandual@....com>
Cc:     linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        linux-arm-kernel@...ts.infradead.org, akpm@...ux-foundation.org,
        catalin.marinas@....com, will.deacon@....com, mhocko@...e.com,
        ira.weiny@...el.com, david@...hat.com, cai@....pw,
        logang@...tatee.com, james.morse@....com, cpandya@...eaurora.org,
        arunks@...eaurora.org, dan.j.williams@...el.com,
        mgorman@...hsingularity.net, osalvador@...e.de,
        ard.biesheuvel@....com, steve.capper@....com
Subject: Re: [PATCH V6 RESEND 0/3] arm64/mm: Enable memory hot remove

Hi Anshuman,

On Mon, Jul 15, 2019 at 11:47:47AM +0530, Anshuman Khandual wrote:
> This series enables memory hot remove on arm64 after fixing a memblock
> removal ordering problem in generic try_remove_memory() and a possible
> arm64 platform specific kernel page table race condition. This series
> is based on linux-next (next-20190712).
> 
> Concurrent vmalloc() and hot-remove conflict:
> 
> As pointed out earlier on the v5 thread [2] there can be potential conflict
> between concurrent vmalloc() and memory hot-remove operation. This can be
> solved or at least avoided with some possible methods. The problem here is
> caused by inadequate locking in vmalloc() which protects installation of a
> page table page but not the walk or the leaf entry modification.
> 
> Option 1: Making locking in vmalloc() adequate
> 
> Current locking scheme protects installation of page table pages but not the
> page table walk or leaf entry creation which can conflict with hot-remove.
> This scheme is sufficient for now as vmalloc() works on mutually exclusive
> ranges which can proceed concurrently only if their shared page table pages
> can be created while inside the lock. It achieves performance improvement
> which will be compromised if entire vmalloc() operation (even if with some
> optimization) has to be completed under a lock.
> 
> Option 2: Making sure hot-remove does not happen during vmalloc()
> 
> Take mem_hotplug_lock in read mode through [get|put]_online_mems() constructs
> for the entire duration of vmalloc(). It protects from concurrent memory hot
> remove operation and does not add any significant overhead to other concurrent
> vmalloc() threads. It solves the problem in right way unless we do not want to
> extend the usage of mem_hotplug_lock in generic MM.
> 
> Option 3: Memory hot-remove does not free (conflicting) page table pages
> 
> Don't not free page table pages (if any) for vmemmap mappings after unmapping
> it's virtual range. The only downside here is that some page table pages might
> remain empty and unused until next memory hot-add operation of the same memory
> range.
> 
> Option 4: Dont let vmalloc and vmemmap share intermediate page table pages
> 
> The conflict does not arise if vmalloc and vmemap range do not share kernel
> page table pages to start with. If such placement can be ensured in platform
> kernel virtual address layout, this problem can be successfully avoided.
> 
> There are two generic solutions (Option 1 and 2) and two platform specific
> solutions (Options 2 and 3). This series has decided to go with (Option 3)
> which requires minimum changes while self-contained inside the functionality.

... while also leaking memory, right?

In my view, option 2 or 4 would have been preferable. Were there
specific technical reasons to not go down either of those routes? I'm
not sure that minimizing changes is the right rout given that this same
problem presumably applies to other architectures, which will need to be
fixed.

Do we know why we aren't seeing issues on other architectures? e.g. is
the issue possible but rare (and hence not reported), or masked by
something else (e.g. the layout of the kernel VA space)?

I'd like to solve the underyling issue before we start adding new
functionality.

> Testing:
> 
> Memory hot remove has been tested on arm64 for 4K, 16K, 64K page config
> options with all possible CONFIG_ARM64_VA_BITS and CONFIG_PGTABLE_LEVELS
> combinations. Its only build tested on non-arm64 platforms.

Could you please share how you've tested this?

Having instructions so that I could reproduce this locally would be very
helpful.

Thanks,
Mark.