linux-kernel - Re: [PATCH V3 4/4] arm64/mm: Enable memory hot remove

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20190516105741.GC40960@lakrids.cambridge.arm.com>
Date:   Thu, 16 May 2019 11:57:41 +0100
From:   Mark Rutland <mark.rutland@....com>
To:     Anshuman Khandual <anshuman.khandual@....com>
Cc:     linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
        akpm@...ux-foundation.org, catalin.marinas@....com,
        will.deacon@....com, mhocko@...e.com, mgorman@...hsingularity.net,
        james.morse@....com, robin.murphy@....com, cpandya@...eaurora.org,
        arunks@...eaurora.org, dan.j.williams@...el.com, osalvador@...e.de,
        david@...hat.com, cai@....pw, logang@...tatee.com,
        ira.weiny@...el.com
Subject: Re: [PATCH V3 4/4] arm64/mm: Enable memory hot remove

On Thu, May 16, 2019 at 11:04:48AM +0530, Anshuman Khandual wrote:
> On 05/15/2019 05:19 PM, Mark Rutland wrote:
> > On Tue, May 14, 2019 at 02:30:07PM +0530, Anshuman Khandual wrote:
> >> Memory removal from an arch perspective involves tearing down two different
> >> kernel based mappings i.e vmemmap and linear while releasing related page
> >> table and any mapped pages allocated for given physical memory range to be
> >> removed.
> >>
> >> Define a common kernel page table tear down helper remove_pagetable() which
> >> can be used to unmap given kernel virtual address range. In effect it can
> >> tear down both vmemap or kernel linear mappings. This new helper is called
> >> from both vmemamp_free() and ___remove_pgd_mapping() during memory removal.
> >>
> >> For linear mapping there are no actual allocated pages which are mapped to
> >> create the translation. Any pfn on a given entry is derived from physical
> >> address (__va(PA) --> PA) whose linear translation is to be created. They
> >> need not be freed as they were never allocated in the first place. But for
> >> vmemmap which is a real virtual mapping (like vmalloc) physical pages are
> >> allocated either from buddy or memblock which get mapped in the kernel page
> >> table. These allocated and mapped pages need to be freed during translation
> >> tear down. But page table pages need to be freed in both these cases.
> > 
> > As previously discussed, we should only hot-remove memory which was
> > hot-added, so we shouldn't encounter memory allocated from memblock.
> 
> Right, not applicable any more. Will drop this word.
> 
> >> These mappings need to be differentiated while deciding if a mapped page at
> >> any level i.e [pte|pmd|pud]_page() should be freed or not. Callers for the
> >> mapping tear down process should pass on 'sparse_vmap' variable identifying
> >> kernel vmemmap mappings.
> > 
> > I think that you can simplify the paragraphs above down to:
> > 
> >   The arch code for hot-remove must tear down portions of the linear map
> >   and vmemmap corresponding to memory being removed. In both cases the
> >   page tables mapping these regions must be freed, and when sparse
> >   vmemmap is in use the memory backing the vmemmap must also be freed.
> > 
> >   This patch adds a new remove_pagetable() helper which can be used to
> >   tear down either region, and calls it from vmemmap_free() and
> >   ___remove_pgd_mapping(). The sparse_vmap argument determines whether
> >   the backing memory will be freed.
> 
> The current one is bit more descriptive on detail. Anyways will replace with
> the above writeup if that is preferred.

I would prefer the suggested form above, as it's easier to extract the
necessary details from it.

[...]

> >> +static void
> >> +remove_pagetable(unsigned long start, unsigned long end, bool sparse_vmap)
> >> +{
> >> +	unsigned long addr, next;
> >> +	pud_t *pudp_base;
> >> +	pgd_t *pgdp;
> >> +
> >> +	spin_lock(&init_mm.page_table_lock);
> > 
> > It would be good to explain why we need to take the ptl here.
> 
> Will update both commit message and add an in-code comment here.
> 
> > 
> > IIUC that shouldn't be necessary for the linear map. Am I mistaken?
> 
> Its not absolutely necessary for linear map right now because both memory hot
> plug & ptdump which modifies or walks the page table ranges respectively take
> memory hotplug lock. That apart, no other callers creates or destroys linear
> mapping at runtime.
> 
> > 
> > Is there a specific race when tearing down the vmemmap?
> 
> This is trickier than linear map. vmemmap additions would be protected with
> memory hotplug lock but this can potential collide with vmalloc/IO regions.
> Even if they dont right now that will be because they dont share intermediate
> page table levels.

Sure; if we could just state something like:

  The vmemmap region may share levels of table with the vmalloc region.
  Take the ptl so that we can safely free potentially-sahred tables.

... I think that would be sufficient.

Thanks,
Mark.