linux-kernel - Re: [PATCHv2 0/7] x86_64/mm: remove bottom-up allocation style by pushing forward the parsing of mem hotplug info


Message-ID: <CAFgQCTtsw9xj3M85HU2GBk5iPSF4h_H43do-rfpXMo8svmgoJg@mail.gmail.com>
Date:   Tue, 15 Jan 2019 14:06:18 +0800
From:   Pingfan Liu <kernelfans@...il.com>
To:     Dave Hansen <dave.hansen@...el.com>
Cc:     linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        "H. Peter Anvin" <hpa@...or.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Andy Lutomirski <luto@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Len Brown <lenb@...nel.org>, Yinghai Lu <yinghai@...nel.org>,
        Tejun Heo <tj@...nel.org>, Chao Fan <fanc.fnst@...fujitsu.com>,
        Baoquan He <bhe@...hat.com>, Juergen Gross <jgross@...e.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Mike Rapoport <rppt@...ux.vnet.ibm.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Michal Hocko <mhocko@...e.com>, x86@...nel.org,
        linux-acpi@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCHv2 0/7] x86_64/mm: remove bottom-up allocation style by
 pushing forward the parsing of mem hotplug info

On Tue, Jan 15, 2019 at 7:02 AM Dave Hansen <dave.hansen@...el.com> wrote:
>
> On 1/10/19 9:12 PM, Pingfan Liu wrote:
> > Background
> > When kaslr kernel can be guaranteed to sit inside unmovable node
> > after [1].
>
> What does this "[1]" refer to?
>
https://lore.kernel.org/patchwork/patch/1029376/

> Also, can you clarify your terminology here a bit.  By "kaslr kernel",
> do you mean the base address?
>
I mean the randomized load address of the kernel. After looking it up,
I see that the usual term is "base address".

> > But if kaslr kernel is located near the end of the movable node,
> > then bottom-up allocator may create pagetable which crosses the boundary
> > between unmovable node and movable node.
>
> Again, I'm confused.  Do you literally mean a single page table page?  I
> think you mean the page tables, but it would be nice to clarify this,
> and also explicitly state which page tables these are.
>
I mean page table pages: the ones allocated while init_mem_mapping()
builds the page tables.

> >  It is a probability issue,
> > two factors include -1. how big the gap between kernel end and
> > unmovable node's end.  -2. how many memory does the system own.
> > Alternative way to fix this issue is by increasing the gap by
> > boot/compressed/kaslr*.
>
> Oh, you mean the KASLR code in arch/x86/boot/compressed/kaslr*.[ch]?
>
Sorry, and yes: the code in arch/x86/boot/compressed/kaslr_64.c and kaslr.c.

> It took me a minute to figure out you were talking about filenames.
>
> > But taking the scenario of PB level memory, the pagetable will take
> > server MB even if using 1GB page, different page attr and fragment
> > will make things worse. So it is hard to decide how much should the
> > gap increase.
> I'm not following this.  If we move the image around, we leave holes.
> Why do we need page table pages allocated to cover these holes?
>
I mean that in arch/x86/boot/compressed/kaslr.c, store_slot_info()
computes

    slot_area.num = (region->size - image_size) / CONFIG_PHYSICAL_ALIGN + 1;

If we denote the size of the page tables as "X", the formula would
become

    slot_area.num = (region->size - image_size - X) / CONFIG_PHYSICAL_ALIGN + 1;

and it is hard to choose X because of the factors above.
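
To make the arithmetic concrete, here is a minimal standalone sketch (not
kernel code). PHYSICAL_ALIGN, pud_table_bytes() and slot_count() are
hypothetical names for illustration; the 2 MiB alignment is an assumed value
for CONFIG_PHYSICAL_ALIGN, and the estimate of X counts only the 4 KiB pages
holding 1 GiB-page entries, so it is a lower bound.

```c
#include <stdint.h>

/* Assumed stand-in for CONFIG_PHYSICAL_ALIGN (2 MiB); not read from any config. */
#define PHYSICAL_ALIGN (2ULL << 20)

/*
 * Lower-bound estimate of "X" when all memory is mapped with 1 GiB pages:
 * one 8-byte entry per GiB, 512 entries per 4 KiB table page.
 * Upper-level tables, mixed page attributes and fragmentation are ignored.
 */
static uint64_t pud_table_bytes(uint64_t mem_bytes)
{
    uint64_t entries = mem_bytes >> 30;          /* one entry per GiB */
    uint64_t pages = (entries + 511) / 512;      /* round up to whole table pages */
    return pages * 4096;
}

/*
 * Slot count as in the formula above, with an extra "reserve" (X) kept free.
 * reserve == 0 gives the current formula.
 */
static uint64_t slot_count(uint64_t region_size, uint64_t image_size,
                           uint64_t reserve)
{
    if (region_size < image_size + reserve)
        return 0;                                /* image plus gap does not fit */
    return (region_size - image_size - reserve) / PHYSICAL_ALIGN + 1;
}
```

With these assumptions, 1 PiB of memory needs at least 8 MiB of such table
pages, and reserving that much in a 1 GiB region with a 32 MiB image reduces
the slot count from 497 to 493. The real X is larger and depends on the
system, which is why a fixed gap is hard to pick.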

> > The following figure show the defection of current bottom-up style:
> >   [startA, endA][startB, "kaslr kernel verly close to" endB][startC, endC]
>
> "defection"?
>
Oh, defect.

> > If nodeA,B is unmovable, while nodeC is movable, then init_mem_mapping()
> > can generate pgtable on nodeC, which stain movable node.
>
> Let me see if I can summarize this:
> 1. The kernel ASLR decompression code picks a spot to place the kernel
>    image in physical memory.
> 2. Some page tables are dynamically allocated near (after) this spot.
> 3. Sometimes, based on the random ASLR location, these page tables fall
>    over into the "movable node" area.  Being unmovable allocations, this
>    is not cool.
> 4. To fix this (on 64-bit at least), we stop allocating page tables
>    based on the location of the kernel image.  Instead, we allocate
>    using the memblock allocator itself, which knows how to avoid the
>    movable node.
>
Yes, that is exactly my idea. Thanks for helping to summarize it; it is
hard for me to express it clearly in English.
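
Point 3 of the summary can be written as a toy check. This is only an
illustration under simplifying assumptions: the function name and parameters
are hypothetical, and it assumes the page table pages are placed contiguously
right after the end of the kernel image.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model of the bottom-up case: page tables of total size pgtable_size
 * are allocated starting at kernel_end. They spill into the movable node
 * if they run past unmovable_end, the end of the last unmovable node.
 */
static bool pgtable_spills_into_movable(uint64_t kernel_end,
                                        uint64_t pgtable_size,
                                        uint64_t unmovable_end)
{
    return kernel_end + pgtable_size > unmovable_end;
}
```

Both factors show up directly: a smaller gap between kernel_end and
unmovable_end, or a larger pgtable_size (more memory), makes the spill more
likely.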

> > This patch makes it certainty instead of a probablity problem. It achieves
> > this by pushing forward the parsing of mem hotplug info ahead of init_mem_mapping().
>
> What does memory hotplug have to do with this?  I thought this was all
> about early boot.

The memblock allocator has to be told which memory is hotpluggable.
That happens in initmem_init()->...->acpi_numa_memory_affinity_init(),
which calls memblock_mark_hotplug(). Later, when the allocator runs,
__next_mem_range() checks this information via
memblock_is_hotpluggable() and can skip such ranges.
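
A minimal userspace sketch of that idea follows. It is not the memblock
implementation: struct toy_region and the toy_* functions are hypothetical
and only mimic the roles of memblock_mark_hotplug() and
memblock_is_hotpluggable(), with a simplified top-down search.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_HOTPLUG 0x1u   /* toy flag: region is hotpluggable (movable) */

struct toy_region {
    uint64_t base;
    uint64_t size;
    unsigned int flags;
};

/* Toy counterpart of marking a region as hotpluggable while parsing SRAT. */
static void toy_mark_hotplug(struct toy_region *r)
{
    r->flags |= TOY_HOTPLUG;
}

static bool toy_is_hotpluggable(const struct toy_region *r)
{
    return (r->flags & TOY_HOTPLUG) != 0;
}

/*
 * Top-down search over regions sorted by ascending base: return the highest
 * address where "size" bytes fit in a region that is not hotpluggable,
 * or 0 if no such region exists.
 */
static uint64_t toy_alloc_top_down(const struct toy_region *regions,
                                   size_t n, uint64_t size)
{
    for (size_t i = n; i-- > 0; ) {
        const struct toy_region *r = &regions[i];

        if (toy_is_hotpluggable(r))
            continue;                      /* never place page tables here */
        if (r->size >= size)
            return r->base + r->size - size;
    }
    return 0;
}
```

The ordering matters: if the regions are marked only after the first
allocation, the allocator cannot know which ranges to avoid. This is why the
series moves the parsing ahead of init_mem_mapping().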

Thanks and regards,
Pingfan