lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 2 Jan 2019 18:18:04 +0800
From:   Baoquan He <bhe@...hat.com>
To:     Mike Rapoport <rppt@...ux.ibm.com>
Cc:     Pingfan Liu <kernelfans@...il.com>, linux-acpi@...r.kernel.org,
        linux-mm@...ck.org, kexec@...ts.infradead.org,
        Tang Chen <tangchen@...fujitsu.com>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Len Brown <lenb@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Mike Rapoport <rppt@...ux.vnet.ibm.com>,
        Michal Hocko <mhocko@...e.com>,
        Jonathan Corbet <corbet@....net>,
        Yaowei Bai <baiyaowei@...s.chinamobile.com>,
        Pavel Tatashin <pasha.tatashin@...cle.com>,
        Nicholas Piggin <npiggin@...il.com>,
        Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
        Daniel Vacek <neelx@...hat.com>,
        Mathieu Malaterre <malat@...ian.org>,
        Stefan Agner <stefan@...er.ch>, Dave Young <dyoung@...hat.com>,
        yinghai@...nel.org, vgoyal@...hat.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCHv3 1/2] mm/memblock: extend the limit inferior of
 bottom-up after parsing hotplug attr

On 01/02/19 at 11:27am, Mike Rapoport wrote:
> On Wed, Jan 02, 2019 at 02:47:34PM +0800, Pingfan Liu wrote:
> > On Mon, Dec 31, 2018 at 4:40 PM Mike Rapoport <rppt@...ux.ibm.com> wrote:
> > >
> > > On Fri, Dec 28, 2018 at 11:00:01AM +0800, Pingfan Liu wrote:
> > > > The bottom-up allocation style is introduced to cope with movable_node,
> > > > where the limit inferior of allocation starts from kernel's end, due to
> > > > lack of knowledge of memory hotplug info at this early time. But if later,
> > > > hotplug info has been got, the limit inferior can be extend to 0.
> > > > 'kexec -c' prefers to reuse this style to alloc mem at lower address,
> > > > since if the reserved region is beyond 4G, then it requires extra mem
> > > > (default is 16M) for swiotlb.
> > >
> > > I fail to understand why the availability of memory hotplug information
> > > would allow to extend the lower limit of bottom-up memblock allocations
> > > below the kernel. The memory in the physical range [0, kernel_start) can be
> > > allocated as soon as the kernel memory is reserved.
> > >
> > Yes, the  [0, kernel_start) can be allocated at this time by some func
> > e.g. memblock_reserve(). But there is trick. For the func like
> > memblock_find_in_range(), this is hotplug attr checking ,,it will
> > check the hotmovable attr in __next_mem_range()
> > {
> > if (movable_node_is_enabled() && memblock_is_hotpluggable(m))
> > continue
> > }.  So the movable memory can be safely skipped.
> 
> I still don't see the connection between allocating memory below
> kernel_start and the hotplug info.
> 
> The check for 'end > kernel_end' in
> 
>  	if (memblock_bottom_up() && end > kernel_end)
> 
> does not protect against allocation in a hotplugable area.
> If memblock_find_in_range() is called before hotplug info is parsed it can
> return a range in a hotplugable area.
> 
> The point I'd like to clarify is why allocating memory in the range [0,
> kernel_start) cannot be done before hotplug info is available and why it is
> safe to allocate that memory afterwards?

Well, I think that's because we have KASLR. Before KASLR was introdueced,
kernel is put at a low and fixed physical address. Allocating memblock
bottom-up after kernel can make sure those kernel data is in the same node
as kernel text itself before SRAT parsed. While [0, kernel_start) is a
very small range, e.g on x86, it was 16 MB, which is very possibly used
up.

But now, with KASLR enabled by default, this bottom-up after kernel text
allocation has potential issue. E.g we have node0 (including normal zone),
node1(including movable zone), if KASLR put kernel at top of node0, the
next memblock allocation before SRAT parsed will stamp into movable zone
of node1, hotplug doesn't work well any more consequently. I had
considered this issue previously, but haven't thought of a way to fix
it.

While it's not related to this patch. About this patchset, I didn't
check it carefully in v2 post, and acked it. In fact the current way is
not good, Pingfan should call __memblock_find_range_bottom_up() directly
for crashkernel reserving. Reasons are:
1)SRAT parsing is done, system restore to take top-down way to do
memblock allocat.
2)we do need to find range bottom-up if user specify crashkernel=xxM
(without a explicit base address).

Thanks
Baoquan

> 
> > Thanks for your kindly review.
> > 
> > Regards,
> > Pingfan
> > 
> > > The extents of the memory node hosting the kernel image can be used to
> > > limit memblok allocations from that particular node, even in top-down mode.
> > >
> > > > Signed-off-by: Pingfan Liu <kernelfans@...il.com>
> > > > Cc: Tang Chen <tangchen@...fujitsu.com>
> > > > Cc: "Rafael J. Wysocki" <rjw@...ysocki.net>
> > > > Cc: Len Brown <lenb@...nel.org>
> > > > Cc: Andrew Morton <akpm@...ux-foundation.org>
> > > > Cc: Mike Rapoport <rppt@...ux.vnet.ibm.com>
> > > > Cc: Michal Hocko <mhocko@...e.com>
> > > > Cc: Jonathan Corbet <corbet@....net>
> > > > Cc: Yaowei Bai <baiyaowei@...s.chinamobile.com>
> > > > Cc: Pavel Tatashin <pasha.tatashin@...cle.com>
> > > > Cc: Nicholas Piggin <npiggin@...il.com>
> > > > Cc: Naoya Horiguchi <n-horiguchi@...jp.nec.com>
> > > > Cc: Daniel Vacek <neelx@...hat.com>
> > > > Cc: Mathieu Malaterre <malat@...ian.org>
> > > > Cc: Stefan Agner <stefan@...er.ch>
> > > > Cc: Dave Young <dyoung@...hat.com>
> > > > Cc: Baoquan He <bhe@...hat.com>
> > > > Cc: yinghai@...nel.org,
> > > > Cc: vgoyal@...hat.com
> > > > Cc: linux-kernel@...r.kernel.org
> > > > ---
> > > >  drivers/acpi/numa.c      |  4 ++++
> > > >  include/linux/memblock.h |  1 +
> > > >  mm/memblock.c            | 58 +++++++++++++++++++++++++++++-------------------
> > > >  3 files changed, 40 insertions(+), 23 deletions(-)
> > > >
> > > > diff --git a/drivers/acpi/numa.c b/drivers/acpi/numa.c
> > > > index 2746994..3eea4e4 100644
> > > > --- a/drivers/acpi/numa.c
> > > > +++ b/drivers/acpi/numa.c
> > > > @@ -462,6 +462,10 @@ int __init acpi_numa_init(void)
> > > >
> > > >               cnt = acpi_table_parse_srat(ACPI_SRAT_TYPE_MEMORY_AFFINITY,
> > > >                                           acpi_parse_memory_affinity, 0);
> > > > +
> > > > +#if defined(CONFIG_X86) || defined(CONFIG_ARM64)
> > > > +             mark_mem_hotplug_parsed();
> > > > +#endif
> > > >       }
> > > >
> > > >       /* SLIT: System Locality Information Table */
> > > > diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> > > > index aee299a..d89ed9e 100644
> > > > --- a/include/linux/memblock.h
> > > > +++ b/include/linux/memblock.h
> > > > @@ -125,6 +125,7 @@ int memblock_reserve(phys_addr_t base, phys_addr_t size);
> > > >  void memblock_trim_memory(phys_addr_t align);
> > > >  bool memblock_overlaps_region(struct memblock_type *type,
> > > >                             phys_addr_t base, phys_addr_t size);
> > > > +void mark_mem_hotplug_parsed(void);
> > > >  int memblock_mark_hotplug(phys_addr_t base, phys_addr_t size);
> > > >  int memblock_clear_hotplug(phys_addr_t base, phys_addr_t size);
> > > >  int memblock_mark_mirror(phys_addr_t base, phys_addr_t size);
> > > > diff --git a/mm/memblock.c b/mm/memblock.c
> > > > index 81ae63c..a3f5e46 100644
> > > > --- a/mm/memblock.c
> > > > +++ b/mm/memblock.c
> > > > @@ -231,6 +231,12 @@ __memblock_find_range_top_down(phys_addr_t start, phys_addr_t end,
> > > >       return 0;
> > > >  }
> > > >
> > > > +static bool mem_hotmovable_parsed __initdata_memblock;
> > > > +void __init_memblock mark_mem_hotplug_parsed(void)
> > > > +{
> > > > +     mem_hotmovable_parsed = true;
> > > > +}
> > > > +
> > > >  /**
> > > >   * memblock_find_in_range_node - find free area in given range and node
> > > >   * @size: size of free area to find
> > > > @@ -259,7 +265,7 @@ phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t size,
> > > >                                       phys_addr_t end, int nid,
> > > >                                       enum memblock_flags flags)
> > > >  {
> > > > -     phys_addr_t kernel_end, ret;
> > > > +     phys_addr_t kernel_end, ret = 0;
> > > >
> > > >       /* pump up @end */
> > > >       if (end == MEMBLOCK_ALLOC_ACCESSIBLE)
> > > > @@ -270,34 +276,40 @@ phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t size,
> > > >       end = max(start, end);
> > > >       kernel_end = __pa_symbol(_end);
> > > >
> > > > -     /*
> > > > -      * try bottom-up allocation only when bottom-up mode
> > > > -      * is set and @end is above the kernel image.
> > > > -      */
> > > > -     if (memblock_bottom_up() && end > kernel_end) {
> > > > -             phys_addr_t bottom_up_start;
> > > > +     if (memblock_bottom_up()) {
> > > > +             phys_addr_t bottom_up_start = start;
> > > >
> > > > -             /* make sure we will allocate above the kernel */
> > > > -             bottom_up_start = max(start, kernel_end);
> > > > -
> > > > -             /* ok, try bottom-up allocation first */
> > > > -             ret = __memblock_find_range_bottom_up(bottom_up_start, end,
> > > > -                                                   size, align, nid, flags);
> > > > -             if (ret)
> > > > +             if (mem_hotmovable_parsed) {
> > > > +                     ret = __memblock_find_range_bottom_up(
> > > > +                             bottom_up_start, end, size, align, nid,
> > > > +                             flags);
> > > >                       return ret;
> > > >
> > > >               /*
> > > > -              * we always limit bottom-up allocation above the kernel,
> > > > -              * but top-down allocation doesn't have the limit, so
> > > > -              * retrying top-down allocation may succeed when bottom-up
> > > > -              * allocation failed.
> > > > -              *
> > > > -              * bottom-up allocation is expected to be fail very rarely,
> > > > -              * so we use WARN_ONCE() here to see the stack trace if
> > > > -              * fail happens.
> > > > +              * if mem hotplug info is not parsed yet, try bottom-up
> > > > +              * allocation with @end above the kernel image.
> > > >                */
> > > > -             WARN_ONCE(IS_ENABLED(CONFIG_MEMORY_HOTREMOVE),
> > > > +             } else if (!mem_hotmovable_parsed && end > kernel_end) {
> > > > +                     /* make sure we will allocate above the kernel */
> > > > +                     bottom_up_start = max(start, kernel_end);
> > > > +                     ret = __memblock_find_range_bottom_up(
> > > > +                             bottom_up_start, end, size, align, nid,
> > > > +                             flags);
> > > > +                     if (ret)
> > > > +                             return ret;
> > > > +                     /*
> > > > +                      * we always limit bottom-up allocation above the
> > > > +                      * kernel, but top-down allocation doesn't have
> > > > +                      * the limit, so retrying top-down allocation may
> > > > +                      * succeed when bottom-up allocation failed.
> > > > +                      *
> > > > +                      * bottom-up allocation is expected to be fail
> > > > +                      * very rarely, so we use WARN_ONCE() here to see
> > > > +                      * the stack trace if fail happens.
> > > > +                      */
> > > > +                     WARN_ONCE(IS_ENABLED(CONFIG_MEMORY_HOTREMOVE),
> > > >                         "memblock: bottom-up allocation failed, memory hotremove may be affected\n");
> > > > +             }
> > > >       }
> > > >
> > > >       return __memblock_find_range_top_down(start, end, size, align, nid,
> > > > --
> > > > 2.7.4
> > > >
> > >
> > > --
> > > Sincerely yours,
> > > Mike.
> > >
> > 
> 
> -- 
> Sincerely yours,
> Mike.
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ