Message-ID: <20170313143617.GR31518@dhcp22.suse.cz>
Date:   Mon, 13 Mar 2017 15:36:17 +0100
From:   Michal Hocko <mhocko@...nel.org>
To:     Igor Mammedov <imammedo@...hat.com>
Cc:     Heiko Carstens <heiko.carstens@...ibm.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>, linux-mm@...ck.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        Greg KH <gregkh@...uxfoundation.org>,
        "K. Y. Srinivasan" <kys@...rosoft.com>,
        David Rientjes <rientjes@...gle.com>,
        Daniel Kiper <daniel.kiper@...cle.com>,
        linux-api@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>,
        linux-s390@...r.kernel.org, xen-devel@...ts.xenproject.org,
        linux-acpi@...r.kernel.org, qiuxishi@...wei.com,
        toshi.kani@....com, xieyisheng1@...wei.com, slaoub@...il.com,
        iamjoonsoo.kim@....com, vbabka@...e.cz,
        Zhang Zhen <zhenzhang.zhang@...wei.com>,
        Reza Arbab <arbab@...ux.vnet.ibm.com>,
        Yasuaki Ishimatsu <yasu.isimatu@...il.com>,
        Tang Chen <tangchen@...fujitsu.com>
Subject: Re: WTH is going on with memory hotplug sysf interface (was: Re:
 [RFC PATCH] mm, hotplug: get rid of auto_online_blocks)

On Mon 13-03-17 14:57:12, Igor Mammedov wrote:
> On Mon, 13 Mar 2017 11:43:02 +0100
> Michal Hocko <mhocko@...nel.org> wrote:
> 
> > On Mon 13-03-17 11:31:10, Igor Mammedov wrote:
> > > On Fri, 10 Mar 2017 14:58:07 +0100  
> > [...]
> > > > [    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
> > > > [    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0x3fffffff]
> > > > [    0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x40000000-0x7fffffff]
> > > > [    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x27fffffff] hotplug
> > > > [    0.000000] NUMA: Node 0 [mem 0x00000000-0x0009ffff] + [mem 0x00100000-0x3fffffff] -> [mem 0x00000000-0x3fffffff]
> > > > [    0.000000] NODE_DATA(0) allocated [mem 0x3fffc000-0x3fffffff]
> > > > [    0.000000] NODE_DATA(1) allocated [mem 0x7ffdc000-0x7ffdffff]
> > > > [    0.000000] Zone ranges:
> > > > [    0.000000]   DMA      [mem 0x0000000000001000-0x0000000000ffffff]
> > > > [    0.000000]   DMA32    [mem 0x0000000001000000-0x000000007ffdffff]
> > > > [    0.000000]   Normal   empty
> > > > [    0.000000] Movable zone start for each node
> > > > [    0.000000] Early memory node ranges
> > > > [    0.000000]   node   0: [mem 0x0000000000001000-0x000000000009efff]
> > > > [    0.000000]   node   0: [mem 0x0000000000100000-0x000000003fffffff]
> > > > [    0.000000]   node   1: [mem 0x0000000040000000-0x000000007ffdffff]
> > > > 
> > > > so there is neither a normal zone nor a movable one at boot time.  
> > > it could be if hotpluggable memory were present at boot time in the E820 table
> > > (if I remember right, when running on Hyper-V there is a movable zone at boot time),
> > > 
> > > but in QEMU hotpluggable memory isn't put into E820,
> > > so the zone is allocated later, when the memory is enumerated
> > > by the ACPI subsystem and onlined.
> > > This causes fewer issues with respect to the movable zone and works
> > > with different versions of Linux/Windows as well.
> > > 
> > > That's where in-kernel auto-onlining could also be useful,
> > > since the user would be able to start up with a small amount of
> > > non-removable memory plus several removable DIMMs
> > > and have all the memory onlined/available by the time
> > > the initrd is loaded. (The missing piece here is onlining
> > > removable memory as movable by default.)  
> > 
> > Why should we even care to online that memory that early rather than
> > making it available via e820?
> 
> It's not forbidden by the spec and has fewer complications
> when it comes to removable memory. Declaring it in E820
> would add the following limitations/drawbacks:
>  - firmware would have to be able to exclude removable memory
>    from its own usage (currently neither SeaBIOS nor EFI has to
>    know/care about it) => less QEMU-guest ABI to maintain.
>  - the OS would have to be taught to avoid/move (early) non-movable
>    allocations out of removable address ranges.
>    There were patches targeting that in recent kernels,
>    but they won't work with older kernels that lack them,
>    which limits the range of OSes that could run on QEMU
>    and do memory removal.
> 
> The E820-less approach works reasonably well with a wide range
> of guest OSes and is less complex than if removable memory
> were present in E820. Hence I don't have a compelling
> reason to introduce removable memory in E820, as it
> only adds to hot(un)plug issues.

OK, I see, and that sounds like an argument for not putting those ranges
into E820. I still fail to see why we have to online the memory early
during boot instead of waiting for userspace to run.
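Waiting for userspace, as suggested here, means something in userspace has to notice newly added memory blocks and online them. As a rough illustration only, here is a minimal Python sketch, assuming the memory-block sysfs layout described in the kernel's memory-hotplug documentation (`/sys/devices/system/memory/memoryN/state`, which reads `online`/`offline` and accepts `online_movable` when written) and root privileges. The function name `online_offline_blocks` is hypothetical. The kernel may refuse `online_movable` for a given block, so such failures are collected rather than treated as fatal.

```python
from pathlib import Path

# Standard location of memory block devices in sysfs.
MEMORY_ROOT = Path("/sys/devices/system/memory")


def online_offline_blocks(root=MEMORY_ROOT, target="online_movable"):
    """Write `target` into the state file of every offline memoryN block.

    Hypothetical helper. Returns two lists of block names: those that were
    onlined and those whose state write was refused.
    """
    blocks = [
        p for p in root.iterdir()
        if p.name.startswith("memory") and p.name[len("memory"):].isdigit()
    ]
    # Process blocks in numeric order (memory2 before memory10).
    blocks.sort(key=lambda p: int(p.name[len("memory"):]))

    onlined, failed = [], []
    for block in blocks:
        state_file = block / "state"
        try:
            state = state_file.read_text().strip()
        except OSError:
            continue  # no readable state attribute; skip this entry
        if state != "offline":
            continue  # already online (or in transition); leave it alone
        try:
            # Raises OSError if the kernel refuses to place this block
            # in the requested zone.
            state_file.write_text(target)
            onlined.append(block.name)
        except OSError:
            failed.append(block.name)
    return onlined, failed
```

Called with no arguments it walks the real sysfs tree; the `root` parameter exists so the logic can be tried against a scratch directory. In practice this job is commonly done by a udev rule reacting to memory block `add` events rather than by a standalone script.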

-- 
Michal Hocko
SUSE Labs
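The boot log quoted in the thread is consistent with Igor's explanation: the only SRAT range marked `hotplug` (Node 0, `0x100000000-0x27fffffff`, i.e. 6 GiB) lies entirely above the early memory node ranges, the highest of which ends at `0x7ffdffff`, so that memory is not present at boot. A small sketch that checks this mechanically, assuming dmesg lines in exactly the format quoted (dmesg text is not a stable ABI); the helper names `hotplug_ranges`, `early_ranges` and `outside_early_memory` are hypothetical.

```python
import re

# Patterns for the two dmesg line formats quoted in the thread (assumed format).
SRAT_RE = re.compile(
    r"ACPI: SRAT: Node (\d+) PXM (\d+) "
    r"\[mem (0x[0-9a-f]+)-(0x[0-9a-f]+)\]( hotplug)?"
)
EARLY_RE = re.compile(r"node\s+(\d+): \[mem (0x[0-9a-f]+)-(0x[0-9a-f]+)\]")


def hotplug_ranges(log):
    """(node, start, end_inclusive) for SRAT ranges flagged 'hotplug'."""
    return [
        (int(m.group(1)), int(m.group(3), 16), int(m.group(4), 16))
        for m in SRAT_RE.finditer(log)
        if m.group(5)
    ]


def early_ranges(log):
    """(node, start, end_inclusive) from the 'Early memory node ranges' lines."""
    return [
        (int(m.group(1)), int(m.group(2), 16), int(m.group(3), 16))
        for m in EARLY_RE.finditer(log)
    ]


def outside_early_memory(log):
    """Hotplug ranges that overlap none of the boot-time node ranges."""
    early = early_ranges(log)
    return [
        (node, start, end)
        for node, start, end in hotplug_ranges(log)
        if all(end < e_start or start > e_end for _, e_start, e_end in early)
    ]
```

Fed the log lines quoted above, `outside_early_memory` would report the single 6 GiB hotplug range on node 0.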
