Message-ID: <20180911091608.GQ10951@dhcp22.suse.cz>
Date:   Tue, 11 Sep 2018 11:16:08 +0200
From:   Michal Hocko <mhocko@...nel.org>
To:     Pasha Tatashin <Pavel.Tatashin@...rosoft.com>
Cc:     "zaslonko@...ux.ibm.com" <zaslonko@...ux.ibm.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux Memory Management List <linux-mm@...ck.org>,
        "osalvador@...e.de" <osalvador@...e.de>,
        "gerald.schaefer@...ibm.com" <gerald.schaefer@...ibm.com>
Subject: Re: [PATCH] memory_hotplug: fix the panic when memory end is not on
 the section boundary

On Mon 10-09-18 15:26:55, Pavel Tatashin wrote:
> 
> 
> On 9/10/18 10:41 AM, Michal Hocko wrote:
> > On Mon 10-09-18 14:32:16, Pavel Tatashin wrote:
> >> On Mon, Sep 10, 2018 at 10:19 AM Michal Hocko <mhocko@...nel.org> wrote:
> >>>
> >>> On Mon 10-09-18 14:11:45, Pavel Tatashin wrote:
> >>>> Hi Michal,
> >>>>
> >>>> It is tricky, but probably can be done. Either change
> >>>> memmap_init_zone() or its caller to also cover the ends and starts of
> >>>> unaligned sections to initialize and reserve pages.
> >>>>
> >>>> The same thing would also need to be done in deferred_init_memmap() to
> >>>> cover the deferred init case.
> >>>
> >>> Well, I am not sure TBH. I have to think about that much more. Maybe it
> >>> would be much more simple to make sure that we will never add incomplete
> >>> memblocks and simply refuse them during the discovery. At least for now.
> >>
> >> On x86, memblocks can be up to 2G on machines with over 64G of RAM.
> > 
> > sorry I meant pageblock_nr_pages rather than memblocks.
> 
> OK. This sounds reasonable, but to be honest I am not sure how to
> achieve it yet; I need to think more about this. In theory, if we have
> a sparse memory model, it makes sense to enforce memory alignment to
> section sizes; that sounds a lot safer.

Memory hotplug is sparsemem only. You do not have to think about other
memory models fortunately.
 
> >> Also, memory size is way too easy to change via qemu arguments when a
> >> VM starts. If we simply disable unaligned trailing memblocks, I am
> >> sure we would get tons of noise about missing memory.
> >>
> >> I think adding check_hotplug_memory_range() would work to fix the
> >> immediate problem, but we do need to figure out a better solution.
> >>
> >> The memblock design is based on the archaic assumption that hotplug
> >> units are physical DIMMs. VMs and hypervisors changed all of that, and
> >> we can have much finer hotplug requests on machines with huge DIMMs.
> >> Yet we do not want to pollute sysfs with millions of tiny memory
> >> devices. I am not sure what a proper long-term solution to this
> >> problem should be, but I see that the Linux hotplug/hotremove
> >> subsystems must be redesigned based on the new requirements.
> > 
> > Not an easy task though. Anyway, the sparse memory model is heavily
> > based on memory sections, so it makes some sense to have hotplug be
> > section-based as well. Memblocks as a higher logical unit on top of
> > that are kind of a hack. The userspace API has never been properly
> > thought through, I am afraid.
> 
> I agree memoryblock is a hack; it fails to do both things it was
> designed to do:
> 
> 1. On bare metal you cannot free a physical DIMM of memory at
> memoryblock granularity, because memory devices do not correspond to
> physical DIMMs. Thus, if for some reason a particular DIMM must be
> removed/replaced, memoryblock does not help us.

agreed

> 2. On machines with hypervisors it fails to provide adequate
> granularity to add/remove memory.
> 
> We should define a new user interface where memory can be added/removed
> at a finer granularity: sparse section size, but without a memory
> device for each section. We should also provide optional access to a
> legacy interface where memory devices are exported but each is of
> section size.
> 
> So, when the legacy interface is enabled, the current way would work:
> 
> echo offline > /sys/devices/system/memory/memoryXXX/state
> 
> And new interface would allow us to do something like this:
> 
> echo offline 256M > /sys/devices/system/node/nodeXXX/memory
> 
> With an optional start address for offlining memory:
> echo offline [start_pa] size > /sys/devices/system/node/nodeXXX/memory
> start_pa and size must be section-size aligned (128M).

I am not sure what the expected semantics of the version without
start_pa would be.
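For comparison, the legacy per-device interface is driven through sysfs today. A small wrapper that enforces the proposed 128M alignment before touching sysfs might look like the sketch below; the `SECTION_MB` value and both helper functions are illustrative assumptions, not existing tooling:

```shell
SECTION_MB=128   # sparsemem section size assumed in the proposal above

# Reject sizes the proposed interface would refuse; on success, print
# how many sections the request spans
sections_for() {
    local size_mb=$1
    if [ $((size_mb % SECTION_MB)) -ne 0 ]; then
        echo "size must be a multiple of ${SECTION_MB}M" >&2
        return 1
    fi
    echo $((size_mb / SECTION_MB))
}

# Legacy path: offline a single memory block device (requires root)
offline_block() {
    echo offline > "/sys/devices/system/memory/memory$1/state"
}
```

With `sections_for 256` yielding 2, a 256M offline request would span two sections; the node-level `echo offline 256M > .../nodeXXX/memory` file itself does not exist today and is only the proposal under discussion.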

> It would probably be a good discussion for the next MM Summit how to
> solve the current memory hotplug interface limitations.

Yes, sounds good to me. In any case let's not pollute this email thread
with this discussion now.
-- 
Michal Hocko
SUSE Labs
