lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20180726151025.GN6480@MiWiFi-R3L-srv>
Date:   Thu, 26 Jul 2018 23:10:25 +0800
From:   Baoquan He <bhe@...hat.com>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        linux-kernel@...r.kernel.org, robh+dt@...nel.org,
        dan.j.williams@...el.com, nicolas.pitre@...aro.org,
        josh@...htriplett.org, fengguang.wu@...el.com, bp@...e.de,
        andy.shevchenko@...il.com, patrik.r.jakobsson@...il.com,
        airlied@...ux.ie, kys@...rosoft.com, haiyangz@...rosoft.com,
        sthemmin@...rosoft.com, dmitry.torokhov@...il.com,
        frowand.list@...il.com, keith.busch@...el.com,
        jonathan.derrick@...el.com, lorenzo.pieralisi@....com,
        bhelgaas@...gle.com, tglx@...utronix.de, brijesh.singh@....com,
        jglisse@...hat.com, thomas.lendacky@....com,
        gregkh@...uxfoundation.org, baiyaowei@...s.chinamobile.com,
        richard.weiyang@...il.com, devel@...uxdriverproject.org,
        linux-input@...r.kernel.org, linux-nvdimm@...ts.01.org,
        devicetree@...r.kernel.org, linux-pci@...r.kernel.org,
        ebiederm@...ssion.com, vgoyal@...hat.com, dyoung@...hat.com,
        yinghai@...nel.org, monstr@...str.eu, davem@...emloft.net,
        chris@...kel.net, jcmvbkbc@...il.com, gustavo@...ovan.org,
        maarten.lankhorst@...ux.intel.com, seanpaul@...omium.org,
        linux-parisc@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org,
        kexec@...ts.infradead.org
Subject: Re: [PATCH v7 4/4] kexec_file: Load kernel at top of system RAM if
 required

On 07/26/18 at 04:01pm, Michal Hocko wrote:
> On Thu 26-07-18 21:37:05, Baoquan He wrote:
> > On 07/26/18 at 03:14pm, Michal Hocko wrote:
> > > On Thu 26-07-18 15:12:42, Michal Hocko wrote:
> > > > On Thu 26-07-18 21:09:04, Baoquan He wrote:
> > > > > On 07/26/18 at 02:59pm, Michal Hocko wrote:
> > > > > > On Wed 25-07-18 14:48:13, Baoquan He wrote:
> > > > > > > On 07/23/18 at 04:34pm, Michal Hocko wrote:
> > > > > > > > On Thu 19-07-18 23:17:53, Baoquan He wrote:
> > > > > > > > > Kexec has been a formal feature in our distro, and customers owning
> > > > > > > > > those kind of very large machine can make use of this feature to speed
> > > > > > > > > up the reboot process. On uefi machine, the kexec_file loading will
> > > > > > > > > search place to put kernel under 4G from top to down. As we know, the
> > > > > > > > > 1st 4G space is DMA32 ZONE, dma, pci mmcfg, bios etc all try to consume
> > > > > > > > > it. It may have possibility to not be able to find a usable space for
> > > > > > > > > kernel/initrd. From the top down of the whole memory space, we don't
> > > > > > > > > have this worry. 
> > > > > > > > 
> > > > > > > > I do not have the full context here but let me note that you should be
> > > > > > > > careful when doing top-down reservation because you can easily get into
> > > > > > > > hotplugable memory and break the hotremove usecase. We even warn when
> > > > > > > > this is done. See memblock_find_in_range_node
> > > > > > > 
> > > > > > > Kexec read kernel/initrd file into buffer, just search usable positions
> > > > > > > for them to do the later copying. You can see below struct kexec_segment, 
> > > > > > > for the old kexec_load, kernel/initrd are read into user space buffer,
> > > > > > > the @buf stores the user space buffer address, @mem stores the position
> > > > > > > where kernel/initrd will be put. In kernel, it calls
> > > > > > > kimage_load_normal_segment() to copy user space buffer to intermediate
> > > > > > > pages which are allocated with flag GFP_KERNEL. These intermediate pages
> > > > > > > are recorded as entries, later when user execute "kexec -e" to trigger
> > > > > > > kexec jumping, it will do the final copying from the intermediate pages
> > > > > > > to the real destination pages which @mem pointed. Because we can't touch
> > > > > > > the existed data in 1st kernel when do kexec kernel loading. With my
> > > > > > > understanding, GFP_KERNEL will make those intermediate pages be
> > > > > > > allocated inside immovable area, it won't impact hotplugging. But the
> > > > > > > @mem we searched in the whole system RAM might be lost along with
> > > > > > > hotplug. Hence we need do kexec kernel again when hotplug event is
> > > > > > > detected.
> > > > > > 
> > > > > > I am not sure I am following. If @mem is placed at movable node then the
> > > > > > memory hotremove simply won't work, because we are seeing reserved pages
> > > > > > and do not know what to do about them. They are not migrateable.
> > > > > > Allocating intermediate pages from other nodes doesn't really help.
> > > > > 
> > > > > OK, I forgot the 2nd kernel which kexec jump into. It won't impact hotremove
> > > > > in 1st kernel, it does impact the kernel which kexec jump into if kernel
> > > > > is at top of system RAM and the top RAM is in movable node.
> > > > 
> > > > It will affect the 1st kernel (which does the memblock allocation
> > > > top-down) as well. For reasons mentioned above.
> > > 
> > > And btw. in the ideal world, we would restrict the memblock allocation
> > > top-down from the non-movable nodes. But I do not think we have that
> > > information ready at the time when the reservation is done.
> > 
> > Oh, you could mix kexec loading up with kdump kernel loading. For kdump
> > kernel, we need reserve memory region during bootup with memblock
> > allocator. For kexec loading, we just operate after system up, and do
> > not need to reserve any memmory region. About memory used to load them,
> > it's quite different way.
> 
> I didn't know about that. I thought both use the same underlying
> reservation mechanism. My bad and sorry for the noise.

Not at all. It's truly confusing. I often need take time to recall those
details. 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ