linux-kernel - Re: [v1 0/2] "Hotremove" persistent memory

Message-ID: <CAPcyv4h73gUwntDYx012qcyMYCmzZDU3HOvKcW5DRkO-GoTc+w@mail.gmail.com>
Date:   Sat, 20 Apr 2019 09:34:26 -0700
From:   Dan Williams <dan.j.williams@...el.com>
To:     Pavel Tatashin <pasha.tatashin@...een.com>
Cc:     James Morris <jmorris@...ei.org>, Sasha Levin <sashal@...nel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Linux MM <linux-mm@...ck.org>,
        linux-nvdimm <linux-nvdimm@...ts.01.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Michal Hocko <mhocko@...e.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Keith Busch <keith.busch@...el.com>,
        Vishal L Verma <vishal.l.verma@...el.com>,
        Dave Jiang <dave.jiang@...el.com>,
        Ross Zwisler <zwisler@...nel.org>,
        Tom Lendacky <thomas.lendacky@....com>,
        "Huang, Ying" <ying.huang@...el.com>,
        Fengguang Wu <fengguang.wu@...el.com>,
        Borislav Petkov <bp@...e.de>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        Yaowei Bai <baiyaowei@...s.chinamobile.com>,
        Takashi Iwai <tiwai@...e.de>,
        Jérôme Glisse <jglisse@...hat.com>
Subject: Re: [v1 0/2] "Hotremove" persistent memory

On Sat, Apr 20, 2019 at 8:32 AM Pavel Tatashin
<pasha.tatashin@...een.com> wrote:
>
> Support for using persistent memory like regular RAM was recently
> added to Linux. This work extends that functionality to also allow
> hot-removing persistent memory.
>
> We (Microsoft) have a very important use case for this functionality.
>
> The requirement is for physical machines with a small amount of RAM (~8G)
> to be able to reboot in a very short period of time (<1s). Yet, there is
> a userland state that is expensive to recreate (~2G).
>
> The solution is to boot machines with 2G reserved as persistent
> memory.

Makes sense, but I have some questions about the details.

>
> Copy the state, and hot-add the persistent memory so the machine still has
> all 8G at runtime. Before reboot, hot-remove the 2G device-dax memory, copy
> the memory that needs to be preserved to the pmem0 device, and reboot.
>
> The series of operations looks like this:
>
>         1. After boot restore /dev/pmem0 to boot
>         2. Convert raw pmem0 to devdax
>         ndctl create-namespace --mode devdax --map mem -e namespace0.0 -f
>         3. Hotadd to System RAM
>         echo dax0.0 > /sys/bus/dax/drivers/device_dax/unbind
>         echo dax0.0 > /sys/bus/dax/drivers/kmem/new_id
>         4. Before reboot hotremove device-dax memory from System RAM
>         echo dax0.0 > /sys/bus/dax/drivers/kmem/unbind
>         5. Create raw pmem0 device
>         ndctl create-namespace --mode raw  -e namespace0.0 -f
>         6. Copy the state to this device

What is the source of this copy? The state that was in the hot-added
memory? Isn't it "already there" since you effectively renamed dax0.0
to pmem0?

>         7. Do a kexec reboot, or reboot through firmware if the firmware
>         does not zero memory in the pmem region.

Wouldn't the dax0.0 contents be preserved regardless? How does the
guest recover the pre-initialized state / how does the kernel know to
give out the same pages to the application as the previous boot?