linux-kernel - Re: [PATCH] kexec_core: Accept unaccepted kexec destination addresses

Message-ID: <tpbcun3d4wrnbtsvx3b3hjpdl47f2zuxvx6zqsjoelazdt3eyv@kgqnedtcejta>
Date: Tue, 22 Oct 2024 15:06:15 +0300
From: "Kirill A. Shutemov" <kirill@...temov.name>
To: "Eric W. Biederman" <ebiederm@...ssion.com>
Cc: Yan Zhao <yan.y.zhao@...el.com>, kexec@...ts.infradead.org, 
	linux-kernel@...r.kernel.org, linux-coco@...ts.linux.dev, x86@...nel.org, 
	rick.p.edgecombe@...el.com, kirill.shutemov@...ux.intel.com
Subject: Re: [PATCH] kexec_core: Accept unaccepted kexec destination addresses

On Mon, Oct 21, 2024 at 09:33:17AM -0500, Eric W. Biederman wrote:
> Yan Zhao <yan.y.zhao@...el.com> writes:
> 
> > The kexec destination addresses (including those for purgatory, the new
> > kernel, boot params/cmdline, and initrd) are searched from the free area of
> > memblock or RAM resources. Since they are not allocated by the currently
> > running kernel, it is not guaranteed that they are accepted before
> > relocating the new kernel.
> >
> > Accept the destination addresses for the new kernel, as the new kernel may
> > not be able to or may not accept them by itself.
> >
> > Place the "accept" code immediately after the destination addresses pass
> > sanity checks, so the code can be shared by both users of the kexec_load
> > and kexec_file_load system calls.
> 
> I am not at all certain this is sufficient, and I am a bit flummoxed
> about the need to ever ``accept'' memory lazily.
> 
> In a past life I wrote bootup firmware, and as part of that was the code
> to initialize the contents of memory.  When properly tuned and set up it
> would never take more than a second to just blast initial values into
> memory.  That is because the ratio of memory per memory controller to
> memory bandwidth stayed roughly constant while I was paying attention.
> I expect that ratio to continue staying roughly constant or systems
> will quickly start developing unacceptable boot times.
> 
> As I recall Intel TDX is where the contents of memory are encrypted per
> virtual machine.  Which implies that you have the same challenge as
> bootup initializing memory, and that is what ``accepting'' memory is.
> 
> I am concerned that an unfiltered accept_memory may result in memory
> that has already been ``accepted'' being accepted again.

It is not unfiltered. We check it against the bitmap that tracks the
accept status of each memory block.
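
A minimal userspace sketch of that filtering, not the kernel code itself.
The unit size, the bitmap polarity (set bit = still unaccepted), and every
name below (accept_range, platform_accept, mark_range_unaccepted, ...) are
assumptions made for illustration:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: one bit per UNIT_SIZE bytes, set = still unaccepted. */
#define UNIT_SIZE     (2UL * 1024 * 1024)
#define NR_UNITS      64
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

static unsigned long unaccepted_bitmap[(NR_UNITS + BITS_PER_WORD - 1) / BITS_PER_WORD];
static unsigned long nr_accept_calls;	/* counts simulated platform accepts */

static int unit_is_unaccepted(size_t unit)
{
	return (unaccepted_bitmap[unit / BITS_PER_WORD] >> (unit % BITS_PER_WORD)) & 1UL;
}

static void unit_clear(size_t unit)
{
	unaccepted_bitmap[unit / BITS_PER_WORD] &= ~(1UL << (unit % BITS_PER_WORD));
}

/* Test helper: mark every unit overlapping [start, end) as unaccepted. */
static void mark_range_unaccepted(uint64_t start, uint64_t end)
{
	size_t unit;

	for (unit = start / UNIT_SIZE; unit < (end + UNIT_SIZE - 1) / UNIT_SIZE; unit++)
		unaccepted_bitmap[unit / BITS_PER_WORD] |= 1UL << (unit % BITS_PER_WORD);
}

/* Stand-in for the expensive, platform-specific accept operation. */
static void platform_accept(uint64_t start, uint64_t end)
{
	(void)start;
	(void)end;
	nr_accept_calls++;
}

/*
 * Accept [start, end), skipping units the bitmap already records as
 * accepted, so repeating the call over the same range does no extra work.
 */
static void accept_range(uint64_t start, uint64_t end)
{
	size_t unit;

	assert(start <= end && end <= (uint64_t)NR_UNITS * UNIT_SIZE);

	for (unit = start / UNIT_SIZE; unit < (end + UNIT_SIZE - 1) / UNIT_SIZE; unit++) {
		if (!unit_is_unaccepted(unit))
			continue;	/* already accepted: nothing to do */
		platform_accept(unit * UNIT_SIZE, (unit + 1) * UNIT_SIZE);
		unit_clear(unit);
	}
}
```

In this model a second accept_range() over an already accepted range never
reaches platform_accept(), so in-use memory is not touched again.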

> This has
> the potential to be wasteful in the best case, and the potential to
> cause memory that is in use to be reinitialized losing the values
> that are currently stored there.
> 
> I am concerned that the target kernel won't know about accepting
> memory, or might not perform the work early enough and try to use memory
> without accepting it first.

The bitmap I mentioned above is passed between the two kernels via an EFI
config table. This mechanism predates kexec enablement on systems with
unaccepted memory support, so there should not be a problem.

> I would much prefer if getting into kexec_load would force the memory
> acceptance out of lazy mode (or possibly not even work in lazy mode).
> That keeps things simple for now.

You can always force this behaviour with accept_memory=eager, but it is
waaay slower for larger VMs. It is an especially bad idea if kexec is used
as the initial bootloader and most of the memory is not yet accepted by the
time kexec is triggered.

> Once enough people have machines requiring the use of accept_memory
> we can worry about optimizing things and pushing the accept_memory call
> down into kexec_load.

It is already here and it works, despite some bugs that need to be
addressed.

> Ugh.  I just noticed another issue.  Unless the memory we are talking
> about is the memory reserved for kexec on panic kernels the memory needs
> struct pages and everything set up so it can be allocated from anyway.

I am not sure I follow. Could you please elaborate?

> Which is to say I think this has the potential to conflict with
> the accounting in try_to_accept_memory.
> 
> Please just make memory acceptance ``eager'' non-lazy when using kexec.
> Unless someone has messed their implementation badly it won't be a
> significant amount of time in human terms, and it makes the code
> so much easier to understand and think about.

Waiting minutes for a VM to boot to a shell is not feasible for most
deployments. Lazy is a sane default to me.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov