[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <33d9b711-288c-e5f4-0cc8-365c01861da5@bytedance.com>
Date: Mon, 6 Dec 2021 16:51:09 +0000
From: Usama Arif <usama.arif@...edance.com>
To: "Eric W. Biederman" <ebiederm@...ssion.com>,
Vivek Goyal <vgoyal@...hat.com>
Cc: Borislav Petkov <bp@...en8.de>, x86@...nel.org,
linux-kernel@...r.kernel.org, tglx@...utronix.de,
fam.zheng@...edance.com, Baoquan He <bhe@...hat.com>,
Dave Young <dyoung@...hat.com>
Subject: Re: [External] Re: [PATCH] x86/purgatory: provide config to disable
purgatory
Hi,
I have sent a v3 of the patch with a much clearer commit message, please
let me know if there are any comments for the v3 patch or my responses
below.
Thanks,
Usama
On 01/12/2021 10:29, Usama Arif wrote:
> Hi,
>
> (Resending the reply as my email client had updated and inserted html
> code and caused a bounceback from the mailing list, sorry about that.)
>
> Thanks for your reply, I have responded with further comments/questions
> inline below, and also have provided some context for the patch at the
> start:
>
> The patch is not introducing a new CONFIG option, as can be seen in the
> v2 patch diff. It is converting an existing CONFIG option that only
> enabled purgatory for specific architectures in which it has been
> implemented (x86, powerpc and s390) to an option that can that allow
> purgatory to be disabled (with default enabled) and only provides code
> to disable purgatory for x86 only. From what i see, purgatory is
> currently not yet implemented in other architectures like arm64
> kexec_file case and riscv, so this would on a high level,
> provide only the option to make kexec on x86 similar to other the other
> architectures.
>
> As a background to the discussion, the usecase we are aiming is to
> update the host kernel with kexec in servers which is managing multiple
> VMs, to cut down the downtime of servers as much as possible so that its
> not noticeable to VMs. The patch is aimed at x86 purgatory code
> specifically. We are targetting other optimizations as well in other
> areas in boot path to cut down the 600ms time much further, so that cut
> down of 200ms downtime is significant in the usecase described. I did
> have a few comments/questions about your reply, please see my responses
> below:
>
>
> On 29/11/2021 16:53, Eric W. Biederman wrote:
>> Usama Arif <usama.arif@...edance.com> writes:
>>
>>> Hi,
>>>
>>> Thanks for your replies. I have submitted a v2 of the patch with a
>>> much more detailed commit message including reason for the patch and
>>> timing values.
>>>
>>> The time taken from reboot to running init process was measured
>>> with both purgatory enabled and disabled over 20 runs and the
>>> averages are:
>>> Purgatory disabled:
>>> - TSC = 3908766161 cycles
>>> - ktime = 606.8 ms
>>> Purgatory enabled:
>>> - TSC = 5005811885 cycles (28.1% worse)
>>> - ktime = 843.1 ms (38.9% worse)
>>>
>>>
>>> Our reason for this patch is that it helps reduce the downtime of
>>> servers when
>>> the host kernel managing multiple VMs needs to be updated via kexec,
>>> but it makes reboot with kexec much faster so should be a general
>>> improvement in
>>> boot time if purgatory is not needed and could have other usecases as
>>> well.
>>> I believe only x86, powerpc and s390 have purgatory supported, other
>>> platforms
>>> like arm64 dont have it implemented yet, so with the reboot time
>>> improvement seen,
>>> it would be a good idea to have the option to disable purgatory
>>> completely but set default to y.
>>> We also have the CONFIG_KEXEC_BZIMAGE_VERIFY_SIG which can be enabled
>>> to verify the next
>>> kernel image to be booted and purgatory can be completely skipped if
>>> not required.
>>
>> CONFIG_KEXEC_BZIMAGE_VERIFY_SIG is something totally and completely
>> different. It's job is to verify that the kernel to be booted comes
>> from a trusted source. The sha256 verification in purgatory's job
>> is to verify that memory the kernel cares about was not corrupted
>> during the kexec process.
>>
>
> Thanks, acknowledged.
>
>> I believe when you say purgatory you are really talking about that
>> sha256 checksum. It really is not possible to disable all of
>> the code that runs between kernels, as the old and the new kernel may
>> run at the same addresses. Anything that runs between the two kernels
>> is what is referred to as purgatory. Even if it is just a small
>> assembly stub.
>>
> >
>
> With this patch, i am trying to give an option (with default purgatory
> enabled)to disable purgatory completely on x86 only, i.e. no code
> running between 2 kernels on x86. From my understanding there is no
> purgatory in arm64 kexec_file case, in riscv and in some other archs as
> well, so I am not sure why its not possible to disable purgatory (all
> code running between 2 kernels) in x86? Unless i misunderstood something
> about the working of other platforms? In x86 case, from what i see
> relocate_kernel is still part of the older kernel and not purgatory, and
> with my patch, purgatory.ro is not built and kexec does execute
> successfully with purgatory disabled.
>
> About your point for the old and the new kernel may run at the same
> addresses,i dont think that can happen as in bzImage64_load function its
> loaded using the kbuf struct. This doesnt happen in other architectures
> (for e.g. arm64) that dont implement purgatory and any of the tests I
> conducted with the patch applied due to the use of kbuf struct.
>
> From what i see in the code, purgatory in x86 specifically, has 2 main
> functions, sha256 verification and loading the %rsi register for
> bootparam_load_addr. In this patch purgatory was disabled, which
> resulted in sha256 verification disabled and bootparam_load_addr moved
> to relocate_kernel. Our analysis revealed that most of the time is spent
> in the sha256 verification, so I would be ok to reformat the patch
> to make the sha verification optional and keep purgatory enabled if
> thats preferred? Although in my opinion the current patch of only
> providing an option to disable purgatory seems much better.
>
>
>> That sha256 verification is always needed for kexec on panic, there are
>> by the nature of a kernel panic too many unknowns to have any confidence
>> the new kernel will not be corrupted in the process of kexec before it
>> gets started.
>>
>> For an ordinary kexec it might be possible to say that you have a
>> reliable kernel shutdown process and you know for a fact that something
>> won't come along and corrupt the kernel. I find that a questionable
>> assertion. I haven't seen anyone yet whose focus when getting an
>> ordinary kexec to work as anything other than making certain all of the
>> drivers are shutdown properly.
>>
>> I have seen countless times when a network packet comes in a the wrong
>> time and the target kernel's memory is corrupted before it gets far
>> enough to initialize the network driver. >
>
> I agree that when doing testing and research, there are things that can
> go wrong during kexec process, but i assume that the expectation is that
> in production environment, when for e.g. updating a kernel with kexec in
> servers,the expectation is that kexec will execute successfully as long
> as there is nothing wrong in the old kernel and the new kernel.
>
> Maybe I didn't understand this correctly, but if a network packet comes
> in and corrupts the kernel memory, wont it also cause a problem even
> when purgatory is enabled? I agree that issue like these would be caught
> earlier with purgatory, but then if something like this breaks the kexec
> process in a production environment where the old and new kernel are
> well tested, trusted and expected to work, wouldn't there be a much more
> fundamental issue with the reliability of the current kexec process.
>
>> For a 0.2s speed up you are talking about disabling all of the safety
>> checks in a very dangerous situation. How much can you can in
>> performance by optimizing the sha256 implementation instead of using
>> what is essentially a reference implementation in basic C that I copied
>> from somewhere long ago.
>>
>> Optimize the sha256 implementation and the memory copy loop and then
>> show how the tiny bit of time that is left is on a mission critical path
>> and must be removed. Then we can reasonably talk about a config option
>> for disabling the sha256 implementation in the kexec in not-panic case.
>>
> Thanks for this, I assume that over here you are suggesting for e.g. in
> x86 to replace the existing sha verification implementation in purgatory
> with the one in arch/x86/crypto/sha256_ssse3_glue.c? Also could you
> point to the memory copy loop to optimize? I can have a look at these 2
> options.
>
> Even with these optimizations, i think the v2 patch should still be
> considered. The patch in the end is providing only an option to disable
> purgatory on x86 only, with the default value of keeping it enabled and
> not changing kexec behaviour. The CONFIG already existed, it is just
> renamed and now provides user the option to select to disable purgatory
> for x86, rather than it being architecture dependent.
>
> Thanks again for the review and comments,
> Usama
>
>
>> That sha256 implementation in part so that we can all sleep at night
>> because we don't have to deal with very very strange heizenbugs.
>>
>> Eric
>>
Powered by blists - more mailing lists