linux-kernel - Re: [PATCH V3] panic: Move panic

Message-ID: <20220129080027.GC17613@MiWiFi-R3L-srv>
Date:   Sat, 29 Jan 2022 16:00:27 +0800
From:   Baoquan He <bhe@...hat.com>
To:     Petr Mladek <pmladek@...e.com>
Cc:     "Michael Kelley (LINUX)" <mikelley@...rosoft.com>,
        "Guilherme G. Piccoli" <gpiccoli@...lia.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        "kernel@...ccoli.net" <kernel@...ccoli.net>,
        "senozhatsky@...omium.org" <senozhatsky@...omium.org>,
        "rostedt@...dmis.org" <rostedt@...dmis.org>,
        "john.ogness@...utronix.de" <john.ogness@...utronix.de>,
        "feng.tang@...el.com" <feng.tang@...el.com>,
        "kexec@...ts.infradead.org" <kexec@...ts.infradead.org>,
        "dyoung@...hat.com" <dyoung@...hat.com>,
        "keescook@...omium.org" <keescook@...omium.org>,
        "anton@...msg.org" <anton@...msg.org>,
        "ccross@...roid.com" <ccross@...roid.com>,
        "tony.luck@...el.com" <tony.luck@...el.com>
Subject: Re: [PATCH V3] panic: Move panic_print before kmsg dumpers

On 01/26/22 at 12:51pm, Petr Mladek wrote:
> On Mon 2022-01-24 16:57:17, Michael Kelley (LINUX) wrote:
> > From: Baoquan He <bhe@...hat.com> Sent: Friday, January 21, 2022 8:34 PM
> > > 
> > > On 01/21/22 at 03:00pm, Michael Kelley (LINUX) wrote:
> > > > From: Baoquan He <bhe@...hat.com> Sent: Thursday, January 20, 2022 6:31 PM
> > > > >
> > > > > On 01/20/22 at 06:36pm, Guilherme G. Piccoli wrote:
> > > > > > Hi Baoquan, some comments inline below:
> > > > > >
> > > > > > On 20/01/2022 05:51, Baoquan He wrote:
> > 
> > [snip]
> > 
> > > > > > Do you think it should be necessary?
> > > > > > How about if we allow users to just "panic_print" with or without the
> > > > > > > "crash_kexec_post_notifiers", then we pursue Petr's suggestion of
> > > > > > > refactoring the panic notifiers? So, after this future refactor, we
> > > > > > > might have much clearer code.
> > > > >
> > > > > > I haven't read Petr's reply in the other panic notifier filter thread.
> > > > > > The panic notifiers are only force-enabled on the HyperV platform; apart
> > > > > > from that, users need to explicitly add "crash_kexec_post_notifiers=1" to
> > > > > > enable them. And we got a bug report on the HyperV issue. In our internal
> > > > > > discussion, we strongly suggested that the HyperV devs change the default
> > > > > > enablement and instead leave it to the user to decide.
> > > > >
> > > >
> > > > Regarding Hyper-V:   Invoking the Hyper-V notifier prior to running the
> > > > kdump kernel is necessary for correctness.  During initial boot of the
> > > > main kernel, the Hyper-V and VMbus code in Linux sets up several guest
> > > > physical memory pages that are shared with Hyper-V, and that Hyper-V
> > > > may write to.   A VMbus connection is also established. Before kexec'ing
> > > > into the kdump kernel, the sharing of these pages must be rescinded
> > > > and the VMbus connection must be terminated.   If this isn't done, the
> > > > kdump kernel will see strange memory overwrites if these shared guest
> > > > physical memory pages get used for something else.
> > 
> > In the Azure cloud, collecting data before crash dumps is a motivation
> > as well for setting crash_kexec_post_notifiers to true.   That way as
> > cloud operator we can see broad failure trends, and in specific cases
> > customers often expect the cloud operator to be able to provide info
> > about a problem even if they have taken a kdump.  Where did you
> > envision adding a comment in the code to help clarify these intentions?
> > 
> > I looked at the code again, and should revise my previous comments
> > somewhat.   The Hyper-V resets that I described indeed must be done
> > prior to kexec'ing the kdump kernel.   Most such resets are actually
> > done via __crash_kexec() -> machine_crash_shutdown(), not via the
> > panic notifier. However, the Hyper-V panic notifier must terminate the
> > VMbus connection, because that must be done even if kdump is not
> > being invoked.  See commit 74347a99e73.
> >
> > Most of the hangs seen in getting into the kdump kernel on Hyper-V/Azure 
> > were probably due to the machine_crash_shutdown() path, and not due
> > to running the panic notifiers prior to kexec'ing the kdump kernel.  The
> > exception is terminating the VMbus connection, which had problems that
> > are hopefully now fixed because of adding a timeout.
> 
> My understanding is that we could split the actions into three groups:
> 
>    1. Actions that have to be done before kexec'ing the kdump kernel,
>       like resetting physical memory shared with Hyper-V.
> 
>       These operations are needed only for kexec and can be done
>       in kexec.
> 
> 
>    2. Notify Hyper-V so that, for example, the Azure cloud can collect
>       data before the crash dump.
> 
>       It is nice to have.
> 
>       It should be configurable if it is not completely safe. I mean
>       that there should be a way to disable it when it might increase
>       the risk that kexec'ing kdump kernel might fail.
> 
> 
>    3. Some actions are needed only when panic() ends up in the
>       infinite loop.
> 
>       For example, unloading the vmbus channel. At least the commit
>       74347a99e73ae00b8385f ("x86/Hyper-V: Unload vmbus channel in
>       hv panic callback") says that it is done in the kdump path
>       out of the box.
> 
> All these operations are needed and used only when the kernel is
> running under Hyper-V.
> 
> My main intention is to understand whether we need 2 or 3 notifier
> lists or whether the current one is enough.
> 
> The 3 notifier lists would be:
> 
>    + always do (even before kdump)
>    + optionally do before or after kdump
>    + needed only when kdump is not called

Totally agree with the above suggestions for Hyper-V. Cleanup as above
seems necessary. Stuffing them all into the panic_notifiers package is
not appropriate.
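
To make the three-list split discussed above concrete, here is a rough
userspace sketch of the ordering it implies. It is only a model, not kernel
code: every name in it (panic_register, model_panic, the PANIC_* lists, the
cb_* callbacks) is hypothetical and does not exist in the kernel. It also
assumes that the "optional" list runs before kexec only when
crash_kexec_post_notifiers is set, and otherwise only when no kdump kernel
is loaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the three proposed lists; not a kernel API. */
enum panic_list { PANIC_ALWAYS, PANIC_OPTIONAL, PANIC_NO_KDUMP, PANIC_NR_LISTS };

typedef void (*panic_cb)(char *log, size_t len);

#define MAX_CBS 4
static panic_cb lists[PANIC_NR_LISTS][MAX_CBS];
static int counts[PANIC_NR_LISTS];

/* Append s to log without overflowing the len-byte buffer. */
void log_append(char *log, size_t len, const char *s)
{
	strncat(log, s, len - strlen(log) - 1);
}

/* Register a callback on one of the lists; returns -1 if the list is full. */
int panic_register(enum panic_list l, panic_cb cb)
{
	assert(l < PANIC_NR_LISTS);
	if (counts[l] >= MAX_CBS)
		return -1;
	lists[l][counts[l]++] = cb;
	return 0;
}

/* Call every callback on list l, in registration order. */
void run_list(enum panic_list l, char *log, size_t len)
{
	for (int i = 0; i < counts[l]; i++)
		lists[l][i](log, len);
}

/*
 * kdump_loaded:   a crash kernel is loaded, so kexec would succeed.
 * post_notifiers: stands in for crash_kexec_post_notifiers.
 */
void model_panic(bool kdump_loaded, bool post_notifiers, char *log, size_t len)
{
	log[0] = '\0';
	run_list(PANIC_ALWAYS, log, len);	/* always, even before kdump */
	if (kdump_loaded && !post_notifiers) {
		log_append(log, len, "kexec;");
		return;
	}
	run_list(PANIC_OPTIONAL, log, len);	/* before or after kdump */
	if (kdump_loaded) {
		log_append(log, len, "kexec;");
		return;
	}
	run_list(PANIC_NO_KDUMP, log, len);	/* only when kdump is not called */
	log_append(log, len, "loop;");
}

/* Example callbacks that just record that they ran. */
void cb_always(char *log, size_t len)   { log_append(log, len, "always;"); }
void cb_optional(char *log, size_t len) { log_append(log, len, "optional;"); }
void cb_no_kdump(char *log, size_t len) { log_append(log, len, "no_kdump;"); }
```

With one callback registered on each list, model_panic(true, false, ...)
runs only the "always" callback before kexec, model_panic(true, true, ...)
also runs the "optional" one first, and model_panic(false, false, ...) runs
all three and then ends in the loop.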