linux-kernel - Re: [PATCH 24/30] panic: Refactor the panic path

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ded31ec0-076b-2c5b-0fe6-0c274954821f@igalia.com>
Date:   Fri, 20 May 2022 08:23:33 -0300
From:   "Guilherme G. Piccoli" <gpiccoli@...lia.com>
To:     Baoquan He <bhe@...hat.com>, Petr Mladek <pmladek@...e.com>
Cc:     "michael Kelley (LINUX)" <mikelley@...rosoft.com>,
        Dave Young <dyoung@...hat.com>, d.hatayama@...fujitsu.com,
        akpm@...ux-foundation.org, kexec@...ts.infradead.org,
        linux-kernel@...r.kernel.org,
        bcm-kernel-feedback-list@...adcom.com,
        linuxppc-dev@...ts.ozlabs.org, linux-alpha@...r.kernel.org,
        linux-arm-kernel@...ts.infradead.org, linux-edac@...r.kernel.org,
        linux-hyperv@...r.kernel.org, linux-leds@...r.kernel.org,
        linux-mips@...r.kernel.org, linux-parisc@...r.kernel.org,
        linux-pm@...r.kernel.org, linux-remoteproc@...r.kernel.org,
        linux-s390@...r.kernel.org, linux-tegra@...r.kernel.org,
        linux-um@...ts.infradead.org, linux-xtensa@...ux-xtensa.org,
        netdev@...r.kernel.org, openipmi-developer@...ts.sourceforge.net,
        rcu@...r.kernel.org, sparclinux@...r.kernel.org,
        xen-devel@...ts.xenproject.org, x86@...nel.org,
        kernel-dev@...lia.com, kernel@...ccoli.net, halves@...onical.com,
        fabiomirmar@...il.com, alejandro.j.jimenez@...cle.com,
        andriy.shevchenko@...ux.intel.com, arnd@...db.de, bp@...en8.de,
        corbet@....net, dave.hansen@...ux.intel.com, feng.tang@...el.com,
        gregkh@...uxfoundation.org, hidehiro.kawai.ez@...achi.com,
        jgross@...e.com, john.ogness@...utronix.de, keescook@...omium.org,
        luto@...nel.org, mhiramat@...nel.org, mingo@...hat.com,
        paulmck@...nel.org, peterz@...radead.org, rostedt@...dmis.org,
        senozhatsky@...omium.org, stern@...land.harvard.edu,
        tglx@...utronix.de, vgoyal@...hat.com, vkuznets@...hat.com,
        will@...nel.org
Subject: Re: [PATCH 24/30] panic: Refactor the panic path

On 19/05/2022 20:45, Baoquan He wrote:
> [...]
>> I really appreciate the summary skill you have, to convert complex
>> problems in very clear and concise ideas. Thanks for that, very useful!
>> I agree with what was summarized above.
> 
> I want to say the similar words to Petr's reviewing comment when I went
> through the patches and traced each reviewing sub-thread to try to
> catch up. Petr has reivewed this series so carefully and given many
> comments I want to ack immediately.
> 
> I agree with most of the suggestions from Petr to this patch, except of
> one tiny concern, please see below inline comment.

Hi Baoquan, thanks! I'm glad you're also reviewing that =)


> [...]
> 
> I like the proposed skeleton of panic() and code style suggested by
> Petr very much. About panic_prefer_crash_dump which might need be added,
> I hope it has a default value true. This makes crash_dump execute at
> first by default just as before, unless people specify
> panic_prefer_crash_dump=0|n|off to disable it. Otherwise we need add
> panic_prefer_crash_dump=1 in kernel and in our distros to enable kdump,
> this is inconsistent with the old behaviour.

I'd like to understand better why the crash_kexec() must always be the
first thing in your use case. If we keep that behavior, we'll see all
sorts of workarounds - see the last patches of this series, Hyper-V and
PowerPC folks hardcoded "crash_kexec_post_notifiers" in order to force
execution of their relevant notifiers (like the vmbus disconnect,
specially in arm64 that has no custom machine_crash_shutdown, or the
fadump case in ppc). This led to more risk in kdump.

The thing is: with the notifiers' split, we tried to keep only the most
relevant/necessary stuff in this first list, things that ultimately
should improve kdump reliability or if not, at least not break it. My
feeling is that, with this series, we should change the idea/concept
that kdump must run first nevertheless, not matter what. We're here
trying to accommodate the antagonistic goals of hypervisors that need
some clean-up (even for kdump to work) VS. kdump users, that wish a
"pristine" system reboot ASAP after the crash.

Cheers,


Guilherme