Message-ID: <87k0um8m53.fsf@nanos.tec.linutronix.de>
Date: Sun, 15 Nov 2020 20:18:00 +0100
From: Thomas Gleixner <tglx@...utronix.de>
To: Lukas Wunner <lukas@...ner.de>
Cc: "Eric W. Biederman" <ebiederm@...ssion.com>,
Bjorn Helgaas <helgaas@...nel.org>, linux-pci@...r.kernel.org,
kernelfans@...il.com, andi@...stfloor.org, hpa@...or.com,
bhe@...hat.com, x86@...nel.org, okaya@...nel.org, mingo@...hat.com,
jay.vosburgh@...onical.com, dyoung@...hat.com,
gavin.guo@...onical.com,
"Guilherme G. Piccoli" <gpiccoli@...onical.com>, bp@...en8.de,
bhelgaas@...gle.com, shan.gavin@...ux.alibaba.com,
"Rafael J. Wysocki" <rjw@...ysocki.net>, kernel@...ccoli.net,
kexec@...ts.infradead.org, linux-kernel@...r.kernel.org,
ddstreet@...onical.com, vgoyal@...hat.com
Subject: Re: [PATCH 1/3] x86/quirks: Scan all busses for early PCI quirks
On Sun, Nov 15 2020 at 18:01, Lukas Wunner wrote:
> On Sun, Nov 15, 2020 at 04:11:43PM +0100, Thomas Gleixner wrote:
>> Unfortunately there is no way to tell the APIC "Mask vector X", and the
>> dump kernel neither knows which device it comes from nor has it
>> enumerated PCI completely, which would reset the device and shut up
>> the spew. Due to the interrupt storm it does not get that far.
>
> Can't we just set DisINTx, clear MSI Enable and clear MSI-X Enable
> on all active PCI devices in the crashing kernel before starting the
> crash kernel? So that the crash kernel starts with a clean slate?
>
> Guilherme's original patches from 2018 iterate over all 256 PCI buses.
> That might impact boot time negatively. The reason he has to do that
> is that the crash kernel doesn't know which devices exist and
> which have interrupts enabled. However, the crashing kernel has that
> information. It should either disable interrupts itself or pass the
> necessary information to the crash kernel as setup_data or whatever.
As I explained before: the problem with doing anything between crashing
and starting the crash kernel is that it reduces the chance of actually
starting the crash kernel. The kernel crashed for whatever reason, so
it is in a completely undefined state.

Ergo there is no 'just do something'. You really have to think hard
about what can be done safely, with the least probability of running into
another problem.
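
For reference, the quoted suggestion (set DisINTx, clear MSI Enable,
clear MSI-X Enable) comes down to three bit operations per device. The
following is only a user-space sketch of those bit operations, not
kernel code: the bit masks match the definitions in the Linux header
include/uapi/linux/pci_regs.h, while struct pci_irq_regs, quiesce_irqs()
and quiesce_irqs_example() are hypothetical names made up for
illustration. A real implementation would have to walk the device list
and go through the config space accessors, which is exactly the kind of
work in a crashed kernel that the concern above is about.

```c
#include <assert.h>
#include <stdint.h>

/* Bit masks as defined in include/uapi/linux/pci_regs.h. */
#define PCI_COMMAND_INTX_DISABLE 0x0400 /* Command register: DisINTx */
#define PCI_MSI_FLAGS_ENABLE     0x0001 /* MSI Message Control: MSI Enable */
#define PCI_MSIX_FLAGS_ENABLE    0x8000 /* MSI-X Message Control: MSI-X Enable */

/*
 * Hypothetical snapshot of the three 16-bit registers of one PCI
 * function. Real code would read and write these via config space.
 */
struct pci_irq_regs {
	uint16_t command;
	uint16_t msi_flags;
	uint16_t msix_flags;
};

/* Return a copy of the registers with all three interrupt sources off. */
static struct pci_irq_regs quiesce_irqs(struct pci_irq_regs r)
{
	r.command    |= PCI_COMMAND_INTX_DISABLE;          /* set DisINTx */
	r.msi_flags  &= (uint16_t)~PCI_MSI_FLAGS_ENABLE;   /* clear MSI Enable */
	r.msix_flags &= (uint16_t)~PCI_MSIX_FLAGS_ENABLE;  /* clear MSI-X Enable */
	return r;
}

/* Usage example with made-up register contents. */
void quiesce_irqs_example(void)
{
	/* I/O + memory + bus master; MSI on, 64-bit; MSI-X on, table size 3 */
	struct pci_irq_regs before = { 0x0007, 0x0081, 0x8003 };
	struct pci_irq_regs after = quiesce_irqs(before);

	assert(after.command == 0x0407);    /* only DisINTx was added */
	assert(after.msi_flags == 0x0080);  /* only the enable bit was dropped */
	assert(after.msix_flags == 0x0003); /* table size field is untouched */
}
```

The bit operations themselves are trivial. The sketch says nothing about
whether it is safe to reach every device from a kernel in an undefined
state.
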
Thanks,
tglx