Message-ID: <62686ee8-4d67-753d-3442-5456a2ba7076@cn.fujitsu.com>
Date: Wed, 14 Feb 2018 11:22:14 +0800
From: Dou Liyang <douly.fnst@...fujitsu.com>
To: "Eric W. Biederman" <ebiederm@...ssion.com>
CC: Baoquan He <bhe@...hat.com>, <linux-kernel@...r.kernel.org>,
<mingo@...nel.org>, <tglx@...utronix.de>, <x86@...nel.org>,
<joro@...tes.org>, <uobergfe@...hat.com>, <prarit@...hat.com>
Subject: Re: [PATCH v3 2/5] x86/apic: Fix restoring boot irq mode in reboot
and kexec/kdump
Hi Eric,
At 02/14/2018 01:40 AM, Eric W. Biederman wrote:
> Dou Liyang <douly.fnst@...fujitsu.com> writes:
>
>> Hi Baoquan,
>>
>> At 02/12/2018 11:08 AM, Eric W. Biederman wrote:
>>> Baoquan He <bhe@...hat.com> writes:
>>>
>>>> This is a regression fix.
>>>>
>>>> Before, to fix erratum AVR31, commit 522e66464467 ("x86/apic: Disable
>>>> I/O APIC before shutdown of the local APIC") moved the lapic_shutdown()
>>>> call after disable_IO_APIC().  This introduced a regression.  The
>>>> root cause is that disable_IO_APIC() not only clears the IO-APIC, it
>>>> also restores boot irq mode by setting up the LAPIC/APIC/IMCR, so
>>>> calling lapic_shutdown() after disable_IO_APIC() disables the LAPIC
>>>> and ruins the virtual wire mode setting which the code has been
>>>> trying to establish all along.
>>>>
>>>> The consequence is that in a KVM guest the kernel always prints the
>>>> warning below during kexec/kdump kernel boot.  That happens in
>>>> setup_local_APIC() since the 'do { xxx } while (queued && max_loops > 0)'
>>>> loop does not function
>>
>> There is another thing I am not sure about.
>>
>> AFAIK, according to the order of SMP machine shutdown, the other CPUs
>> are stopped first, then the last CPU disables its local APIC.
>>
>> --machine_shutdown
>> |----......
>> |----stop_other_cpus()
>> |----lapic_shutdown()
>>
>> So, the pending interrupts exist only on the BSP and can only be ACKed
>> by the BSP.  Is that right?  (I validated this by printing the value of
>> APIC_IRR/ISR on all CPUs and found that only the BSP had non-zero
>> values.)
>
> We don't know.  In the kexec on panic case we try to shut down the
> other cpus, but we have a timeout because we might fail.
>
> Further, you have to be careful with the concept of the boot cpu.  In
> the normal kexec case we shut down on the boot cpu and leave it
> running.  In the kexec on panic case we shut down on an arbitrary cpu.
>
>> If that is right, we do not need to check for pending interrupts on
>> each cpu.
>
> It is also cheap if there are no pending interrupts as there is nothing
> to do.
>
Yes, it's cheap.
But the local APICs in the APs were disabled at that point, and I am not
sure whether writing the end-of-interrupt (EOI) register would still
cause the local APIC to clear the ISR successfully.
If the ISR is not cleared, doing that check is useless.
I also can't produce any case in which the local APICs in the APs have
pending interrupts.  Do you have any suggestions?
>> BTW, the pending interrupt check code is mixed in with the local
>> APIC setup code, which looks messy.  How about extracting the code
>> related to the crash interrupt check, putting it into a new function,
>> and invoking it only when the CPU is the BSP?
>
> Moving it into its own function makes sense.  Let's not taint the
> concept with ``crash''. We don't know that the only way this will
> ever happen is from kexec on panic. We only know it was easy to
> trigger the condition from kexec on panic.
>
Yes, I see.
> There are a lot of cases I can think of in which interrupts might fire
> while interrupts are disabled because the kernel is booting.  A normal
> kexec is also possible.
>
> Throw in MSI interrupts and transitioning from one state to another in
> non-legacy apic mode and we might be more likely to get some irqs in
> a pending state.
>
> Your patch makes me nervous as it is not just code motion but a much
> more substantial change.
>
>
> As much as I agree that we need to fix the regression in the apic
> shutdown code that is causing problems, what we really need to do
> is to completely remove the apic shutdown code from the kexec on panic
> code path.  Over the long term that will provide us with a much more

Wow, indeed, and it is related to many hypervisors on x86.  For me, it
still needs more investigation and testing.
Thanks,
dou