[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87h9b3s0vr.fsf@x220.int.ebiederm.org>
Date: Tue, 02 Aug 2016 09:26:16 -0500
From: ebiederm@...ssion.com (Eric W. Biederman)
To: "Wei\, Jiangang" <weijg.fnst@...fujitsu.com>
Cc: "kexec\@lists.infradead.org" <kexec@...ts.infradead.org>,
"linux-kernel\@vger.kernel.org" <linux-kernel@...r.kernel.org>,
"Cao\, Jin" <caoj.fnst@...fujitsu.com>,
"tglx\@linutronix.de" <tglx@...utronix.de>,
"bhe\@redhat.com" <bhe@...hat.com>,
"xpang\@redhat.com" <xpang@...hat.com>,
"kernel\@kyup.com" <kernel@...p.com>,
"x86\@kernel.org" <x86@...nel.org>,
"hpa\@zytor.com" <hpa@...or.com>,
"mingo\@redhat.com" <mingo@...hat.com>,
"joro\@8bytes.org" <joro@...tes.org>
Subject: Re: [PATCH v2 3/3] x86/apic: Improved the setting of interrupt mode for bsp
"Wei, Jiangang" <weijg.fnst@...fujitsu.com> writes:
> Hi Eric,
>
> Thanks for your response.
> But I have some different ideas...
Apologies for not replying to this earlier your reply got lost in my
spam folder and I overlooked it.
> On Mon, 2016-07-25 at 22:53 -0500, Eric W. Biederman wrote:
>> Wei Jiangang <weijg.fnst@...fujitsu.com> writes:
>>
>> > If we specify the 'notsc' parameter for the dump-capture kernel,
>> > and then trigger a crash(panic) by using "ALT-SysRq-c" or
>> > "echo c > /proc/sysrq-trigger", the dump-capture kernel will
>> > hang in calibrate_delay_converge() and wait for jiffies changes.
>> > serial log as follows:
>> >
>> > tsc: Fast TSC calibration using PIT
>> > tsc: Detected 2099.947 MHz processor
>> > Calibrating delay loop...
>> >
>> > The reason for jiffies not changes is there's no timer interrupt
>> > passed to dump-capture kernel.
>> >
>> > In fact, once kernel panic occurs, the local APIC is disabled
>> > by lapic_shutdown() in reboot path.
>> > generly speaking, local APIC state can be initialized by BIOS
>> > after Power-Up or Reset, which doesn't apply to kdump case.
>> > so the kernel has to be responsible for initialize the interrupt
>> > mode properly according the latest status of APIC in bootup path.
>> >
>> > An MP operating system is booted under either PIC mode or
>> > virtual wire mode. Later, the operating system switches to
>> > symmetric I/O mode as it enters multiprocessor mode.
>> > Two kinds of virtual wire mode are defined in Intel MP spec:
>> > virtual wire mode via local APIC or via I/O APIC.
>> >
>> > Now we determine the mode of APIC only through a SMP BIOS(MP table).
>> > That's not enough. It's better to do further check if APIC works
>> > with effective interrupt mode, and then, do some proper setting.
>>
>> Reading through the code let me pause a moment and say:
>> "Yowzers the interrupt initialization code has gotten hard to follow. It
>> is now full of indirection with ill defined semantics." pre_vector_init
>> indeed.
>>
>> I will argue this is the wrong fix.
>>
>> We really should not have to worry about getting the system functional
>> in virtual wire mode on a modern system. And looking at the code
>> someone has done half the work and made it conditional under
>> acpi_gbl_reduced_hardware.
>>
>> Now reduced hardware implies a bit more than we ware talking about but
>> if there is ACPI apic information we should not need to worry about
>> external interrupts and can just enable the apics.
>>
>> In fact I think having MPtable information is enough for that.
>
> In my opinion, The MP table is not to be trusted if system starts
> without BIOS phrase.
>
> The chapter 3.8 System Initial State (MP v1.4 spec) said below:
> "1. All local APICs are disabled, except for the local APIC of the BSP
> if the system starts in Virtual Wire Mode."
>
> And
> .....
> "The BIOS must disable interrupts to all processors and set the APICs to
> the system initial state before giving control to the operating system.
> The operating system is responsible for initializing all of the APICs"
>
> The dump-capture kernel won't through the BIOS phrase,
> so there is no guarantee of the interrupt mode of APIC and the status of
> local APIC.
>
> Although the MP table is read-only,
> the dump-capture kernel uses the MP table which be generated before the
> first kernel boots up is unreasonable.
> At least, only checking MP table is not enough.
> That's the starting point of my patch.
>
> You said “this is the wrong fix”,
> Is there any wrong with the codes or my solution to check the status of
> APIC in bootup path?
Today in a dump capture kernel it uses the MP table or the ACPI MP
information to set up the apics.
How I read your patch is you do something clever to get in virtual wire
mode. AKA a historical PC/AT compatible mode. I think we should just
skip that mode entirely.
>> So I think what needs to happens is for the apic initialization to get
>> an overhaul that makes apic initialization the happy path and the other
>> irq controllers the odd backwards compatibility path. And when we
>> are done we never run in anything except full apic mode unless the
>> hardware doesn't support it.
> The spec requests "An MP operating system is booted under either one of
> the two PC/AT-compatible modes. Later the operating system switches to
> Symmetric I/O Mode as it enters multiprocessor mode."
> And it seems that the latest kernel follows the rule.
> Does the apic initialization really need to be changed?
For simplicity when MP support was first added, a few parts of early
bootup were left handled in PC/AT compatible mode. Mostly it is just
silly.
Unless there is a good reason I think we should strive to remove those
few moments when the system is using interrupts in PC/AT mode and always
run interrupts in the final mode we intend to run interrupts in.
In fact the code under acpi_gbl_reduced_hardware does that for some
special cases already today.
At the end of the day that should end up with a simpler more reliable
kernel. Especially for the dump capture kernel.
AKA:
I am proposing make the code say something like:
if (pc_at_compatible_pic_mode) {
do_legacy_pic_mode();
} else {
do_mp_mode();
}
If my memory serves virtual wire mode does not need to be enabled at all
in mp mode, so if we are coming from mp mode (with no shutdown of the
apics in the first kernel). Then there will be no information anywhere
about which apics need to be programmed into apic mode.
As such to simplify the kdump process it is the wrong process to go back
into virtual wire PC/AT compatible pic mode. Because once we remove the
code from the crashing kernel we won't have enough information to make
it work. Unfortunately in some cases it is a one way transition.
Eric
Powered by blists - more mailing lists