Message-ID: <87ob8aix0u.fsf@xmission.com>
Date: Mon, 02 Sep 2013 17:12:01 -0700
From: ebiederm@...ssion.com (Eric W. Biederman)
To: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@...achi.com>
Cc: Don Zickus <dzickus@...hat.com>, Ingo Molnar <mingo@...nel.org>,
linux-kernel@...r.kernel.org, Andi Kleen <ak@...ux.intel.com>,
"H. Peter Anvin" <hpa@...or.com>, Gleb Natapov <gleb@...hat.com>,
Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
Joerg Roedel <joro@...tes.org>, x86@...nel.org,
stable@...r.kernel.org, Marcelo Tosatti <mtosatti@...hat.com>,
Hidehiro Kawai <hidehiro.kawai.ez@...achi.com>,
Sebastian Andrzej Siewior <sebastian@...akpoint.cc>,
Ingo Molnar <mingo@...hat.com>,
Zhang Yanfei <zhangyanfei@...fujitsu.com>,
yrl.pp-manager.tt@...achi.com,
Masami Hiramatsu <masami.hiramatsu.pt@...achi.com>,
Thomas Gleixner <tglx@...utronix.de>,
Seiji Aguchi <seiji.aguchi@....com>,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH] [BUGFIX] crash/ioapic: Prevent crash_kexec() from deadlocking of ioapic_lock
Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@...achi.com> writes:
> Hi Eric and Don,
>
> Sorry for the late reply.
>
> (2013/08/31 9:58), Eric W. Biederman wrote:
>> Don Zickus <dzickus@...hat.com> writes:
>>
>>> On Tue, Aug 27, 2013 at 12:41:51PM +0900, Yoshihiro YUNOMAE wrote:
>>>> Hi Don,
>>>>
>>>> Sorry for the late reply.
>>>>
>>>> (2013/08/22 22:11), Don Zickus wrote:
>>>>> On Thu, Aug 22, 2013 at 05:38:07PM +0900, Yoshihiro YUNOMAE wrote:
>>>>>>> So, I agree with Eric, let's remove the disable_IO_APIC() stuff and keep
>>>>>>> the code simpler.
>>>>>>
>>>>>> Thank you for commenting about my patch.
>>>>>> I didn't know you already have submitted the patches for this deadlock
>>>>>> problem.
>>>>>>
>>>>>> I can't answer definitively right now that no problems are induced by
>>>>>> removing disable_IO_APIC(). However, my patch should work well (and
>>>>>> has already been merged to -tip tree). So how about taking my patch at
>>>>>> first, and then discussing the removal of disable_IO_APIC()?
>>>>>
>>>>> It doesn't matter to me. My original patch last year was similar to yours
>>>>> until it was suggested that we were working around a problem which was we
>>>>> shouldn't touch the IO_APIC code on panic. Then I wrote the removal of
>>>>> disable_IO_APIC patch and did lots of testing on it. I don't think I have
>>>>> seen any issues with it (just the removal of disabling the lapic stuff).
>>>>
>>>> Yes, you really did a lot of testing about this problem according to
>>>> your patch(https://lkml.org/lkml/2012/1/31/391). Although you
>>>> said jiffies calibration code does not need the PIT in
>>>> http://lists.infradead.org/pipermail/kexec/2012-February/006017.html,
>>>> I don't understand yet why we can remove disable_IO_APIC.
>>>> Would you please explain about the calibration codes?
>>>
>>> I forgot a lot of this, Eric B. might remember more (as he was the one that
>>> pointed this out initially). I believe initially the io_apic had to be in
>>> a pre-configured state in order to do some early calibration of the timing
>>> code. Later on, it was my understanding, that the calibration of various
>>> time keeping stuff did not need the io_apic in a correct state. The code
>>> might have switched to tsc instead of PIT, I forget.
>>
>> Yes. Alan Cox's initial SMP port had a few cases where it still
>> expected the system to be in PIT mode during boot and it took us a
>> decade or so before those assumptions were finally expunged.
>
> Would you please tell me the commit ID, or a hint such as the files,
> functions, or time frame?
The short version is that the last time we tilted at this windmill, the only
problem we could find was NMIs caused by the NMI watchdog.
So as a bug work-around, all we need to retain is disabling the NMI
watchdog in crash_kexec().
>>> Then again looking at the output of the latest dmesg, it seems the IO APIC
>>> is initialized way before the tsc is calibrated. So I am not sure what
>>> needed to get done or what interrupts are needed before the IO APIC gets
>>> initialized.
>>
>> The practical issue is that jiffies was calibrated off of the PIT timer
>> if I recall. But that is all old news.
>
> Is the jiffies calibration code calibrate_delay()?
> It seems that the jiffies calibration has not used the PIT since 2005,
> according to 8a9e1b0.
Exactly. That was the original reason why we put in the code to
disable the IOAPIC and the local APIC. There might have been other
reasons, but that was the primary one.
>>>> By the way, can we remove disable_IO_APIC even if an old dump capture
>>>> kernel is used?
>>>
>>> Good question. I did a bunch of testing with RHEL-6 too, which is 2.6.32
>>> based. But I think we added some IRR fixes (commit 1e75b31d638), which
>>> may or may not have helped in this case. So I don't know when a kernel
>>> started working correctly during init (with the right changes). I believe
>>> 2.6.32 had everything.
>>
>> A sufficiently old and buggy dump capture kernel will fail because of bugs
>> in its startup path, but I don't think anyone cares.
>
> OK, if the jiffies calibration problem has been fixed in the old days,
> we don't need to care for the old kernel.
Exactly. There may have been one or two other silly assumptions, and to
the best of our knowledge all of those have been purged except the
assumption that the NMI watchdog won't fire between kernels or while
the new kernel is booting.
>> The kernel startup path has been fixed for years, and disable_IO_APIC in
>> crash_kexec has always been a bug work-around for deficiencies in the
>> kernel's start up path (not part of the guaranteed interface).
>> Furthermore every real system configuration I have encountered used the
>> same kernel version for the crashdump kernel and the production kernel.
>> So we should be good.
>
> We will also use the kdump (crashdump) kernel as the production
> kernel. Should I only care about the current kernel?
For this particular issue yes.
In general it is important for there to be a stable interface between
the two kernels just so you are not required to use the same kernel
version, and so there is the possibility of using something besides a
linux kernel.
At the same time it has always been the target kernel's responsibility
to sort out the hardware devices unless it can't possibly do it. For
the longest time APICs were very, very hard to reset in the target
kernel, but now they are not, so it makes sense, time permitting,
to remove the now-unnecessary code in the crashing kernel. Ultimately,
the less code we have, the fewer possible ways we can fail
in a known-broken kernel.
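
To make that last point concrete, here is a rough user-space sketch, not
kernel code. It assumes the deadlock from the subject line works the way I
understand it: the crash path wants ioapic_lock while the code that holds
it has been interrupted and will never run again. Every name below
(fake_ioapic_lock, fake_trylock, the two crash_path_* functions) is
hypothetical and exists only for illustration. A trylock stands in for the
real spinlock so the sketch reports failure where the kernel would spin
forever.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for ioapic_lock; not the kernel's spinlock type. */
static atomic_flag fake_ioapic_lock = ATOMIC_FLAG_INIT;

/* Try once to take the lock. A real spinlock would spin forever instead. */
static bool fake_trylock(atomic_flag *lock)
{
    return !atomic_flag_test_and_set_explicit(lock, memory_order_acquire);
}

static void fake_unlock(atomic_flag *lock)
{
    atomic_flag_clear_explicit(lock, memory_order_release);
}

/*
 * Models a crash path that still tears down the IO-APIC and so needs
 * the lock. Returns false where the real code would deadlock.
 */
static bool crash_path_with_ioapic_teardown(void)
{
    if (!fake_trylock(&fake_ioapic_lock))
        return false;   /* the holder was interrupted and never resumes */
    /* ... IO-APIC teardown would go here ... */
    fake_unlock(&fake_ioapic_lock);
    return true;
}

/* Models a crash path that leaves the IO-APIC to the target kernel. */
static bool crash_path_without_ioapic_teardown(void)
{
    return true;        /* no lock is taken, so there is nothing to block on */
}
```

If the lock is taken first (as the interrupted code would have done),
crash_path_with_ioapic_teardown() reports failure while
crash_path_without_ioapic_teardown() still succeeds. That is the whole
argument for dropping the code rather than working around the lock.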
Eric