Message-ID: <4B94CEFC.40405@redhat.com>
Date: Mon, 08 Mar 2010 12:18:36 +0200
From: Avi Kivity <avi@...hat.com>
To: Kerstin Jonsson <kerstin.jonsson@...csson.com>
CC: Thomas Renninger <trenn@...e.de>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"jbohac@...ell.com" <jbohac@...ell.com>,
Yinghai Lu <yinghai@...nel.org>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"mingo@...e.hu" <mingo@...e.hu>
Subject: Re: [PATCH] x86 apic: Ack all pending irqs when crashed/on kexec
On 02/26/2010 09:47 PM, Kerstin Jonsson wrote:
>>
>>
>>> From: Kerstin Jonsson<kerstin.jonsson@...csson.com>
>>>
>>> When the SMP kernel decides to crash_kexec() the local APICs may have
>>> pending interrupts in their vector tables.
>>> The setup routine for the local APIC has a deficient mechanism for
>>> clearing these interrupts: it only safely handles interrupts that have
>>> already been dispatched to the local core for servicing (the ISR
>>> register), and it doesn't consider lower-priority queued interrupts
>>> stored in the IRR register.
>>>
>>> If you have more than one pending interrupt within the same 32 bit word
>>> in the LAPIC vector table registers you may find yourself entering the
>>> IO APIC setup with pending interrupts left in the LAPIC. This is a
>>> situation for which the IO APIC setup is not prepared. Depending on
>>> which interrupt vectors are stuck in the APIC tables, your
>>> system may show various degrees of malfunctioning.
>>> That was the reason why check_timer() failed in our system: the
>>> timer interrupts were blocked by pending interrupts from the old kernel
>>> when routed through the IO APIC.
>>>
>>> Additional comment from Jiri Bohac:
>>> ==============
>>> If this should go into stable release,
>>> I'd add some kind of limit on the number of iterations, just to be safe from
>>> hard to debug lock-ups:
>>>
>>> +if (loops++ > MAX_LOOPS) {
>>> + printk("LAPIC pending clean-up");
>>> + break;
>>> +}
>>> while (queued);
>>>
>>> with MAX_LOOPS something like 1E9 this would leave plenty of time for the
>>> pending IRQs to be cleared and would still cause at most a second of delay
>>> if the loop were to lock up for whatever reason.
>>> ==============
>>>
>>> From trenn@...e.de:
>>> Merged Jiri's suggestion into the patch.
>>> Also made max_loops depend on cpu_khz. Not sure how long an apic_read
>>> takes; as it is on the CPU it may only be one cycle, so we now wait
>>> 1 sec in the WARN_ON(..) case?
>>>
>>>
>>>
>>>
>> An apic_read() can take a couple of microseconds when running
>> virtualized, so this loop may run for hours. On the other hand,
>> virtualized hardware is unlikely to misbehave.
>>
>> Still, I recommend using a clocksource (tsc would do) and not a loop count.
>>
>> --
>> error compiling committee.c: too many arguments to function
>>
>>
>>
>>
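A minimal user-space sketch in C of the time-bounded loop recommended
above, as opposed to a loop count. Assumptions: clock_gettime(CLOCK_MONOTONIC)
stands in for a kernel time source such as the TSC, and struct fake_lapic,
fake_ack() and drain_pending() are hypothetical names that only simulate
pending interrupts. This is not the actual patch.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the LAPIC state (not a kernel structure). */
struct fake_lapic {
    unsigned int pending;  /* interrupts still queued */
    bool stuck;            /* if true, acking never clears anything */
};

/* Monotonic time in nanoseconds; stands in for a TSC-based clock. */
uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Simulated ack: clears one pending interrupt unless the device is stuck. */
void fake_ack(struct fake_lapic *l)
{
    if (!l->stuck && l->pending)
        l->pending--;
}

/*
 * Ack until nothing is pending or the deadline passes.
 * The bound is elapsed time, so it holds whether one register
 * access costs a cycle or a few microseconds.
 * Returns true if drained, false on timeout.
 */
bool drain_pending(struct fake_lapic *l, uint64_t timeout_ns)
{
    uint64_t deadline;

    assert(l != NULL);
    deadline = now_ns() + timeout_ns;

    while (l->pending) {
        fake_ack(l);
        if (l->pending && now_ns() >= deadline)
            return false;
    }
    return true;
}
```

A caller would pass the delay it is willing to tolerate, for example
drain_pending(&lapic, 1000000000ull) for one second, and warn when the
function returns false.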
> Is it possible/conceivable to distinguish between real and virtual targets,
> i.e. to somehow detect that the target is a virtual machine and adapt accordingly?
> There may be other cases as well in which one would benefit from taking
> target type into consideration, e.g. when estimating the reasonable number of cycles
> for a specific operation.
It's possible (cpuid hypervisor bit), but I don't think it's a good
idea. Splitting up code paths doubles the chance of bugs. Much better
to find something that works both ways.
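
For reference only, a sketch of what reading the cpuid hypervisor bit
looks like from user space in C, even though branching on it is not
what is being recommended here. It reads CPUID leaf 1 and tests ECX
bit 31. Assumptions: a GCC or Clang toolchain providing <cpuid.h>, and
the helper names ecx_has_hypervisor_bit() and running_under_hypervisor(),
which are hypothetical.

```c
#include <stdbool.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* CPUID leaf 1, ECX bit 31: set by hypervisors for their guests. */
#define CPUID_ECX_HYPERVISOR (1u << 31)

/* Pure helper so the bit test can be checked on any architecture. */
bool ecx_has_hypervisor_bit(unsigned int ecx)
{
    return (ecx & CPUID_ECX_HYPERVISOR) != 0;
}

/*
 * Returns 1 if the hypervisor bit is set, 0 if it is clear,
 * and -1 if CPUID leaf 1 is unavailable (or this is not x86).
 */
int running_under_hypervisor(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return -1;
    return ecx_has_hypervisor_bit(ecx) ? 1 : 0;
#else
    return -1;
#endif
}
```

A clear bit does not prove bare metal, since a hypervisor may choose
not to advertise itself, which is one more reason to prefer a bound
that works both ways.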
--
error compiling committee.c: too many arguments to function