Message-ID: <20140318194951.17fd61ea@thinkpad>
Date: Tue, 18 Mar 2014 19:49:51 +0100
From: Igor Mammedov <imammedo@...hat.com>
To: Prarit Bhargava <prarit@...hat.com>
Cc: linux-kernel@...r.kernel.org, tglx@...utronix.de, mingo@...hat.com,
hpa@...or.com, bp@...e.de, paul.gortmaker@...driver.com,
JBeulich@...e.com, drjones@...hat.com, toshi.kani@...com,
x86@...nel.org, riel@...hat.com, gong.chen@...ux.intel.com
Subject: Re: [PATCH 0/3] x86: fix hang when AP bringup is too slow
On Tue, 18 Mar 2014 08:21:19 -0400
Prarit Bhargava <prarit@...hat.com> wrote:
>
>
> On 03/13/2014 10:25 AM, Igor Mammedov wrote:
> > Hang is observed on virtual machines during CPU hotplug,
> > especially in big guests with many CPUs. (It happens more
> > often if host is over-committed).
> >
>
> Hey Igor, I like this better than the previous version. Thanks for taking into
> account the possible races in this code.
>
> A quick question on system behaviour. As you know I've been more concerned
> lately with error handling, etc., through the cpu hotplug code as we've seen
> several customer reports of silent failures or cascading failures in the cpu
> hotplug code when users have been attempting to perform physical hotplug.
>
> After your patches have been applied, in theory the following can happen:
>
> The master CPU is completing the AP cpu's bring up. The AP cpu is doing (sorry
> for the cut-and-paste),
>
> void cpu_init(void)
> {
> 	int cpu = smp_processor_id();
> 	struct task_struct *curr = current;
> 	struct tss_struct *t = &per_cpu(init_tss, cpu);
> 	struct thread_struct *thread = &curr->thread;
>
> 	/*
> 	 * wait till the master CPU completes its STARTUP sequence,
> 	 * and decides to wait till this AP boots
> 	 */
> 	while (!cpumask_test_cpu(cpu, cpu_callout_mask)) {
> 		cpu_relax();
> 		if (per_cpu(x86_cpu_to_apicid, cpu) == BAD_APICID)
> 			halt();
> 	}
>
> and is spinning on cpu_relax(). Suppose something goes wrong and the softlockup
> watchdog fires on the AP cpu:
>
> 1. Can it? :) i.e., will the softlockup fire at this point of the AP init? Okay,
> I'm being really lazy and not looking at the code ;)
It shouldn't: at this point the CPU is in a pristine state, having just come
from the boot trampoline, and interrupts are not configured yet.
>
> 2. Is there anything we can do in this code to notify the user of a problem?
> Even a pr_crit() here I think would help to indicate what went wrong; it might
> be useful for future debugging in this area to have some sort of output. I
> think a WARN() or BUG() is necessary here as there are several calls to cpu_init().
Do you mean something like this:
+		if (per_cpu(x86_cpu_to_apicid, cpu) == BAD_APICID) {
+			WARN_ON(1);
+			halt();
+		}
>
> 3. Change this comment:
>
> * wait till the master CPU completes its STARTUP sequence,
> * and decides to wait till this AP boots
>
> to
>
> /* wait for the master CPU to complete this cpu's STARTUP. */ ?
Well, that is not quite the same as the above; the comment should emphasize that
the AP waits for an ACK from the master CPU before continuing its initialization.
How about:
/* wait for ACK from master CPU before continuing with AP initialization */
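
To make the handshake concrete, here is a minimal user-space sketch of the AP
side in C. It is only a model, not kernel code: callout_mask_model,
apicid_model, ap_poll_once(), master_ack() and master_give_up() are
hypothetical names, and it assumes the master signals that it has given up by
marking the AP's APIC ID bad, which is what the check in the quoted loop
suggests. Each call to ap_poll_once() stands for one iteration of the wait
loop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

#define NR_CPUS_MODEL    8        /* hypothetical size of the model */
#define BAD_APICID_MODEL 0xFFFFu  /* stand-in for the kernel's BAD_APICID */

enum ap_state { AP_WAITING, AP_PROCEED, AP_HALTED };

/* One bit per CPU; a set bit is the master's ACK (models cpu_callout_mask). */
static atomic_ulong callout_mask_model;
/* Per-CPU APIC ID (models the per-CPU x86_cpu_to_apicid). */
static _Atomic unsigned short apicid_model[NR_CPUS_MODEL];

/* Master side: ACK the AP so that it may continue its initialization. */
static void master_ack(int cpu)
{
	atomic_fetch_or(&callout_mask_model, 1UL << cpu);
}

/* Master side (assumed): stop waiting for the AP and mark its APIC ID bad. */
static void master_give_up(int cpu)
{
	atomic_store(&apicid_model[cpu], BAD_APICID_MODEL);
}

/* AP side: one iteration of the wait loop, with a diagnostic before halting. */
static enum ap_state ap_poll_once(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);

	/* The ACK is checked first, as in the while() condition. */
	if (atomic_load(&callout_mask_model) & (1UL << cpu))
		return AP_PROCEED;

	if (atomic_load(&apicid_model[cpu]) == BAD_APICID_MODEL) {
		/* Stands in for a warning printed before halt(). */
		fprintf(stderr, "model: CPU %d halting, master gave up\n", cpu);
		return AP_HALTED;
	}

	return AP_WAITING;	/* the real loop would cpu_relax() and retry */
}
```

In this model an AP keeps returning AP_WAITING until the master calls either
master_ack() or master_give_up() for it. The point is that the halt path leaves
a trace instead of failing silently.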
>
> Apologies for the late review,
>
> P.
--
Regards,
Igor