Message-ID: <CAJvTdKkuaAXtOfgOvYUExgBmO1PgZ0XhsXiEevq=jthY200E8w@mail.gmail.com>
Date: Sat, 16 May 2015 05:07:59 -0400
From: Len Brown <lenb@...nel.org>
To: Ingo Molnar <mingo@...nel.org>
Cc: Jan H. Schönherr <jschoenh@...zon.de>,
Thomas Gleixner <tglx@...utronix.de>, X86 ML <x86@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Anthony Liguori <aliguori@...zon.com>,
Ingo Molnar <mingo@...hat.com>,
"H. Peter Anvin" <hpa@...or.com>, Tim Deegan <tim@....org>,
Gang Wei <gang.wei@...el.com>,
Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH] x86: skip delays during SMP initialization similar to Xen
On Thu, May 14, 2015 at 1:57 PM, Ingo Molnar <mingo@...nel.org> wrote:
>
> * "Jan H. Schönherr" <jschoenh@...zon.de> wrote:
>
>> Ingo, do you want an updated version of the original patch, which
>> takes care not get stuck, when the INIT deassertion is skipped, or
>> do you prefer to address delays "one by one" as you wrote elsewhere?
>
> So I'm not against improving this code at all, but instead of this
> hard to follow mixing of old and new code, I'd find the following
> approach cleaner and more acceptable: create a 'modern' and a 'legacy'
> SMP-bootup variant function, and do a clean separation based on the
> CPU model cutoff condition used by Len's patches:
>
> /* if modern processor, use no delay */
> if (((boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) && (boot_cpu_data.x86 == 6)) ||
> ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD) && (boot_cpu_data.x86 >= 0xF)))
> init_udelay = 0;
>
> Then in the modern variant we can become even more aggressive and
> remove these kinds of delays as well:
Not sure it is worth two versions, since this is not where the big
time is spent.
See below.
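
For reference, the quoted cutoff can be read as a small predicate. The following is only a user-space sketch: `enum vendor`, `struct cpu_id`, `cpu_is_modern()` and `init_udelay_for()` are hypothetical stand-ins, not kernel names, and the 10,000 usec legacy value is the INIT delay mentioned later in this message.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's X86_VENDOR_* constants. */
enum vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_OTHER };

/* Hypothetical, reduced stand-in for the fields of boot_cpu_data used here. */
struct cpu_id {
	enum vendor vendor;
	unsigned int family;
};

/* True when the quoted condition would treat the CPU as "modern":
 * Intel family 6 exactly, or AMD family 0xF and later. */
bool cpu_is_modern(const struct cpu_id *c)
{
	return (c->vendor == VENDOR_INTEL && c->family == 6) ||
	       (c->vendor == VENDOR_AMD && c->family >= 0xF);
}

/* INIT delay in usec: 0 on modern parts, otherwise the legacy 10,000. */
unsigned int init_udelay_for(const struct cpu_id *c)
{
	return cpu_is_modern(c) ? 0 : 10000;
}
```

Note that the Intel test is an equality, so as written an Intel family 0xF part would still get the legacy delay.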
>
> udelay(300);
FWIW, MPS 1.4 suggests this should be 200, not 300.
> udelay(200);
>
> plus I'd suggest making these poll loops in smpboot.c loops narrower:
>
> udelay(100);
FWIW, on my desktop, this one executed 17 times (1700 usec).
This is the time for the remote CPU to wake and get to cpu_init().
Why is it a benefit to have any udelay() before invoking schedule()?
> udelay(100);
This one didn't execute at all. Indeed, I don't understand why it exists,
per question above.
	/*
	 * Wait till AP completes initial initialization
	 */
	while (!cpumask_test_cpu(cpu, cpu_callin_mask)) {
		/*
		 * Allow other tasks to run while we wait for the
		 * AP to come online. This also gives a chance
		 * for the MTRR work(triggered by the AP coming online)
		 * to be completed in the stop machine context.
		 */
		udelay(100);
		schedule();
	}
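
The two observations above (17 iterations in the first loop, none in the second) fit a simple model of a fixed-step poll loop. This is a rough sketch, not kernel code: `poll_iterations()` and `poll_wait_us()` are hypothetical helpers, and the model ignores the cost of the check itself and any time spent in schedule().

```c
/* Number of udelay(step_us) iterations executed before a condition that
 * becomes true ready_at_us after loop entry is observed.
 * step_us must be non-zero. */
unsigned int poll_iterations(unsigned int ready_at_us, unsigned int step_us)
{
	return (ready_at_us + step_us - 1) / step_us;
}

/* Total time spent delaying, in usec, under the same model. */
unsigned int poll_wait_us(unsigned int ready_at_us, unsigned int step_us)
{
	return poll_iterations(ready_at_us, step_us) * step_us;
}
```

Under this model a condition that is already true on entry costs nothing, and a narrower step only trims the overshoot, which is always less than one step. For an assumed wake time of 1650 usec, a 100 usec step waits 1700 usec and a 10 usec step waits 1650 usec.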
So, the latest TIP has the INIT udelay(10,000) removed,
but cpu_up() still takes nearly 19,000 usec on a HSW desktop.
A quick scan of the ftrace shows some high runners:
18949.45 us cpu_up()
    2450.580 us notifier_call_chain
        102.751 us thermal_throttle_cpu_callback
        289.313 us dpm_sysfs_add
        1019.594 us msr_class_cpu_callback
        ...
    8455.462 us native_cpu_up()
        500.000 us = udelay(300) + udelay(200) Startup IPI
        500.000 us = udelay(300) + udelay(200) Startup IPI
        1700.000 us = 17 x udelay(100) waiting for AP in initialized_map
        2004.172 us check_tsc_warp()
    7977.799 us cpu_notify()
        1588.108 us cpuset_cpu_active
        3043.955 us cacheinfo_cpu_callback
        1146.234 us mce_cpu_callback
        541.105 us cpufreq_cpu_callback
        213.685 us coretemp_cpu_callback
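
As a cross-check on the numbers above, the listed entries can be summed against their parents. This is a small sketch assuming the three large entries are direct children of cpu_up(); `struct span`, `sum_ns()` and `unaccounted_ns()` are hypothetical helpers, the "Startup IPI #1/#2" labels are added here for readability, and durations are stored in nanoseconds so the arithmetic stays exact.

```c
#include <stddef.h>

/* One row of the trace summary, duration in nanoseconds. */
struct span {
	const char *name;
	long long ns;
};

/* Assumed direct children of cpu_up(), values copied from the trace. */
const struct span cpu_up_children[] = {
	{ "notifier_call_chain", 2450580 },
	{ "native_cpu_up",       8455462 },
	{ "cpu_notify",          7977799 },
};

/* Items listed under native_cpu_up(). */
const struct span native_cpu_up_children[] = {
	{ "Startup IPI #1",  500000 },
	{ "Startup IPI #2",  500000 },
	{ "wait for AP",    1700000 },
	{ "check_tsc_warp", 2004172 },
};

/* Sum of the durations of n rows. */
long long sum_ns(const struct span *s, size_t n)
{
	long long total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total += s[i].ns;
	return total;
}

/* Time in the parent that no listed child accounts for. */
long long unaccounted_ns(long long parent_ns, const struct span *s, size_t n)
{
	return parent_ns - sum_ns(s, n);
}
```

With these figures the three top-level entries sum to 18883.841 us, leaving about 66 us of cpu_up() unattributed, and the four items under native_cpu_up() sum to 4704.172 us of its 8455.462 us.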
cacheinfo_cpu_callback() time appears to be spent creating a bunch
of sysfs nodes, which is apparently an expensive operation.
check_tsc_warp() is hard-coded to take 2ms.
I don't know if 2ms is a magic number or if shorter has same value.
It seems a bit sad to do this serially for every CPU at boot,
when we could do all the CPUs in parallel after they are on-line.
Perhaps this should be invoked only for boot-time and hot-add time.
It shouldn't be needed at all for soft online and resume.
Startup IPI delays.
MPS 1.4 actually says 200+200, not 300+200, as Linux reads.
I don't know where the 300 came from, maybe it was a typo?
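
To make the difference concrete, here is a trivial sketch of the per-CPU arithmetic. `sipi_delay_us()` is a hypothetical helper, and the 200+200 figure is taken from the statement above rather than checked against the specification text.

```c
/* Total delay in usec for `count` Startup IPIs, each followed by
 * two udelay() calls of first_us and second_us. */
unsigned int sipi_delay_us(unsigned int count,
			   unsigned int first_us, unsigned int second_us)
{
	return count * (first_us + second_us);
}
```

With the two Startup IPIs seen in the trace, 300+200 costs 1000 usec per CPU and 200+200 would cost 800 usec.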
msr_class_cpu_callback -- making device nodes is not fast.
I don't know if anything can be done for the 1700us wait
for the remote processor to mark itself initialized.
That is the 1st thing it does when it enters cpu_init().
On the Xeon, I had seen x86_init_rdrand() take 781 usec --
dunno why that isn't seen on this box. I'll look at that box again next week.
cheers,
Len Brown, Intel Open Source Technology Center