Message-ID: <CAJvTdKkuaAXtOfgOvYUExgBmO1PgZ0XhsXiEevq=jthY200E8w@mail.gmail.com>
Date:	Sat, 16 May 2015 05:07:59 -0400
From:	Len Brown <lenb@...nel.org>
To:	Ingo Molnar <mingo@...nel.org>
Cc:	Jan H. Schönherr <jschoenh@...zon.de>,
	Thomas Gleixner <tglx@...utronix.de>, X86 ML <x86@...nel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Anthony Liguori <aliguori@...zon.com>,
	Ingo Molnar <mingo@...hat.com>,
	"H. Peter Anvin" <hpa@...or.com>, Tim Deegan <tim@....org>,
	Gang Wei <gang.wei@...el.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH] x86: skip delays during SMP initialization similar to Xen

On Thu, May 14, 2015 at 1:57 PM, Ingo Molnar <mingo@...nel.org> wrote:
>
> * "Jan H. Schönherr" <jschoenh@...zon.de> wrote:
>
>> Ingo, do you want an updated version of the original patch, which
>> takes care not get stuck, when the INIT deassertion is skipped, or
>> do you prefer to address delays "one by one" as you wrote elsewhere?
>
> So I'm not against improving this code at all, but instead of this
> hard to follow mixing of old and new code, I'd find the following
> approach cleaner and more acceptable: create a 'modern' and a 'legacy'
> SMP-bootup variant function, and do a clean separation based on the
> CPU model cutoff condition used by Len's patches:
>
>         /* if modern processor, use no delay */
>         if (((boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) && (boot_cpu_data.x86 == 6)) ||
>             ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD) && (boot_cpu_data.x86 >= 0xF)))
>                 init_udelay = 0;
>
> Then in the modern variant we can become even more aggressive and
> remove these kinds of delays as well:

Not sure it is worth two versions, since this is not where the big
time is spent.
See below.

>
>                 udelay(300);

FWIW, MPS 1.4 suggests this should be 200, not 300.

>                 udelay(200);
>
> plus I'd suggest making these poll loops in smpboot.c loops narrower:
>
>                         udelay(100);

FWIW, on my desktop, this one executed 17 times (1700 usec).
This is the time for the remote CPU to wake and get to cpu_init().
Why is it a benefit to have any udelay() before invoking schedule()?

>                         udelay(100);

This one didn't execute at all.  Indeed, I don't understand why it exists,
per question above.

                /*
                 * Wait till AP completes initial initialization
                 */
                while (!cpumask_test_cpu(cpu, cpu_callin_mask)) {
                        /*
                         * Allow other tasks to run while we wait for the
                         * AP to come online. This also gives a chance
                         * for the MTRR work(triggered by the AP coming online)
                         * to be completed in the stop machine context.
                         */
                        udelay(100);
                        schedule();
                }
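
To make the question about the udelay() concrete, here is a minimal
userspace sketch of the same wait pattern with the fixed delay dropped, so
the waiter only yields between polls. It is an analogy, not kernel code:
wait_for_flag(), its max_polls bound, and the use of sched_yield() in place
of schedule() are all assumptions made for illustration.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Userspace analogy of the wait loop above (hypothetical helper, not the
 * kernel's code): poll a flag and yield the CPU between polls, with no
 * fixed delay before yielding.
 *
 * Returns the number of yields performed before the flag was seen set,
 * or -1 if the flag was still clear after max_polls yields.
 */
long wait_for_flag(atomic_bool *flag, long max_polls)
{
	long polls = 0;

	while (!atomic_load_explicit(flag, memory_order_acquire)) {
		if (polls == max_polls)
			return -1;	/* give up instead of waiting forever */
		sched_yield();		/* stands in for schedule() */
		polls++;
	}
	return polls;
}
```

If the flag is already set on entry, the function returns 0 without
yielding, which corresponds to the second loop above that never executed
its udelay(100).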

So, the latest TIP has the INIT udelay(10,000) removed,
but cpu_up() still takes nearly 19,000 usec on a HSW desktop.

A quick scan of the ftrace shows some high runners:

18949.45 us cpu_up()
        2450.580 us notifier_call_chain
                 102.751 us thermal_throttle_cpu_callback
                 289.313 us dpm_sysfs_add
                1019.594 us msr_class_cpu_callback
                ...
        8455.462 us native_cpu_up()
                 500.000 us = udelay(300) + udelay(200) Startup IPI
                 500.000 us = udelay(300) + udelay(200) Startup IPI
                1700.000 us = 17 x udelay(100) waiting for AP in initialized_map
                2004.172 us  check_tsc_warp()

        7977.799 us cpu_notify()
                1588.108 us cpuset_cpu_active
                3043.955 us  cacheinfo_cpu_callback
                1146.234 us  mce_cpu_callback
                 541.105 us  cpufreq_cpu_callback
                 213.685 us  coretemp_cpu_callback
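
As a rough cross-check of the breakdown above, the sketch below adds up the
listed durations and reports how much of each parent is not covered by the
listed children. The numbers are copied from the trace; the array and
function names are made up for this example, and the elided ("...")
notifier entries are deliberately left out.

```c
#include <stddef.h>

/* Durations in microseconds, copied from the ftrace summary above. */
const double cpu_up_total_us = 18949.45;
const double cpu_up_phases_us[] = {
	2450.580,	/* notifier_call_chain */
	8455.462,	/* native_cpu_up() */
	7977.799,	/* cpu_notify() */
};

const double native_cpu_up_total_us = 8455.462;
const double native_cpu_up_items_us[] = {
	500.0,		/* first Startup IPI: udelay(300) + udelay(200) */
	500.0,		/* second Startup IPI: udelay(300) + udelay(200) */
	1700.0,		/* 17 x udelay(100) waiting for the AP */
	2004.172,	/* check_tsc_warp() */
};

/* Sum n durations. */
double sum_us(const double *v, size_t n)
{
	double total = 0.0;

	for (size_t i = 0; i < n; i++)
		total += v[i];
	return total;
}

/* Time in the parent that the listed children do not account for. */
double unaccounted_us(double parent_us, const double *v, size_t n)
{
	return parent_us - sum_us(v, n);
}
```

With these inputs, the three top-level phases cover all but about 66 usec of
cpu_up(), while the four listed items cover about 4704 usec of
native_cpu_up(), leaving roughly 3751 usec of it unlisted here.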


cacheinfo_cpu_callback() time appears to be spent creating a bunch
of sysfs nodes, which is apparently an expensive operation.

check_tsc_warp() is hard-coded to take 2ms.
I don't know if 2ms is a magic number or if shorter has same value.
It seems a bit sad to do this serially for every CPU at boot,
when we could do all the CPUs in parallel after they are on-line.
Perhaps this should be invoked only for boot-time and hot-add time.
It shouldn't be needed at all for soft online and resume.

Startup IPI delays.
MPS 1.4 actually says 200+200, not the 300+200 that Linux uses.
I don't know where the 300 came from, maybe it was a typo?
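
To put a number on that, the sketch below compares the two delay pairs for
the two Startup IPIs seen in the trace. The function names are hypothetical,
and the 200+200 figure is the MPS 1.4 reading given above, not something
re-verified here.

```c
/* Two Startup IPIs are sent per CPU bring-up, per the trace above. */
enum { STARTUP_IPIS = 2 };

/* Total fixed delay, in usec, for a given pair of per-IPI udelay() values. */
unsigned int startup_ipi_delay_us(unsigned int first_us, unsigned int second_us)
{
	return STARTUP_IPIS * (first_us + second_us);
}

/* Per-CPU saving from using 200+200 instead of 300+200. */
unsigned int startup_ipi_saving_us(void)
{
	return startup_ipi_delay_us(300, 200) - startup_ipi_delay_us(200, 200);
}
```

That is 1000 usec today versus 800 usec with 200+200, so 200 usec per CPU:
small next to the 19,000 usec total.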

msr_class_cpu_callback -- making device nodes is not fast.

I don't know if anything can be done for the 1700us wait
for the remote processor to mark itself initialized.
That is the 1st thing it does when it enters cpu_init().

On the Xeon, I had seen x86_init_rdrand() take 781 usec --
dunno why that isn't seen on this box.  I'll look at that box again next week.

cheers,
Len Brown, Intel Open Source Technology Center