Date:	Fri, 29 May 2015 03:47:16 -0400
From:	Len Brown <lenb@...nel.org>
To:	Ingo Molnar <mingo@...nel.org>
Cc:	Jan H. Schönherr <jschoenh@...zon.de>,
	Thomas Gleixner <tglx@...utronix.de>, X86 ML <x86@...nel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Anthony Liguori <aliguori@...zon.com>,
	Ingo Molnar <mingo@...hat.com>,
	"H. Peter Anvin" <hpa@...or.com>, Tim Deegan <tim@....org>,
	Gang Wei <gang.wei@...el.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>
Subject: native_cpu_up speed (Re: [PATCH] x86: skip delays during SMP
 initialization similar to Xen)

>> I don't know if anything can be done for the 1700us wait
>> for the remote processor to mark itself initialized.
>> That is the 1st thing it does when it enters cpu_init().
>
> So that 1.7 msecs delay is the firmware in essence?

Yes -- hardware+microcode+firmware initialization.

I measured this on the 4-socket IVT and found that this
delay is not constant. Here is how many times udelay(100)
executed for each processor while waiting for the "initialized" map:

1:64  (for cpu1, we wait 64 * udelay(100) = 6400 usec)
2:25  (for cpu2, we wait 25 * udelay(100) = 2500 usec)
3:3    (for cpu3, we wait 3 * udelay(100) = 300 usec)
4:3    etc.
5:4
6:4
7:3
8:4
9:3
10:3
11:3
12:3
13:3
14:3
15:20
16:20
17:20
18:20
19:20
20:20
21:20
22:20
23:20
24:20
25:18
26:18
27:18
28:18
29:18
30:20
31:20
32:20
33:20
34:20
35:20
36:20
37:20
38:20
39:20
40:18
41:18
42:18
43:18
44:18
45:20
46:20
47:20
48:20
49:20
50:20
51:20
52:20
53:20
54:20
55:18
56:18
57:18
58:18
59:18
60:0
61:3
62:3
63:3
64:3
65:4
66:4
67:3
68:4
69:3
70:3
71:3
72:3
73:3
74:3
75:20
76:20
77:20
78:20
79:20
80:20
81:20
82:20
83:20
84:20
85:19
86:19
87:19
88:19
89:19
90:20
91:20
92:20
93:20
94:20
95:21
96:20
97:20
98:20
99:20
100:19
101:19
102:19
103:19
104:19
105:20
106:20
107:20
108:20
109:20
110:21
111:20
112:20
113:20
114:20
115:19
116:19
117:18
118:19
119:19

I can't explain this topology, but it gives you an idea of where the time goes.

However, a clear pattern jumped out of the trace for how long
the BSP waits for the AP to set itself in cpu_callin_mask.
This is the time in start_secondary() while cpu_init() runs,
up through the call to smp_callin().

On the 1st package, each remote AP takes 9 delays = 900 us to do this,
whether it is a new core or an HT sibling of a core already up.
But the 1st processor on each remote _package_ (aka node) takes 60,000 us.

No typo -- that is 60ms!

Subsequent cores in remote nodes take about 1,800 us,
whether they are new cores or HT siblings of cores already up.

cheers,
Len Brown, Intel Open Source Technology Center